jwcolby
jwcolby at colbyconsulting.com
Wed Jun 13 18:31:18 CDT 2007
Michael,

Where the rubber meets the road, however, is the process of normalizing tables with hundreds of fields into a single "fact field" table. Think of what is required to do that:

1) Iterate through all of the fields to get the name of each field.
2) Create a record in a "fact name" table.
3) Get the newly created PKID of that fact name and use it to build a query that pulls all of the data from the table for exactly and only that field, writing the fact, the PKID of the fact name, and a PKID of a person/address into a three-field fact table.

Each time I find a new person/address I have to write a new record to the person/address table (actually three tables - person, address, and mmPersonAddress, but who's counting) and somehow link the FactID with the person/address ID. Remember that each fact "belongs to" a person. So I either build the fact table to point directly to the PKID in the demographics table (the easy way) or normalize out to a person table and point the fact to a PersonID. Kind of hard to explain without diagrams.

In any event, it would be easiest to run a query that pulls ALL of the data in a given field WHERE NOT NULL - IOW, all of the data from ONE field where that field contains data - and then write just those records to the fact table with a FactNameID and the corresponding DemographicsID. The DemographicsID, however, implies a "tblDemographicsTableName": IOW, each "list," as they are called, is placed in a separate table and therefore has its own PKID, which starts at zero and increments. So the PKIDs from the HSID table will overlap the PKIDs from tblInfutor.

As you can see, this is a mess and demands automation to get it right. Once I get up to speed on VB.Net and SQL Server, I will write an application to normalize fact table XXX. In the meantime, this is NOT something that can be done manually. It will involve at LEAST one query for each field of each fact table (700 fields in HSID alone). This is very doable; it is just something that requires fluency in a programming language up to the task of handling tables and fields out in SQL Server and dynamically creating and executing all of these fact normalization queries. It will be fun once I get around to it.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
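To make the above concrete, here is a minimal T-SQL sketch of the automation John describes. All object names (tblFactName, tblFact, tblHSID, and its PKID column) are hypothetical stand-ins, and the sketch skips the person/address normalization and all error handling:

-- Hypothetical EAV target tables: one row per field name, one row per fact.
CREATE TABLE tblFactName (
    FactNameID  int IDENTITY(1,1) PRIMARY KEY,
    FactName    varchar(128) NOT NULL,
    SourceTable varchar(128) NOT NULL  -- records which "list" the fact came
                                       -- from, since PKIDs overlap across lists
);

CREATE TABLE tblFact (
    FactID         bigint IDENTITY(1,1) PRIMARY KEY,
    FactNameID     int NOT NULL REFERENCES tblFactName (FactNameID),
    DemographicsID int NOT NULL,       -- the PKID in the source list table
    FactValue      varchar(255) NULL
);

-- Walk every column of the wide source table and move its non-null values
-- into tblFact: one "fact name" insert plus one dynamic query per column.
DECLARE @col sysname, @factNameID int, @sql nvarchar(max);

DECLARE colCur CURSOR FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'tblHSID'
      AND COLUMN_NAME <> 'PKID';       -- skip the key itself

OPEN colCur;
FETCH NEXT FROM colCur INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Step 2: register the field name and capture its new PKID.
    INSERT INTO tblFactName (FactName, SourceTable) VALUES (@col, 'tblHSID');
    SET @factNameID = SCOPE_IDENTITY();

    -- Step 3: pull exactly and only that field, WHERE NOT NULL.
    SET @sql = N'INSERT INTO tblFact (FactNameID, DemographicsID, FactValue) '
             + N'SELECT ' + CAST(@factNameID AS nvarchar(10)) + N', PKID, '
             + QUOTENAME(@col) + N' FROM tblHSID WHERE '
             + QUOTENAME(@col) + N' IS NOT NULL;';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM colCur INTO @col;
END;
CLOSE colCur;
DEALLOCATE colCur;

The SourceTable column is one way around the overlapping-PKID problem John mentions; a fuller version would resolve each PKID to a PersonID in the normalized person/address tables rather than pointing straight at the list table.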
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael Maddison
Sent: Wednesday, June 13, 2007 7:08 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

OLAP may be an option, but I think John will still run into issues with hundreds of dimensions. Actually, with the changes Arthur recommended it should work a treat... in theory...

cheers

Michael M

John,

How about using OLAP? This would do a lot of preprocessing to make it quick and easy to find counts and sums. My understanding is that SQL Server 2005 has very good OLAP options. (My experience is limited to Cognos, but a 300-million-record cube of web site hit data took 20 minutes to process 12-20 dimensions and about 6 measures. The resulting cube gave instant answers.)

If you did that, MDX (similar to SQL) would allow you to write queries against the OLAP cube. Because of the pre-processing of the OLAP cube, MDX query response should be instant.

I originally got into this sort of stuff when the cross tabs I was running with Access as a front end to SQL Server 2000 were taking too long. The company I was working for wanted to use Cognos (SQL Server 2000's OLAP was fairly rudimentary before the development of MDX), so that's the direction I took, but I've heard lots of good things about 2005's OLAP.

Fred
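As a footnote on the thread subject (long count times): once the data sits in a narrow fact table like the hypothetical one sketched above, the counts that are slow against the wide tables become simple aggregates, with or without a cube on top. A purely illustrative query against those same hypothetical tables:

-- Per fact name: how many source records had a value for that field?
SELECT fn.FactName, COUNT(*) AS RecordsWithValue
FROM tblFact AS f
JOIN tblFactName AS fn ON fn.FactNameID = f.FactNameID
GROUP BY fn.FactName
ORDER BY RecordsWithValue DESC;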