jwcolby
jwcolby at colbyconsulting.com
Wed Jun 13 18:31:18 CDT 2007
Michael,

Where the rubber meets the road, however, is the process of normalizing tables with hundreds of fields into a single "fact field" table. Think of what is required to do that:

1) Iterate through all of the fields to get the name of each field.
2) Create a record in a "fact name" table.
3) Get the newly created PKID of that fact name and use it to build a query that pulls all of the data from the table for exactly and only that field, writing the fact, the PKID of the fact name, and a PKID of a person/address into a three-field fact table.

Each time I find a new person/address I have to write a new record to the person/address table (actually three tables - person, address, and mmPersonAddress, but who's counting) and somehow link the FactID with the person/address ID. Remember that each fact "belongs to" a person. So I either build the fact table to point directly to the PKID in the demographics table (the easy way) or normalize out to a person table and point the fact to a PersonID. Kind of hard to explain without diagrams.

In any event, it would be easiest to run a query that pulls ALL of the data in a given field WHERE NOT NULL - IOW, all of the data from ONE field where that field contains data - and then write just those records to the fact table with a FactNameID and the corresponding DemographicsID. The DemographicsID, however, implies a "tblDemographicsTableName": IOW, each "list," as they are called, is placed in a separate table and therefore has its own PKID, which starts at zero and increments. So the PKIDs from the HSID table will overlap the PKIDs from tblInfutor.

As you can see, this is a mess and demands automation to get it right. Once I get up to speed on VB.Net and SQL Server, I will write an application to normalize fact table XXX. In the meantime, this is NOT something that can be done manually. It will involve at LEAST one query for each field of each fact table (700 fields in HSID alone). This is very doable; it is just something that requires fluency in a programming language up to the task of handling tables and fields out in SQL Server and dynamically creating and executing all of these fact normalization queries. It will be fun once I get around to it.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
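To make the above concrete, here is a minimal T-SQL sketch of the automation John describes. All object names (tblFactName, tblFact, tblHSID, and its PKID column) are hypothetical stand-ins, and the sketch skips the person/address normalization and all error handling:

-- Hypothetical EAV target tables: one row per field name, one row per fact.
CREATE TABLE tblFactName (
    FactNameID  int IDENTITY(1,1) PRIMARY KEY,
    FactName    varchar(128) NOT NULL,
    SourceTable varchar(128) NOT NULL  -- records which "list" the fact came
                                       -- from, since PKIDs overlap across lists
);

CREATE TABLE tblFact (
    FactID         bigint IDENTITY(1,1) PRIMARY KEY,
    FactNameID     int NOT NULL REFERENCES tblFactName (FactNameID),
    DemographicsID int NOT NULL,       -- the PKID in the source list table
    FactValue      varchar(255) NULL
);

-- Walk every column of the wide source table and move its non-null values
-- into tblFact: one "fact name" insert plus one dynamic query per column.
DECLARE @col sysname, @factNameID int, @sql nvarchar(max);

DECLARE colCur CURSOR FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'tblHSID'
      AND COLUMN_NAME <> 'PKID';       -- skip the key itself

OPEN colCur;
FETCH NEXT FROM colCur INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Step 2: register the field name and capture its new PKID.
    INSERT INTO tblFactName (FactName, SourceTable) VALUES (@col, 'tblHSID');
    SET @factNameID = SCOPE_IDENTITY();

    -- Step 3: pull exactly and only that field, WHERE NOT NULL.
    SET @sql = N'INSERT INTO tblFact (FactNameID, DemographicsID, FactValue) '
             + N'SELECT ' + CAST(@factNameID AS nvarchar(10)) + N', PKID, '
             + QUOTENAME(@col) + N' FROM tblHSID WHERE '
             + QUOTENAME(@col) + N' IS NOT NULL;';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM colCur INTO @col;
END;
CLOSE colCur;
DEALLOCATE colCur;

The SourceTable column is one way around the overlapping-PKID problem John mentions; a fuller version would resolve each PKID to a PersonID in the normalized person/address tables rather than pointing straight at the list table.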
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael Maddison
Sent: Wednesday, June 13, 2007 7:08 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

OLAP may be an option, but I think John will still run into issues with hundreds of dimensions. Actually, with the changes Arthur recommended it should work a treat... in theory...

cheers

Michael M

John,

How about using OLAP? This would do a lot of preprocessing to make it quick and easy to find counts and sums. My understanding is that SQL Server 2005 has very good OLAP options. (My experience is limited to Cognos, but a 300-million-record cube of web site hit data took 20 minutes to process 12-20 dimensions and about 6 measures. The resulting cube gave instant answers.)

If you did that, MDX (similar to SQL) would allow you to write queries against the OLAP cube. Because of the pre-processing of the OLAP cube, MDX query response should be instant.

I originally got into this sort of stuff when the cross tabs I was running with Access as a front end to SQL Server 2000 were taking too long. The company I was working for wanted to use Cognos (SQL Server 2000's OLAP was fairly rudimentary before the development of MDX), so that's the direction I took, but I've heard lots of good things about 2005's OLAP.

Fred
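As a footnote on the thread subject (long count times): once the data sits in a narrow fact table like the hypothetical one sketched above, the counts that are slow against the wide tables become simple aggregates, with or without a cube on top. A purely illustrative query against those same hypothetical tables:

-- Per fact name: how many source records had a value for that field?
SELECT fn.FactName, COUNT(*) AS RecordsWithValue
FROM tblFact AS f
JOIN tblFactName AS fn ON fn.FactNameID = f.FactNameID
GROUP BY fn.FactName
ORDER BY RecordsWithValue DESC;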