John W. Colby
jwcolby at colbyconsulting.com
Tue Nov 9 00:51:25 CST 2004
I am not normalizing so much as breaking the table down into groups of related fields, which I suppose is normalization but not in the classical sense. The objective is to:

1) Get rid of having to process 640 fields at once.
2) Enable indexing ALL fields so that counts and so forth can be done in near-real time.

What typically happens (AFAICT) is that the person wanting to buy data asks for counts: how many records in these zip codes, where the income is in this range, the age is in this range, and the person reads books (for example). That is only asking for where clauses on a handful of fields in perhaps 4 or 5 sub-tables. How do you do this at ALL on a table of 640 fields / 65 million records when SQL Server only allows 240 indexes on a table? You don't. By breaking it down into an address table and then other related-field tables, I end up having to join 4 or 5 sub-tables back on the PK of course, but I can now have every single field indexed.

It appears that there are about 30 sets of related data. Again, this is never a one-to-many, just a set of one-to-one relationships where the related data sets are pulled out into separate tables. And again, it's not like the client will ask questions about all 30 sub-tables.

Without an index, a table scan on any one field would take a lifetime - well, actually about 24 hours. A simple count of a single field, grouped to show all values and their counts, returns in about 30 seconds. So yeah, I have to live with the join on the indexes putting 4 or 5 tables back together, but the count per field drops from 24 hours to 30 seconds. I won't pretend that I have actually tried this yet (counts of joined tables), but I will be soon, and I'll report back with results. Bottom line, there is simply no other way to handle this, at least on the full table.

Furthermore, eventually I am looking to use the system to return actual data sets on the same where clause, i.e. a half million actual address records given a set of zip codes, income range, age range, reads books, etc. This is more than simple data mining I think; it is asking COUNT questions on N fields selected more or less at random. ALWAYS a count. Then hopefully someday "ok give me 250k of those records" (actual address data).

The big question... Will the query processor perform the where on each sub-table, then join the results? Or join the 65 million record sub-tables, then perform the wheres? Could I force the former scenario somehow? Pull the N records (the PKs of) that fall in the given zip codes, then the M PKs of the records that fall in the selected income bracket, then the O PKs that fall in the selected age bracket, then the P PKs that... and then at the end join the result sets on the PKs returned in each of the subs? That should be relatively fast, you would think, assuming an index on each and every field.
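In other words, something shaped roughly like this - the table and column names below are invented just for illustration, and I don't know yet whether the optimizer will actually filter each sub-table first or insist on joining the full tables:

    -- Hypothetical sub-table / column names, just to show the shape of the query:
    -- filter each 1-1 sub-table on its own indexed field, then intersect on the PK.
    SELECT COUNT(*) AS MatchCount
    FROM (SELECT PK FROM tblAddress WHERE Zip IN ('30303', '30305')) AS a
    INNER JOIN (SELECT PK FROM tblIncome WHERE IncomeBand BETWEEN 5 AND 8) AS i
        ON i.PK = a.PK
    INNER JOIN (SELECT PK FROM tblDemographic WHERE AgeBand BETWEEN 40 AND 55) AS d
        ON d.PK = a.PK
    INNER JOIN (SELECT PK FROM tblInterest WHERE ReadsBooks = 1) AS b
        ON b.PK = a.PK

Whether the plan really seeks each derived table first is exactly the thing I need to test.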
Remember too that this is a static db, i.e. no records are added on a daily basis, so the large number of indexes doesn't kill you in adding / deleting / modifying records, since you don't do that.

John W. Colby
www.ColbyConsulting.com

Contribute your unused CPU cycles to a good cause:
http://folding.stanford.edu/

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael Maddison
Sent: Tuesday, November 09, 2004 12:54 AM
To: dba-sqlserver at databaseadvisors.com
Subject: RE: [dba-SQLServer] Moving tables back and forth between databases

John,

I'm not sure EM or DTS will do a transfer with indexes. I'm pretty sure it won't, actually. I think you will have to script the CREATE INDEX statements manually. What I'd do is create a script for each server (QA will do most of the work for you):

    create table
    go
    append data
    go
    alter table (constraints)
    go
    create index
    go

Then run it from the target server; that way it won't affect your main server too much.

I hope you're getting paid the big bucks, because if you think you've had troubles before, I think the fun is just about to begin ;-)))

I read your other posts as well, and I'm not sure if my comments will be all that helpful, as I suspect you'll need a fair amount of trial and error anyway to see what works best for the environment you have to work with...

However, I'm nervous about you normalising the big table. I predict you will only end up hampering performance. Joins on large tables, or many joins, take a considerable amount of time. If you need to create separate files for space reasons, have a look at Partitioned Views in BOL. This will allow you to split the table over multiple drives and multiple servers. I'd also seriously try and coerce the client into taking an OLAP approach, though this also has its drawbacks as there are so many damn columns. A cube with 600 dimensions would never finish processing, so maybe not :-)))

Just how dynamic do the queries have to be?

cheers

Michael M

Can SQL Server EM copy tables from a database in one SQL Server (Neo2) to another SQL Server (Neo1)?

I am trying to tear down the big 64 million record, 640 field table into sub-tables. The process consists of creating the tables, building an append query to copy the data from the big table to the new sub-table, then creating indexes on every field in the sub-table. The process of creating the new tables is relatively easy and quick: manually copying the field defs into the paste buffer, then pasting them back in to a new table. The process of creating and executing the append query is relatively painless and fairly quick, though it does take a couple of hours (I think, never actually timed it) to copy the data. However, the process of creating the indexes is time consuming, both in the doing (manually in table design view) and particularly in the execution (saving the table with the new indexes), which can take overnight. The process of creating the indexes pretty much locks up the machine (80-100% CPU utilization) for many, many hours.

I would like to create the sub-tables all at once, then farm out the index creation to my other machines, but in order to do that I have to copy the data-filled tables to another machine, switch to that machine, create the indexes in design view, then save the table in design view, which starts the index creation process. I have no idea whether it is even possible to do this "table copy", but all indexes need to be copied as well, at least when I copy them back to the main SQL Server.

On another note, how do I split the database into more files? I built it originally in 4 files, and then as I finished the import I shrank them down into two files (on two physical 250g disks). I am now pushing the size up such that I need to split them back out to at least one more 250g disk (one more file), and perhaps even two. I haven't discovered how to split the database into more physical files.
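Is it just a matter of ALTER DATABASE ... ADD FILE? My guess is something like the following - the database name, logical file name, path, and sizes are placeholders, not what I actually have:

    -- Add another physical data file on the new disk (names and sizes are placeholders).
    ALTER DATABASE MyListDB
    ADD FILE
    (
        NAME = MyListDB_Data3,                    -- logical file name
        FILENAME = 'F:\Data\MyListDB_Data3.ndf',  -- path on the new 250g drive
        SIZE = 10000MB,
        FILEGROWTH = 1000MB
    )

If that's right, I assume SQL Server would then start putting new data into whichever files have free space.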
John W. Colby
www.ColbyConsulting.com

Contribute your unused CPU cycles to a good cause:
http://folding.stanford.edu/

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com