[dba-SQLServer] Moving tables back and forth between databases

John W. Colby jwcolby at colbyconsulting.com
Tue Nov 9 00:51:25 CST 2004

I am not normalizing so much as breaking the table down into groups of
related fields, which I suppose is normalization, but not in the classical
sense.  The objective is to:

1) Get rid of having to process 640 fields at once.
2) Enable indexing ALL fields so that counts and so forth can be done in
near-real time.

What typically happens (AFAICT) is that the person wanting to buy data asks
for counts: how many records in these zip codes, where the income is in
this range, the age is in this range, and the person reads books (for
example).  That is only asking for where clauses on a handful of fields in
perhaps 4 or 5 sub-tables.  How do you do this at ALL on a table of 640
fields / 65 million records when SQL Server only allows 249 nonclustered
indexes on a table?  You don't.

By breaking it down into an address table plus other related-field tables, I
end up having to join 4 or 5 sub-tables back on the PK of course, but I can
now have every single field indexed.  It appears that there are about 30
sets of related data.  Again, this is never a 1-to-many, just a set of 1-1
relationships where the related data sets are pulled out into separate
tables.  And again, it's not as if the client will ask questions about all
30 sub-tables at once.
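Sketched in T-SQL, the layout I'm describing would look something like this (the table and column names here are invented for illustration; the real table has ~640 columns split into ~30 groups):

```sql
-- Every sub-table carries the same PK, so the split is strictly 1-1.
CREATE TABLE dbo.tblAddress (
    PK         int          NOT NULL PRIMARY KEY,
    Zip5       char(5)      NULL,
    City       varchar(50)  NULL
);

CREATE TABLE dbo.tblDemographics (
    PK         int          NOT NULL PRIMARY KEY
               REFERENCES dbo.tblAddress (PK),     -- enforces the 1-1
    IncomeBand tinyint      NULL,
    AgeBand    tinyint      NULL,
    ReadsBooks char(1)      NULL
);
```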

Without an index, a table scan on any one field would take a lifetime -
well, actually about 24 hours.  A simple count of a single field, grouped to
show all values and their counts, returns in about 30 seconds.  So yes, I
have to live with the join on the indexes putting 4 or 5 tables back
together, but the count (per field) drops from 24 hours to 30 seconds.  I
won't pretend that I have actually tried this yet (counts of joined tables),
but I will be soon, and I'll report back with results.
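The single-field count I'm timing is just this shape (column and table names assumed from the sketch above):

```sql
-- Distribution of one indexed column: each distinct value with its count.
-- With an index on IncomeBand this can be answered from the index alone.
SELECT IncomeBand, COUNT(*) AS Cnt
FROM dbo.tblDemographics
GROUP BY IncomeBand
ORDER BY IncomeBand;
```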

Bottom line, there is simply no other way to handle this, at least on the
full table.

Furthermore, eventually I am looking to use the system to return actual data
sets on the same where clause, i.e. a half million actual address records
given a set of zip codes, an income range, an age range, reads books, etc.
This is more than simple data mining, I think; it is asking COUNT questions
on N fields selected more or less at random.  ALWAYS a count.  Then
hopefully someday, "OK, give me 250k of those records" (actual address
data).
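The count version of such a question, across joined sub-tables, would look something like this (table names, column names, and band values are all invented):

```sql
-- Count across three sub-tables joined on the shared PK.
-- Swapping COUNT(*) for a.* is the "give me the records" version.
SELECT COUNT(*)
FROM dbo.tblAddress      a
JOIN dbo.tblDemographics d ON d.PK = a.PK
JOIN dbo.tblInterests    i ON i.PK = a.PK
WHERE a.Zip5 IN ('10001', '10002')
  AND d.IncomeBand BETWEEN 5 AND 8
  AND d.AgeBand    BETWEEN 3 AND 6
  AND i.ReadsBooks = 'Y';
```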

The big question... Will the query processor perform the where on each sub-
table, then join the results?  Or join the 65-million-record sub-tables,
then perform the wheres?  Could I force the former scenario somehow?  Pull
the N PKs of the records that fall in the given zip codes, then the M PKs
of the records that fall in the selected income bracket, then the O PKs
that fall in the selected age bracket, then the P PKs that... and then, at
the end, join the result sets on the PKs returned in each of the subs?
That should be relatively fast, you would think, assuming an index on each
and every field.
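One way to at least express that "filter each sub-table first, then join the PK sets" plan explicitly is with derived tables; whether the optimizer actually honors the shape is exactly the open question (names invented, consistent with the earlier sketches):

```sql
-- Each derived table returns only the PKs passing its own where clause;
-- the outer joins then intersect the PK sets.
SELECT COUNT(*)
FROM (SELECT PK FROM dbo.tblAddress
      WHERE Zip5 IN ('10001', '10002'))   AS z
JOIN (SELECT PK FROM dbo.tblDemographics
      WHERE IncomeBand BETWEEN 5 AND 8)   AS inc ON inc.PK = z.PK
JOIN (SELECT PK FROM dbo.tblDemographics
      WHERE AgeBand BETWEEN 3 AND 6)      AS age ON age.PK = z.PK;
```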

Remember too that this is a static db, i.e. no records are added on a daily
basis, so the large number of indexes doesn't kill you when adding /
deleting / modifying records, since you don't do that.

John W. Colby

Contribute your unused CPU cycles to a good cause:

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael
Sent: Tuesday, November 09, 2004 12:54 AM
To: dba-sqlserver at databaseadvisors.com
Subject: RE: [dba-SQLServer] Moving tables back and forth between databases


I'm not sure EM or DTS will do a transfer with indexes.  I'm pretty sure it
won't, actually.  I think you will have to script the index creation
manually.  What I'd do is create a script for each server (QA will do most
of the work for you):

  create table
  go
  append data
  go
  alter table (constraints)
  go
  create index
  go
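Fleshed out, that skeleton would look something like the following (all object names are invented, and the cross-server pull assumes a linked server to Neo2 is set up):

```sql
-- Run from the target server so the build load stays off the main server.
CREATE TABLE dbo.tblDemographics (
    PK         int     NOT NULL,
    IncomeBand tinyint NULL
);
GO
-- Append the data, pulled across a linked server.
INSERT INTO dbo.tblDemographics (PK, IncomeBand)
SELECT PK, IncomeBand
FROM Neo2.BigDB.dbo.tblBig;
GO
-- Constraints after the load, so the insert isn't slowed by them.
ALTER TABLE dbo.tblDemographics
    ADD CONSTRAINT PK_tblDemographics PRIMARY KEY (PK);
GO
CREATE INDEX IX_tblDemographics_IncomeBand
    ON dbo.tblDemographics (IncomeBand);
GO
```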

Then run it from the target server; that way it won't affect your main
server too much.

I hope you're getting paid the big bucks, because if you think you've had
troubles before, I think the fun is just about to begin ;-)))

I read your other posts as well, and I'm not sure if my comments will be
all that helpful, as I suspect you'll need a fair amount of trial and error
anyway to see what works best for the environment you have to work with...

However, I'm nervous about you normalising the big table.  I predict you
will only end up hampering performance.  Joins on large tables, or many
joins, take a considerable amount of time.  If you need to create separate
files for space reasons, have a look at Partitioned Views in BOL.  This
will allow you to split the table over multiple drives and multiple
servers.
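A minimal sketch of the partitioned-view idea (names and the PK split point are made up; BOL has the full requirements):

```sql
-- Member tables each hold a PK range; the CHECK constraints are what let
-- the optimizer skip member tables whose range can't match the query.
CREATE TABLE dbo.tblBig_1 (
    PK int NOT NULL PRIMARY KEY CHECK (PK BETWEEN 1 AND 32000000)
    -- ...the data columns...
);
CREATE TABLE dbo.tblBig_2 (
    PK int NOT NULL PRIMARY KEY CHECK (PK > 32000000)
    -- ...the data columns...
);
GO
CREATE VIEW dbo.vwBig
AS
SELECT * FROM dbo.tblBig_1
UNION ALL
SELECT * FROM dbo.tblBig_2;
```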

I'd also seriously try to coerce the client into taking an OLAP approach,
though this also has its drawbacks, as there are so many damn columns.  A
cube with 600 dimensions would never finish processing, so maybe not :-)))
Just how dynamic do the queries have to be?


Michael M

Can SQL Server EM copy tables from a database in one SQL Server (Neo2) to
another SQL Server (Neo1)?  I am trying to tear down the big 64-million-
record, 640-field table into sub-tables.  The process consists of creating
the tables, building an append query to copy the data from the big table to
the new sub-table, then creating indexes on every field in the sub-table.
The process of creating the new tables is relatively easy and quick:
manually copying the field defs into the paste buffer, then pasting them
back in to a new table.  The process of creating and executing the append
query is relatively painless and fairly quick, though it does take a couple
of hours (I think; I never actually timed it) to copy the data.  However,
the process of creating the indexes is time consuming, both in the doing
(manually, in table design view) and particularly in the execution (saving
the table with the new indexes, which can take overnight).  The process of
creating the indexes pretty much locks up the machine (80-100% CPU
utilization) for many, many hours.
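For comparison, the scripted form of one of those indexes is a single statement, which builds just the index rather than going through a design-view save (names invented; the SORT_IN_TEMPDB option is only a win if tempdb lives on a separate drive):

```sql
-- Builds one index directly instead of via a table-design save;
-- SORT_IN_TEMPDB moves the sort work for the build into tempdb.
CREATE INDEX IX_tblDemographics_AgeBand
    ON dbo.tblDemographics (AgeBand)
    WITH SORT_IN_TEMPDB;
```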

I would like to create the sub-tables all at once, then farm out the index
creation to my other machines, but in order to do that I have to copy the
data-filled tables to another machine, switch to that machine, create the
indexes in design view, then save the table in design view, which starts
the index creation process.

I have no idea whether it is even possible to do this "table copy", but all
the indexes need to copy as well, at least when I copy the tables back to
the main SQL Server.

On another note, how do I split the database into more files?  I built it
originally in 4 files, and then as I finished the import I shrank them down
into two files (on two physical 250 GB disks).  I am now pushing the size
up such that I need to split them back out to at least one more 250 GB disk
(one more file), and perhaps even two.  I haven't discovered how to split
the database into more physical files.
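The ALTER DATABASE ... ADD FILE form is the piece that does this; a sketch, with the database name, logical file name, path, and sizes all made up:

```sql
-- Add a new data file on a third drive; SQL Server will start
-- allocating new extents across the files in the filegroup.
ALTER DATABASE BigDB
ADD FILE (
    NAME       = BigDB_Data3,
    FILENAME   = 'F:\Data\BigDB_Data3.ndf',
    SIZE       = 10GB,
    FILEGROWTH = 1GB
);
```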

John W. Colby


dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
