[dba-SQLServer] Any gut feelings?

Fri Nov 3 00:38:31 CST 2006

Hi John:

The only thing that I could suggest seeing that you appear to be on the
right path is to leave out actually building the indexes until last. Making
them yes but not building them as there will be a speed component in all
this. 

If I was building a huge database system I would be building a small test
schema to see it all works first before applying the final design. If there
is a mistake or two in the schema it will not be easy to change things when
there are 30 plus tables and appropriate joins and 65 million records with
indexes to process.

Just a thought
Jim   

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby
Sent: Thursday, November 02, 2006 9:58 PM
To: dba-sqlserver at databaseadvisors.com
Subject: [dba-SQLServer] Any gut feelings?

I have this huge table, 65 million records, ~700 fields.  The table has a
bunch of now useless fields which I will be deleting, basically old cass and
ncoa data from a previous (many years ago) address verification on the
table.  More importantly, it also has what I will call demographics data,
but extended into such things as brand preference for consumables, hobbies,
types of electronics purchases, etc.

Obviously the sheer size is an issue here.  What I would think to do, coming
from a small database background, is to create a PK on the big table, then
break this big table down into smaller tables, each focused on one "set" of
demographics data.

My question to you guys is, given the size of the table in terms of sheer
number of records, is this a valid strategy?  If I have an indexed and
identical PK in each table, can I break out the Address into a table,
age/income/race/religion into a table, then tables for boating, electronics,
medicines, etc. with each sub table only containing records where the
demographics info is populated.  IOW, not everyone has boating info, not
everyone has medical info, electronics info etc. so I end up with a smaller
subset of records in each table where there is only a record in that table
if the person has data for that demographic set.

Having done that can I then expect to be able to join up to a half dozen
tables to get inner join subsets, outer join mega-sets etc. in anything
approaching near real time (minutes)?

I am in the process of doing the very first (two) breakouts, address and
boating.  Obviously I have to create a new table, append the data in, then
build an index on the PK for starters.  Having done that I can do a quick
and dirty test, but I would like your opinions on the general feasibility of
this approach given the limitations of current (but state of the art)
desktop hardware, SBS 2003 and SQL Server 2000.

Having answered this (assuming that the answer is "it is probably a valid
approach", I then need assistance on various methods to determine the
existence of data within a given record of valid demographic data.  IOW, I
am pulling all 65 million records, a subset of fields (typically 10 to 30
fields) into a smaller table.  This smaller table will eventually have an
indexed PK, but NO indexes on the various fields.  One way I can think of to
do something like this is to generate a "validity" field where I store a
number which is simply a count of the fields with something other than a
zero length string.  So record 1 has 0 valid data fields, rec 2 has 3 valid
data fields, 3 has 12 valid data fields and so forth.  IF I could generate
such a query to get these counts and store it in a single field (and ever
get results) and then apply an index on that field, anything over a zero has
valid data and could be used to pull records, or (in reverse) records with a
zero in the "validity" field could be deleted from the table to drop the
number of records in that table.

Other thoughts on how to accomplish this objective?  Remember that this main
table is simply freakin huge, with probably 20-50 different demographic
groupings and some where the boundaries aren't even clear.  In order to do
this I will have to create that same number of tables, cut out the data and
get it into these new tables, index the PK, generate counts of the validity
data, delete records with no valid data etc.  Some kind of automation for
doing this would be nice, but basically, I can if necessary just brute force
it once I have the procedure down.

Ideas?  Objections?  "Your crazy to even think about this"?

John W. Colby
Colby Consulting
www.ColbyConsulting.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com