jwcolby
jwcolby at colbyconsulting.com
Mon Jan 16 12:09:08 CST 2012
As I have discussed in the past, I have a set of about 8 tables (which I call AZData) containing
name/address info. This table has just a PKID, FNAMe, LName, Addr, City, St, Zip5, Zip4. I have to
maintain these records by sending them out every month for CASS / NCOA processing. I currently
process about 350 million name/address records a month.
The data always comes from bigger tables which contains demographics data, things like "owns a dog,
owns a cat" ('Y'/'N' for each of these fields) or Child age 0-3, 4-7 etc (codes 1,2,or 3 indicating
m/f/both).
To this point I have always built a separate database to hold the original table with an autonumber
PK, then I break the name/address out into the AZData table, linked back to the demographics table
by an FK which is the PK in the demographics table. I then process that AZData table monthly.
DogsAndCats contains about 11 million records, Kids contains 21 million records and so forth. I did
it this way for a couple of reasons, an attempt to keep the physical size of the database down and
to some extent simply history - that is the way I started doing this.
Well, with 8 databases and 350 million records it is probably time to change. I am considering
building a table with the name/addr fields and then a set of fields with an FK pointing back to each
database where that name / address can be found. IOW the same name/addr may be in the kids and
DogsAndCats table. So there would be a single name/addr record with an FK back to those two
demographics tables.
I would have a column for each demographics database's PK. This is denormalized, in the sense that
as I add a new demographics table I have to add a new column to the name / address table to hold the
PK linking records back to that demogr table. I normally consider any case where I have to add a
field like this to be "denormalized" - kinda like Child1, Child2.
The upside is that suddenly the demographics table link through a single M-M table (which happens to
hold the name/addr info as well), which allows a very simple sql statement to determine whether a
person with dogs also has kids etc.
Any thoughts on the wisdom of denormalizing in this manner?
--
John W. Colby
Colby Consulting
Reality is what refuses to go away
when you do not believe in it