[AccessD] Intentional Denormalize?

Charlotte Foust charlotte.foust at gmail.com
Mon Jan 16 14:49:48 CST 2012


John,
You're effectively doing data warehousing and that is typically
denormalized through the use of a star schema.  If you're not familiar with
that, I can recommend Ralph Kimbell's books on data warehousing.  In
essence, you leave the data where it is and build dimension tables to
provide you the indexes you need to get the answers.  That involves placing
a bunch of specific indexes on the data table that join with the dimension
tables for things like dates (a table with full calendar years in it, that
includes breakout columns for day, month, quarter, year, etc), and for any
other dimensions you need to use, i.e., ethnic group or age range,   The
dimensions are determined by how you want to slice and dice the data.  You
build a logical data map through using dimensions, which are not normalized
by definition, since, for example, your dates table would have the same
Quarter number for 3 months worth of days.  The day number would reoccur on
different years, as would the month number, which would also repeat for the
number of days in that month.  It's harder to explain than to understand,
but I would strongly suggest it for what you're doing.

Charlotte Foust

On Mon, Jan 16, 2012 at 10:09 AM, jwcolby <jwcolby at colbyconsulting.com>wrote:

> As I have discussed in the past, I have a set of about 8 tables (which I
> call AZData) containing name/address info.  This table has just a PKID,
> FNAMe, LName, Addr, City, St, Zip5, Zip4.  I have to maintain these records
> by sending them out every month for CASS / NCOA processing.  I currently
> process about 350 million name/address records a month.
>
> The data always comes from bigger tables which contains demographics data,
> things like "owns a dog, owns a cat" ('Y'/'N' for each of these fields) or
> Child age 0-3, 4-7 etc (codes 1,2,or 3 indicating m/f/both).
>
> To this point I have always built a separate database to hold the original
> table with an autonumber PK, then I break the name/address out into the
> AZData table, linked back to the demographics table by an FK which is the
> PK in the demographics table.  I then process that AZData table monthly.
> DogsAndCats contains about 11 million records, Kids contains 21 million
> records and so forth.  I did it this way for a couple of reasons, an
> attempt to keep the physical size of the database down and to some extent
> simply history - that is the way I started doing this.
>
> Well, with 8 databases and 350 million records it is probably time to
> change.  I am considering building a table with the name/addr fields and
> then a set of fields with an FK pointing back to each database where that
> name / address can be found.  IOW the same name/addr may be in the kids and
> DogsAndCats table.  So there would be a single name/addr record with an FK
> back to those two demographics tables.
>
> I would have a column for each demographics database's PK.  This is
> denormalized, in the sense that as I add a new demographics table I have to
> add a new column to the name / address table to hold the PK linking records
> back to that demogr table.  I normally consider any case where I have to
> add a field like this to be "denormalized" - kinda like Child1, Child2.
>
> The upside is that suddenly the demographics table link through a single
> M-M table (which happens to hold the name/addr info as well), which allows
> a very simple sql statement to determine whether a person with dogs also
> has kids etc.
>
> Any thoughts on the wisdom of denormalizing in this manner?
>
> --
> John W. Colby
> Colby Consulting
>
> Reality is what refuses to go away
> when you do not believe in it
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> http://databaseadvisors.com/**mailman/listinfo/accessd<http://databaseadvisors.com/mailman/listinfo/accessd>
>
>
> Website: http://www.databaseadvisors.**com<http://www.databaseadvisors.com>
>
>
>


More information about the AccessD mailing list