[AccessD] Mucking around

jwcolby jwcolby at colbyconsulting.com
Fri Sep 28 14:01:50 CDT 2007


Max,

Data as most of us understand it is "highly normalized", with that term
meaning more to some than others!  ;-)  We tend to think of data entry
systems and "normal" reporting systems.  Somehow we have to get the data
entered, and in order to do that and minimize errors we build many tables,
each table holding data about only a single "subject": people, companies,
checks, banks, etc.  That works well for data entry: there is one and only
one "Bank of America" in our bank table, and when we mean "Bank of America"
we go into that table and pull info for it, or pull its PK to place in some
other record that needs to record (for example) accounts for Bank of
America.
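
Purely as a sketch in SQL (every table, column and value here is invented
just to illustrate; this is nobody's real schema):

CREATE TABLE Bank (
    BankID   INT PRIMARY KEY,
    BankName VARCHAR(100) NOT NULL
);

CREATE TABLE Account (
    AccountID  INT PRIMARY KEY,
    BankID     INT NOT NULL REFERENCES Bank(BankID),  -- FK to the one bank row
    AcctNumber VARCHAR(30) NOT NULL
);

-- "Bank of America" exists exactly once; every account just stores its PK
INSERT INTO Bank (BankID, BankName) VALUES (1, 'Bank of America');
INSERT INTO Account (AccountID, BankID, AcctNumber) VALUES (100, 1, '12345678');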

Again, all that works well for its intended purpose - which is COLLECTING
information about objects.  REPORTING data about such objects can get real
slow real quick.  Bank of America has millions of "transactions" a day;
trying to report on all of the transactions for a given region by week can
cause the normalized database to go nuts.
potentially billions of transactions, with multiple joins between various
tables, and multiple where clauses on various fields.  Running a report can
take hours or days even on mainframes.
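
Just to illustrate the kind of query that hurts, assuming made-up Txn and
Branch tables in the same spirit as the sketch above (DATEPART here is SQL
Server syntax):

-- Every run has to join and scan the whole normalized structure
SELECT br.BranchName,
       DATEPART(week, t.TranDate) AS WeekNo,
       SUM(t.Amount) AS Total,
       COUNT(*)      AS TranCount
FROM   Txn t
       INNER JOIN Account a  ON a.AccountID = t.AccountID
       INNER JOIN Branch  br ON br.BranchID = a.BranchID
WHERE  br.Region = 'Southeast'
  AND  t.TranDate >= '2007-01-01' AND t.TranDate < '2008-01-01'
GROUP BY br.BranchName, DATEPART(week, t.TranDate);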

So in order to make such reporting faster such data may be intentionally
denormalized, the how and why of the denormalization being an entirely
different study from our "normal" normalization stuff.  But the concept is
simply that once the data exists and is verified and WILL NEVER (or rarely)
CHANGE, then the denormalization can be done without fear.  The data is
rearranged into tables different from our normalized tables.  Perhaps (and
this is ONLY an example) all of the data for each region is pulled out into
a separate table with ALL of the fields for all the banks, accounts,
customers and transactions (in that region) placed into a single table.  BAD
practice for transactional data (adding / deleting / modifying data) but
GOOD practice for reporting quickly on a specific region.
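
In rough SQL Server terms, that one-time "flatten it all out" step might
look something like this (Customer and the extra foreign key columns are,
again, invented for the example):

-- One-time denormalization: every join and the region WHERE clause
-- are paid for ONCE, up front
SELECT b.BankName, br.BranchName, br.Region,
       a.AcctNumber, c.CustName,
       t.TranDate, t.Amount
INTO   rptSoutheast   -- the wide, flat reporting table
FROM   Txn t
       INNER JOIN Account  a  ON a.AccountID  = t.AccountID
       INNER JOIN Customer c  ON c.CustomerID = a.CustomerID
       INNER JOIN Branch   br ON br.BranchID  = a.BranchID
       INNER JOIN Bank     b  ON b.BankID     = br.BankID
WHERE  br.Region = 'Southeast';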

Now before anyone jumps on me for this example or what you might or might
not actually do in the process of denormalization: I do NOT understand the
actual mechanics of such data storage / denormalization, only THAT such
things exist and the general reason why.

The point is simply that the joins between branch / account / people /
transactions were done ONE TIME, and the WHERE clause to select data for a
specific branch was done ONE time.  Even though it took days or weeks to
get the data and write it out to a table, it only has to be done once
because the data behind the denormalized table is static.  The branch will
never change, the account will never change, the transactions will never
change, so a "snapshot" can be taken and stored permanently.  Then that
snapshot is used to report against.
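
Reporting against that snapshot then needs no joins at all; something like:

-- No joins, no re-filtering by region: just read the flat table
SELECT BranchName, SUM(Amount) AS Total, COUNT(*) AS TranCount
FROM   rptSoutheast
WHERE  TranDate >= '2007-06-01' AND TranDate < '2007-07-01'
GROUP BY BranchName;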

In fact even this snapshot may be too large; they might very well do
summations such as "the sums and counts of all deposits made" for each day
or each month etc. and store that info.  Since the deposits were made and
can't be unmade, the data is valid forever.  Do it once, then just report
off of the snapshots.
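
A made-up sketch of that kind of rollup, built off the flat table above
(treating positive amounts as deposits just for the example):

-- Monthly summation: deposits can't be un-deposited, so this never goes stale
SELECT BranchName,
       YEAR(TranDate)  AS DepYear,
       MONTH(TranDate) AS DepMonth,
       SUM(Amount)     AS TotalDeposits,
       COUNT(*)        AS DepositCount
INTO   rptDepositsByMonth
FROM   rptSoutheast
WHERE  Amount > 0
GROUP BY BranchName, YEAR(TranDate), MONTH(TranDate);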

Those "snapshots" are the concept of if not the actual realization of a data
mart.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of
max.wanadoo at gmail.com
Sent: Friday, September 28, 2007 2:22 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Thanks John,
Normalised data I understand.
What I don't understand is how we get from that to Data Marts.  

Q1: What do I do with my normalised tables?  If the answer is to leave the
data in the normalised tables and then re-post it to a flat table, then why
could that not have been done at data entry?
Q2: If the answer to Q1 is to go straight to flat tables, then what do I do
AFTER that?

When it then comes to pulling the data out into reports, are we talking
about using software other than Access?

The terminology is throwing me a bit too (well, to be honest, it is throwing
me a lot).
With the help of you guys, I will understand it eventually.

What is conceptually throwing me at the moment though is this: if the
reason people use data marts (star/snowflake) is to quickly create reports
which dice/slice down through the data, then are we or are we not just
moving the "time taken" from the report stage to the data input stage
(which would make sense to me)?  But if I am completely wrong here, then I
really am "all at sea!"

Thanks
Max



