[AccessD] Mucking around

Charlotte Foust cfoust at infostatsystems.com
Fri Sep 28 16:02:05 CDT 2007


If you want a good book on Data Warehousing concepts, Ralph Kimball has
"The Data Warehouse Toolkit", which does a pretty good job of explaining
the concepts.

Charlotte Foust

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of
max.wanadoo at gmail.com
Sent: Friday, September 28, 2007 12:11 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Thanks John (and all those who posted).
I will go through all of the posts over the weekend.  Lots to get my
head around.
Have a good weekend, all of you great guys.
Max
 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, September 28, 2007 8:02 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Max,

Data as most of us understand it is "highly normalized", with that term
meaning more to some than to others!  ;-)  We tend to think of data entry
systems and "normal" reporting systems.  Somehow we have to get the data
entered, and in order to do that and minimize errors we build many
tables, each table holding data about only a single "subject": people,
companies, checks, banks, etc.  That works well for data entry.  There is
one and only one "Bank of America" in our bank table, and when we mean
"Bank of America" we go into that table and pull info for it, or pull
its PK to place in some other record that needs to reference (for
example) accounts for Bank of America.
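As a rough sketch only (the table and column names here are invented for
illustration, not anyone's real schema), the normalized side might look
something like this in SQL:

-- One row per bank; everything else points back at its PK
CREATE TABLE Banks (
    BankID    INT PRIMARY KEY,
    BankName  VARCHAR(100)      -- "Bank of America" appears exactly once
);

CREATE TABLE Accounts (
    AccountID   INT PRIMARY KEY,
    BankID      INT REFERENCES Banks(BankID),  -- FK to the single bank row
    CustomerID  INT,
    Region      VARCHAR(50)
);

CREATE TABLE Transactions (
    TransactionID  INT PRIMARY KEY,
    AccountID      INT REFERENCES Accounts(AccountID),
    TranDate       DATE,
    Amount         DECIMAL(12,2)
);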

Again, all that works well for its intended purpose - which is
COLLECTING information about objects.  REPORTING data about such objects
can get real slow real quick.  Bank of America has millions of
"transactions" a day, and trying to report on all of the transactions
for a given region by week can cause the normalized database to go nuts.
It involves sorting through potentially billions of transactions, with
multiple joins between various tables and multiple WHERE clauses on
various fields.  Running a report can take hours or days even on
mainframes.
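With those made-up tables, the kind of reporting query that hurts might
look like this (SQL Server flavor, purely a sketch):

-- Weekly transaction totals for one region, straight off the normalized
-- tables.  Every run re-joins and re-filters potentially billions of rows.
SELECT a.Region,
       DATEPART(week, t.TranDate) AS WeekNo,
       COUNT(*)                   AS TranCount,
       SUM(t.Amount)              AS TotalAmount
FROM Transactions t
JOIN Accounts a ON a.AccountID = t.AccountID
JOIN Banks b    ON b.BankID    = a.BankID
WHERE a.Region = 'Southeast'
  AND t.TranDate BETWEEN '2007-01-01' AND '2007-12-31'
GROUP BY a.Region, DATEPART(week, t.TranDate);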

So in order to make such reporting faster, the data may be intentionally
denormalized, the how and why of that denormalization being an entirely
different study from our "normal" normalization stuff.  But the concept
is simply that once the data exists, is verified, and WILL NEVER (or
rarely) CHANGE, the denormalization can be done without fear.  The data
is rearranged into tables different from our normalized tables.
Perhaps (and this is ONLY an example) all of the data for each region is
pulled out into a separate table, with ALL of the fields for all the
banks, accounts, customers and transactions (in that region) placed into
a single table.  BAD practice for transactional data (adding / deleting
/ modifying data) but GOOD practice for reporting quickly on a specific
region.
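Continuing the same invented example, the one-time flattening might be
nothing more than a make-table query (SELECT ... INTO in SQL Server; a
make-table query in Access):

-- Done once (or on a schedule), NOT at data entry time.
SELECT b.BankName,
       a.AccountID,
       a.CustomerID,
       a.Region,
       t.TransactionID,
       t.TranDate,
       t.Amount
INTO   rptSoutheastTransactions   -- the flat, denormalized table
FROM   Transactions t
JOIN   Accounts a ON a.AccountID = t.AccountID
JOIN   Banks b    ON b.BankID    = a.BankID
WHERE  a.Region = 'Southeast';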

Now before anyone jumps on me for this example or what you might or
might not actually do in the process of denormalization: I do NOT claim
to understand the actual mechanics of such data storage /
denormalization, only THAT such things exist and roughly why.

The point is simply this: the joins between branch / account / people /
transactions were done ONE time, and the WHERE clause to select data for
a specific branch was done ONE time.  Even if it took days or weeks to
get the data and write it out to a table, it only has to be done once,
because the data behind the denormalized table is static.  The branch
will never change, the accounts will never change, the transactions will
never change, so a "snapshot" can be taken and stored permanently.  Then
that snapshot is used to report against.
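Once that snapshot table exists, the weekly report from before collapses
to a single-table query with no joins at all (sketch again, same invented
names):

-- Reporting off the pre-built snapshot: no joins, no re-filtering of the
-- normalized source tables.
SELECT DATEPART(week, TranDate) AS WeekNo,
       COUNT(*)                 AS TranCount,
       SUM(Amount)              AS TotalAmount
FROM   rptSoutheastTransactions
GROUP BY DATEPART(week, TranDate);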

In fact, even this snapshot may be too large.  They might very well do
summations such as "the sums and counts of all deposits made" for each
day or each month, etc., and store that info.  Since the deposits were
made and can't be unmade, the data is valid forever.  Do it once, then
just report off of the snapshots.
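That pre-aggregation step might look like this (again just a sketch with
invented names; treating positive amounts as deposits is an assumption
here, not anyone's actual rule):

-- Roll the snapshot up to one row per day and keep it forever; yesterday's
-- deposits can never change, so this never needs recalculating.
SELECT Region,
       TranDate,
       COUNT(*)    AS DepositCount,
       SUM(Amount) AS DepositTotal
INTO   rptDailyDepositSummary
FROM   rptSoutheastTransactions
WHERE  Amount > 0                 -- assuming deposits are the positive amounts
GROUP BY Region, TranDate;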

Those "snapshots" are the concept of if not the actual realization of a
data mart.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of
max.wanadoo at gmail.com
Sent: Friday, September 28, 2007 2:22 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Thanks John,
Normalised data I understand.
What I don't understand is how we get from that to Data Marts.  

Q1: What do I do with my normalised tables?  If the answer is to leave
the data in the normalised tables and then re-post it to a flat table,
then why could that not have been done at data entry?
Q2: If the answer to Q1 is to go straight to flat tables, then what do I
do AFTER that?

When it then comes to pulling the data out into reports, are we talking
about using software other than Access?

The terminology is throwing me a bit too (well, to be honest, it is
throwing me a lot).
With the help of you guys, I will understand it eventually.

What is conceptually throwing me at the moment though is this: if the
reason people use data marts (Star/Snowflake) is to quickly create
reports which dice/slice down through the data, then are we or are we
not just moving the "time taken" from the report stage to the data input
stage (which would make sense to me)?  But if I am completely wrong
here, then I really am "all at sea!"

Thanks
Max

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
