max.wanadoo at gmail.com
Fri Sep 28 14:11:24 CDT 2007
Thanks John (and all those who posted). I will go through all of the posts over the weekend. Lots to get my head around. Have a good weekend, all of you great guys.

Max

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, September 28, 2007 8:02 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Max,

Data as most of us understand it is "highly normalized", with that term meaning more to some than others! ;-) We tend to think of data entry systems and "normal" reporting systems. Somehow we have to get the data entered, and in order to do that and minimize errors we build many tables, each table holding data about only a single "subject": people, companies, checks, banks, etc.

That works well for data entry. There is one and only one "Bank of America" in our bank table, and when we mean "Bank of America" we go into that table and pull info for it, or pull its PK to place in some other record that needs to record (for example) accounts for Bank of America. Again, all that works well for its intended purpose - which is COLLECTING information about objects.

REPORTING data about such objects can get real slow real quick. Bank of America has millions of "transactions" a day, and trying to report on all of the transactions for a given region by week can cause the normalized database to go nuts. It involves sorting through potentially billions of transactions, with multiple joins between various tables and multiple where clauses on various fields. Running such a report can take hours or days even on mainframes.

So in order to make such reporting faster, the data may be intentionally denormalized, the how and why of the denormalization being an entirely different study from our "normal" normalization stuff. But the concept is simply that once the data exists, is verified, and WILL NEVER (or rarely) CHANGE, the denormalization can be done without fear. The data is rearranged into tables different from our normalized tables. Perhaps (and this is ONLY an example) all of the data for each region is pulled out into a separate table, with ALL of the fields for all the banks, accounts, customers and transactions (in that region) placed into a single table. BAD practice for transactional data (adding / deleting / modifying data) but GOOD practice for reporting quickly on a specific region.

Now before anyone jumps on me for this example or what you might or might not actually do in the process of denormalization: I do NOT understand the actual mechanics of such data storage / denormalization, or the how and the why, only THAT such things exist and why.

The point is simply that the joins between branch / account / people / transactions were done ONE TIME, and the WHERE clause to select data for a specific branch was done ONE time. Even though it took days or weeks to get the data and write it out to a table, it only has to be done once, because the data behind the denormalized table is static. The branch will never change, the account will never change, the transactions will never change, so a "snapshot" can be taken and stored permanently. Then that snapshot is used to report against.

In fact even this snapshot may be too large, so they might very well do summations such as "the sums and counts of all deposits made" for each day or each month, etc., and store that info. Since the deposits were made and can't be unmade, the data is valid forever.
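
To make that concrete, here is a rough sketch in SQL (roughly Access / T-SQL flavoured) of what such a one-time make-table denormalization might look like. All table and field names - Transactions, Accounts, Customers, Branches, Banks, tblRegionSnapshot_NE and so on - are invented purely for illustration, not anyone's actual design:

    -- One-time build of a wide, flat snapshot table for a single region.
    -- The expensive multi-table joins and the region WHERE clause run ONCE;
    -- after that, reports hit the flat table only.
    SELECT br.Region,
           b.BankName,
           br.BranchName,
           a.AccountNumber,
           c.CustomerName,
           t.TransactionDate,
           t.Amount
    INTO   tblRegionSnapshot_NE
    FROM   (((Transactions AS t
           INNER JOIN Accounts  AS a  ON t.AccountID  = a.AccountID)
           INNER JOIN Customers AS c  ON a.CustomerID = c.CustomerID)
           INNER JOIN Branches  AS br ON a.BranchID   = br.BranchID)
           INNER JOIN Banks     AS b  ON br.BankID    = b.BankID
    WHERE  br.Region = 'NorthEast';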
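
The "summations" idea is the same trick one level further up: aggregate once, store the small result, and report off that. Again the names are made up, and treating positive amounts as deposits is only an illustrative assumption:

    -- One-time roll-up: counts and sums of deposits per branch per day.
    -- Yesterday's deposits can't be unmade, so this summary never goes stale.
    SELECT BranchName,
           TransactionDate,
           COUNT(*)    AS DepositCount,
           SUM(Amount) AS DepositTotal
    INTO   tblDailyDepositSummary
    FROM   tblRegionSnapshot_NE
    WHERE  Amount > 0
    GROUP BY BranchName, TransactionDate;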
Do it once, then just report off of the snapshots. Those "snapshots" are the concept of, if not the actual realization of, a data mart.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of max.wanadoo at gmail.com
Sent: Friday, September 28, 2007 2:22 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Mucking around

Thanks John,

Normalised data I understand. What I don't understand is how we get from that to data marts.

Q1: What do I do with my normalised tables? If the answer is to leave the data in the normalised tables and then re-post it to a flat table, then why could that not have been done at data entry?

Q2: If the answer to Q1 is to go straight to flat tables, then what do I do AFTER that? When it then comes to pulling the data out into reports, are we talking about using software other than Access?

The terminology is throwing me a bit too (well, to be honest, it is throwing me a lot). With the help of you guys, I will understand it eventually. What is conceptually throwing me at the moment, though, is this: if the reason people use data marts (star/snowflake) is to quickly create reports which slice/dice down through the data, then are we or are we not just moving the "time taken" from the report stage to the data input stage (which would make sense to me)? But if I am completely wrong here, then I really am "all at sea!"

Thanks

Max

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com