Jim Lawrence (AccessD)
accessd at shaw.ca
Thu Sep 2 16:22:39 CDT 2004
Hi John:

I have sent along your problem(s) to a friend who has done some extensive work in various SQL DBs, to see if he had any comments. He found the whole scenario very intriguing and sent this email:

<email>
With the OLAP layout, are we looking at a central table with basic person info (pivot table) surrounded by a bunch of "fact tables", as follows?

/* the "pivot" table */
person
    person_id
    person_name
    (maybe a few more fundamental attributes)

/* the "fact" tables (possibly hundreds of these) */
address
    person_id
    street
    city
    zip_code
    etc.

education
    person_id
    highest_level_reached
    specialty
    etc.

etc. (ad nauseam)

Or is our friend planning to put it all in one table with 600 columns?

Either way, I would be interested in trying the following, which is close to what I was doing in Calgary. I am not quite sure how well the server will handle it, but it does offer some advantages:

/* pivot table */
person
    person_id
    person_name

info_tags
    info_tag          /* primary key, e.g. "address", "phone_number", "edu_highest_level" */
    info_description  /* a long description for the tag */
    is_required       /* whether this is a required or an optional tag */
    info_class        /* text, number, money etc. (for use with validation and display) */
    etc.

/* fact table */
person_info
    person_id   /* foreign key -> person */
    info_tag    /* foreign key -> info_tags */
    info        /* the actual info, stored as a general type (I forget what this is called in SQL Server) */

Notice a few things about this design:

- There is only one fact table. This is both a blessing and a curse.

The blessing:
- It can be sparsely populated; only the attributes that apply to a given person need be filled in. There are no nulls to deal with.
- New tags can be added on the fly, because they are simply metadata.
- The schema is very simple to understand, and simple to query for any combination of tags your heart desires.
- A maximum of three indexes is required on the fact table; one of these can be a clustered index.
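The single-fact-table layout above can be sketched end to end. This is a minimal illustration, not the original Calgary implementation: it uses SQLite (whose dynamically typed columns stand in for the "general type" mentioned above), and the tag names, people, and values are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Pivot table: one row per person.
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT)")

# Metadata table: one row per attribute ("tag") we might ever record.
cur.execute("""CREATE TABLE info_tags (
    info_tag TEXT PRIMARY KEY,    -- e.g. 'city', 'edu_highest_level'
    info_description TEXT,
    is_required INTEGER,
    info_class TEXT)""")

# The single fact table: sparsely populated, so no nulls are stored.
# The composite primary key doubles as one of the (at most three) indexes.
cur.execute("""CREATE TABLE person_info (
    person_id INTEGER REFERENCES person,
    info_tag  TEXT REFERENCES info_tags,
    info,                          -- untyped column: SQLite's "general type"
    PRIMARY KEY (person_id, info_tag))""")
cur.execute("CREATE INDEX ix_tag ON person_info (info_tag, person_id)")

cur.executemany("INSERT INTO person VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO info_tags VALUES (?, ?, ?, ?)",
                [("city", "City of residence", 0, "text"),
                 ("edu_highest_level", "Highest education level reached", 0, "text")])

# Only the attributes that apply to a given person get rows.
cur.executemany("INSERT INTO person_info VALUES (?, ?, ?)",
                [(1, "city", "Calgary"),
                 (1, "edu_highest_level", "MSc"),
                 (2, "city", "Calgary")])

# Querying a combination of tags: one self-join on the fact table per tag,
# each join resolved through the index rather than a table scan.
rows = cur.execute("""
    SELECT p.person_name
    FROM person p
    JOIN person_info a ON a.person_id = p.person_id
                      AND a.info_tag = 'city' AND a.info = 'Calgary'
    JOIN person_info b ON b.person_id = p.person_id
                      AND b.info_tag = 'edu_highest_level' AND b.info = 'MSc'
""").fetchall()
print(rows)  # [('Alice',)]
```

Note how adding a new attribute is just an INSERT into info_tags plus fact rows, with no ALTER TABLE, which is the "tags added on the fly" point above.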
The curse:
- The table is humongous: potentially 60 million x 600 = 36 billion records. This will make for some pretty slow index builds, and painfully slow table scans. The key to success will be queries that avoid table scans.

BTW, the potential benefits of using Caché to do this would be many: the resulting data compression and much easier index management, along with a big increase in speed, plus very easy distribution across multiple physical databases and processors. As for the cost, it depends on the number of access seats that are required. I was surprised at how reasonably priced these guys are; in many situations, they are way cheaper than SQL Server. When we spoke to them, we were going to get licences at $US200 per seat. Their enterprise licence was $US500 per seat, and included full networking of multiple databases - much easier to do than with SQL Server. At the end of the day, especially if you consider the hardware and screwing-around costs, it is much cheaper than SQL Server...
</email>

Hope this provides some useful information.

Jim