Jim Lawrence (AccessD)
accessd at shaw.ca
Thu Sep 2 16:22:39 CDT 2004
Hi John:

I have sent along your problem(s) to a friend who has done some extensive work in various SQL DBs, to see if he had any comments. He found the whole scenario very intriguing and sent this email:

<email>
With the OLAP layout, are we looking at a central table with basic person info (pivot table) surrounded by a bunch of "fact tables", as follows?

/* the "pivot" table */
person
    person_id
    person_name
    (maybe a few more fundamental attributes)

/* the "fact" tables (possibly hundreds of these) */
address
    person_id
    street
    city
    zip_code
    etc.

education
    person_id
    highest_level_reached
    specialty
    etc.

etc. (ad nauseam)

Or is our friend planning to put it all in one table with 600 columns?

Either way, I would be interested in trying the following, which is close to what I was doing in Calgary. I am not quite sure how well the server will handle it, but it does offer some advantages:

/* pivot table */
person
    person_id
    person_name

info_tags
    info_tag          /* primary key, e.g. "address", "phone_number", "edu_highest_level" */
    info_description  /* a long description for the tag */
    is_required       /* whether this is a required or an optional tag */
    info_class        /* text, number, money etc. (for use with validation and display) */
    etc.

/* fact table */
person_info
    person_id   /* foreign key -> person */
    info_tag    /* foreign key -> info_tags */
    info        /* the actual info, stored as a general type (I forget what this is called in SQL Server) */

Notice a few things about this design:

- There is only one fact table. This is both a blessing and a curse.

The blessing:
- It can be sparsely populated; only the attributes that apply to a given person need be filled in. There are no nulls to deal with.
- New tags can be added on the fly, because they are simply metadata.
- The schema is very simple to understand, and simple to query for any combination of tags your heart desires.
- A maximum of three indexes is required on the fact table; one of these can be a clustered index.
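The single-fact-table layout above can be sketched end to end. This is a minimal illustration, not the original Calgary implementation: it uses SQLite (whose dynamically typed columns stand in for the "general type" mentioned above), and the tag names, people, and values are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Pivot table: one row per person.
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT)")

# Metadata table: one row per attribute ("tag") we might ever record.
cur.execute("""CREATE TABLE info_tags (
    info_tag TEXT PRIMARY KEY,    -- e.g. 'city', 'edu_highest_level'
    info_description TEXT,
    is_required INTEGER,
    info_class TEXT)""")

# The single fact table: sparsely populated, so no nulls are stored.
# The composite primary key doubles as one of the (at most three) indexes.
cur.execute("""CREATE TABLE person_info (
    person_id INTEGER REFERENCES person,
    info_tag  TEXT REFERENCES info_tags,
    info,                          -- untyped column: SQLite's "general type"
    PRIMARY KEY (person_id, info_tag))""")
cur.execute("CREATE INDEX ix_tag ON person_info (info_tag, person_id)")

cur.executemany("INSERT INTO person VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO info_tags VALUES (?, ?, ?, ?)",
                [("city", "City of residence", 0, "text"),
                 ("edu_highest_level", "Highest education level reached", 0, "text")])

# Only the attributes that apply to a given person get rows.
cur.executemany("INSERT INTO person_info VALUES (?, ?, ?)",
                [(1, "city", "Calgary"),
                 (1, "edu_highest_level", "MSc"),
                 (2, "city", "Calgary")])

# Querying a combination of tags: one self-join on the fact table per tag,
# each join resolved through the index rather than a table scan.
rows = cur.execute("""
    SELECT p.person_name
    FROM person p
    JOIN person_info a ON a.person_id = p.person_id
                      AND a.info_tag = 'city' AND a.info = 'Calgary'
    JOIN person_info b ON b.person_id = p.person_id
                      AND b.info_tag = 'edu_highest_level' AND b.info = 'MSc'
""").fetchall()
print(rows)  # [('Alice',)]
```

Note how adding a new attribute is just an INSERT into info_tags plus fact rows, with no ALTER TABLE, which is the "tags added on the fly" point above.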
The curse:
- The table is humongous: potentially 60 million x 600 = 36 billion records. This will make for some pretty slow index builds, and painfully slow table scans. The key to success will be queries that avoid table scans.

BTW, the potential benefits of using Caché to do this would be many: the resulting data compression and much easier index management, along with a big increase in speed, plus very easy distribution across multiple physical databases and processors. As for the cost, it depends on the number of access seats that are required. I was surprised at how reasonably priced these guys are; in many situations, they are way cheaper than SQL Server. When we spoke to them, we were going to get licences at $US200 per seat. Their enterprise licence was $US500 per seat, and included full networking of multiple databases - much easier to do than with SQL Server. At the end of the day, especially if you consider the hardware and screwing-around costs, it is much cheaper than SQL Server...
</email>

Hope this provides some useful information.

Jim