Arthur Fuller
fuller.artful at gmail.com
Wed Jul 25 12:12:15 CDT 2007
Apparently I have a much different view on this subject than most here. My view has evolved over years in this business, and changed due to increasing complexity of databases. 1. I used to think that ANPKs were the only way to go. Given a dozen or even two dozen tables, it made sense. Given 500 tables plus the low cost of disk space, I no longer agree with that model. As a consequence, I no longer agree that compound PKs are bad that ANPKs are always superior. 2. Any Front End worth its salt can easily pass multiple values to the BE to create a row in some related table. This is not difficult, and if the related table is something so simple as State/Province, containing about 62 rows each of which is uniquely identified by its abbreviation (i.e. SK = Saskatchewan, WY = Wyoming, FL = Florida), then what is gained by storing these as FKs derived from the States/Provinces table? Why not reduce said table to two columns, one being char(2) and the other being varchar(20)? What is gained by this strategy is that a table does not need to be joined in order to obtain the value that interests humans. 3. The aforementioned difficulty cascades to all related tables in the tree. Eventually you find yourself creating surrogate indexes on surrogate indexes in some related table, and it can cascade from there. With 5 or 6 tables, this may not seem an issue, but try 40 tables in a tree and you will get my point. 4. I used to be in the school of ANPKs everywhere but I have since departed, primarily due to the effects created in rich databases (e.g. large databases are a small number of tables with millions of rows; rich databases are a large number of tables with relatively few rows each; it's possible but seldom occurs that a db is both large and rich). In a rich database, I do not have the time to do 20 joins to grab the keys of the related data. I want it NOW, not one minute from now. Correctness of results and speed of retrieval are the most important tests. Disk space, long ago, was the paramount consideration, and caused many of us (including me) to think that ANPKs were the solution. Thanks to a couple of books I read, I have since then shed that illusion. That doesn't mean that I don't like ANPKs. I still love and treasure them. But I have learned that they don't belong in any table where the number of rows is fewer than 500 or so. It's silly. Disk space is cheap. Take the case of a table called State/Provinces, consisting of about 62 rows representing the states in USA and the provinces and territories in Canada. All of these items can be represented uniquely by a two-letter combination. Why on earth would you then introduce an ANPK into this table? It makes NO sense, IMO. It rather exemplifies the ridiculous application of a maxim learned long ago and adhered to since, without further examination. Let's go further. Suppose your firm has 100 products all of which are uniquely identified by three alpha characters. Why would you bother creating an ANPK on said table? IMO it's asinine. You just force me to create joins down the road (as in reporting), while providing me with no real gain in the here and now. You wanna benchmark your index on an ANPK versus mine on a three-letter alpha column? Ok? Let's go. I'll give you half a second maximum, and then let's look at the rewards I get that you don't. After 30 years in the database business, I have only recently come to appreciate the value of compound PKs and PKs that are not necessarily ANPKs. It's taken me a while to realize this, I confess, and I'm sure that many of you will disagree. But having been in the situation of 500 tables some of which have 50 million rows, and 20 joins are required for some query or other, I now understand that ANPKs are stupid in this situation. Far better, and 10 times faster, are meaningful char PKs such as the State/Province example listed above. In a given table, I already have the value "CA" meaning California. I don't need to reference an ANPK to look up this value -- I already have it. This perspective may make sense only to those who work with millions of rows in hundreds of tables. But that pretty much describes where I work. In these circumstances, I would definitely raise red flags wherever ANPKs are introduced in lookup tables, while defending their place in transaction tables. Arthur On 7/25/07, jwcolby <jwcolby at colbyconsulting.com> wrote: > > Both? > > > John W. Colby > Colby Consulting > www.ColbyConsulting.com > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of John Bartow > Sent: Wednesday, July 25, 2007 12:02 PM > To: 'Access Developers discussion and problem solving' > Subject: Re: [AccessD] Primary Key Best Practices > > emotional response? too much caffiene? ... > ;o) > > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby > > Jurgen, > > Why can't I state it in such reasoned and logical terms? > > John W. Colby > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com >