[AccessD] Primary Key Best Practices

Wed Jul 25 12:46:29 CDT 2007

And you know what Arthur, your logic is flawless.  There are a number of
cases where there is only a single field in the table and only ever going to
be a single field.  In those cases I would reluctantly put down my ANPK.  IN
THOSE cases, the table is really being used as a simple data integrity
monitor to force the user to select a valid choice.  The two column "with a
code" I would not so quickly put down my ANPK.  In the case of a product
with a 3 char code and a ton of other useful info I absolutely would
continue to use an ANPK.  The reason is simple, and comes back to join
speed.  Yea, people MAY memorize all the 3 character codes but just as
likely you would be joining your 3 char field to pull more data where I
would be joining an integer.

But congrats anyway, you have pointed out the (so far) single case where a
natural key is as good as an ANPK and causes no further problems.

And BTW, disk space is not my primary objection, but rather join speed.  No
matter whether a lowly PC desktop or a supercomputer, speed is always an
issue.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Arthur Fuller
Sent: Wednesday, July 25, 2007 1:12 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Primary Key Best Practices

Apparently I have a much different view on this subject than most here. My
view has evolved over years in this business, and changed due to increasing
complexity of databases.

1. I used to think that ANPKs were the only way to go. Given a dozen or even
two dozen tables, it made sense. Given 500 tables plus the low cost of disk
space, I no longer agree with that model. As a consequence, I no longer
agree that compound PKs are bad that ANPKs are always superior.

2. Any Front End worth its salt can easily pass multiple values to the BE to
create a row in some related table. This is not difficult, and if the
related table is something so simple as State/Province, containing about 62
rows each of which is uniquely identified by its abbreviation (i.e. SK =
Saskatchewan, WY = Wyoming, FL = Florida), then what is gained by storing
these as FKs derived from the States/Provinces table? Why not reduce said
table to two columns, one being char(2) and the other being varchar(20)?
What is gained by this strategy is that a table does not need to be joined
in order to obtain the value that interests humans.

3. The aforementioned difficulty cascades to all related tables in the tree.
Eventually you find yourself creating surrogate indexes on surrogate indexes
in some related table, and it can cascade from there. With 5 or 6 tables,
this may not seem an issue, but try 40 tables in a tree and you will get my
point.

4. I used to be in the school of ANPKs everywhere but I have since departed,
primarily due to the effects created in rich databases (e.g. large databases
are a small number of tables with millions of rows; rich databases are a
large number of tables with relatively few rows each; it's possible but
seldom occurs that a db is both large and rich). In a rich database, I do
not have the time to do 20 joins to grab the keys of the related data. I
want it NOW, not one minute from now. Correctness of results and speed of
retrieval are the most important tests. Disk space, long ago, was the
paramount consideration, and caused many of us (including me) to think that
ANPKs were the solution.

Thanks to a couple of books I read, I have since then shed that illusion.
That doesn't mean that I don't like ANPKs. I still love and treasure them.
But I have learned that they don't belong in any  table where the number of
rows is fewer than 500 or so. It's silly. Disk space is cheap. Take the case
of a table called State/Provinces, consisting of about 62 rows representing
the states in USA and the provinces and territories in Canada. All of these
items can be represented uniquely by a two-letter combination. Why on earth
would you then introduce an ANPK into this table? It makes NO sense, IMO. It
rather exemplifies the ridiculous application of a maxim learned long ago
and adhered to since, without further examination.

Let's go further. Suppose your firm has 100 products all of which are
uniquely identified by three alpha characters. Why would you bother creating
an ANPK on said table? IMO it's asinine. You just force me to create joins
down the road (as in reporting), while providing me with no real gain in the
here and now. You wanna benchmark your index on an ANPK versus mine on a
three-letter alpha column? Ok? Let's go. I'll give you half a second
maximum, and then let's look at the rewards I get that you don't.

After 30 years in the database business, I have only recently come to
appreciate the value of compound PKs and PKs that are not necessarily ANPKs.
It's taken me a while to realize this, I confess, and I'm sure that many of
you will disagree. But having been in the situation of 500 tables some of
which have 50 million rows, and 20 joins are required for some query or
other, I now understand that ANPKs are stupid in this situation. Far better,
and 10 times faster, are meaningful char PKs such as the State/Province
example listed above. In a given table, I already have the value "CA"
meaning California. I don't need to reference an ANPK to look up this value
-- I already have it.

This perspective may make sense only to those who work with millions of rows
in hundreds of tables. But that pretty much describes where I work. In these
circumstances, I would definitely raise red flags wherever ANPKs are
introduced in lookup tables, while defending their place in transaction
tables.

Arthur