[AccessD] Primary Key Best Practices

Wed Jul 25 21:55:19 CDT 2007

>From a bound front end, absolutely agree.  There's no reason to have
autonumbers in a lookup table...

Until you go unbound.  When dealing with a lookup table, there's no
point in creating data integrity logic to deal with multiple people
changing the same record.  In a lot of cases, the change may be the same
thing, but if the system thinks the pk is 'FL', and one person changes
it to FLA, when another tries to change another field in the same
record...you get jumbled.  In a bound front end, not a problem, Access
is going to lock the records for you.  In an unbound solution, an
autonumber primary key also fixes the problem.  This is not to say that
the key is used as a foreign key in other tables, it is strictly to
maintain a unique id for a record.

Drew

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Arthur Fuller
Sent: Wednesday, July 25, 2007 12:12 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Primary Key Best Practices

Apparently I have a much different view on this subject than most here.
My
view has evolved over years in this business, and changed due to
increasing
complexity of databases.

1. I used to think that ANPKs were the only way to go. Given a dozen or
even
two dozen tables, it made sense. Given 500 tables plus the low cost of
disk
space, I no longer agree with that model. As a consequence, I no longer
agree that compound PKs are bad that ANPKs are always superior.

2. Any Front End worth its salt can easily pass multiple values to the
BE to
create a row in some related table. This is not difficult, and if the
related table is something so simple as State/Province, containing about
62
rows each of which is uniquely identified by its abbreviation (i.e. SK =
Saskatchewan, WY = Wyoming, FL = Florida), then what is gained by
storing
these as FKs derived from the States/Provinces table? Why not reduce
said
table to two columns, one being char(2) and the other being varchar(20)?
What is gained by this strategy is that a table does not need to be
joined
in order to obtain the value that interests humans.

3. The aforementioned difficulty cascades to all related tables in the
tree.
Eventually you find yourself creating surrogate indexes on surrogate
indexes
in some related table, and it can cascade from there. With 5 or 6
tables,
this may not seem an issue, but try 40 tables in a tree and you will get
my
point.

4. I used to be in the school of ANPKs everywhere but I have since
departed,
primarily due to the effects created in rich databases (e.g. large
databases
are a small number of tables with millions of rows; rich databases are a
large number of tables with relatively few rows each; it's possible but
seldom occurs that a db is both large and rich). In a rich database, I
do
not have the time to do 20 joins to grab the keys of the related data. I
want it NOW, not one minute from now. Correctness of results and speed
of
retrieval are the most important tests. Disk space, long ago, was the
paramount consideration, and caused many of us (including me) to think
that
ANPKs were the solution.

Thanks to a couple of books I read, I have since then shed that
illusion.
That doesn't mean that I don't like ANPKs. I still love and treasure
them.
But I have learned that they don't belong in any  table where the number
of
rows is fewer than 500 or so. It's silly. Disk space is cheap. Take the
case
of a table called State/Provinces, consisting of about 62 rows
representing
the states in USA and the provinces and territories in Canada. All of
these
items can be represented uniquely by a two-letter combination. Why on
earth
would you then introduce an ANPK into this table? It makes NO sense,
IMO. It
rather exemplifies the ridiculous application of a maxim learned long
ago
and adhered to since, without further examination.

Let's go further. Suppose your firm has 100 products all of which are
uniquely identified by three alpha characters. Why would you bother
creating
an ANPK on said table? IMO it's asinine. You just force me to create
joins
down the road (as in reporting), while providing me with no real gain in
the
here and now. You wanna benchmark your index on an ANPK versus mine on a
three-letter alpha column? Ok? Let's go. I'll give you half a second
maximum, and then let's look at the rewards I get that you don't.

After 30 years in the database business, I have only recently come to
appreciate the value of compound PKs and PKs that are not necessarily
ANPKs.
It's taken me a while to realize this, I confess, and I'm sure that many
of
you will disagree. But having been in the situation of 500 tables some
of
which have 50 million rows, and 20 joins are required for some query or
other, I now understand that ANPKs are stupid in this situation. Far
better,
and 10 times faster, are meaningful char PKs such as the State/Province
example listed above. In a given table, I already have the value "CA"
meaning California. I don't need to reference an ANPK to look up this
value
-- I already have it.

This perspective may make sense only to those who work with millions of
rows
in hundreds of tables. But that pretty much describes where I work. In
these
circumstances, I would definitely raise red flags wherever ANPKs are
introduced in lookup tables, while defending their place in transaction
tables.

Arthur
The information contained in this transmission is intended only for the person or entity to which it is addressed and may contain II-VI Proprietary and/or II-VI BusinessSensitve material. If you are not the intended recipient, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy. You are notified that any review, retransmission, copying, disclosure, dissemination, or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited.