[AccessD] Primary Key Best Practices

Jurgen Welz jwelz at hotmail.com
Wed Jul 25 16:05:23 CDT 2007


Arthur:  Interesting comments as performance has always been an issue for 
me.  In recognition of some of those concerns I  evolved an approach that 
oviates some of your concerns, but it is hardly a common approach.

In the case of displaying State/Province in a form or report, I've always 
set the primary record source to the parent table record and used a combo 
bound to the province key to display the province.  This is the approach I 
use for virtually all lookup tables.  I even occasionally use a value list 
row source (gasp!) that is simply a way to limit and validate data that I 
can't bother to store in a two record table.  In the case of forms or 
reports using province or similar data, no fancy query or join processing 
(assuming client/server) is required and the join is functionally performed 
at the combo.

Going beyond this, for nearly all smaller lookup tables (up to a couple 
thousand records), I've found it expedient to pull the common combo data one 
time with a recordset and getrows and fill the combos with callbacks.  For 
static tables like province, this is pulled on demand the first time 
required.  For semi-static data, i check a single record date time stamp 
that indicates the last edit/add to a lookup table.  If the time stamp is 
after my last data load, I refresh the array.

I don't really see the utilty of displaying 'AL, AK, AB, AZ or AR' in a 
user's view of the data.  My preference is to display the combo with all the 
information.  Rather than placing an edit button by the field to pop a list 
of choices or allowing a user to enter anything they want and then run a 
validation routine, a combo keeps a browseable list ready to hand.

By using combos for one record to one lookup joins and subforms for one 
record to many record joins, and in certain cases, listboxes for one to many 
joins, complex query sources for forms and reports are exceedingly rare.  
With a just in time approach for subforms and lists, selecting on a numeric 
PK continues to make sense.  My current 47 user database reports 191 tables 
and performance remains snappy.

Since the original question is regarding best practices and what I do is not 
for everyone, I can see where you have some useful insights.  It would be 
interesting to know what the impact is of one approach over the other in a 
client/server environment as opposed to a file/server environment.  Since I 
am in a mdb only environment, allowing the form controls to process the 
joins limiting data requests to the absolute minimum traffic by caching 
lookup data made the most sense to me.  I had benchmarked a few approaches 
to filling some of my more complex forms using a couple of different data 
designs and I know that AN PKs remain the only game in town for me.

Ciao
Jürgen Welz
Edmonton, Alberta
jwelz at hotmail.com





>From: "Arthur Fuller" <fuller.artful at gmail.com>
>
>Apparently I have a much different view on this subject than most here. My
>view has evolved over years in this business, and changed due to increasing
>complexity of databases.
>
>1. I used to think that ANPKs were the only way to go. Given a dozen or 
>even
>two dozen tables, it made sense. Given 500 tables plus the low cost of disk
>space, I no longer agree with that model. As a consequence, I no longer
>agree that compound PKs are bad that ANPKs are always superior.
>
>2. Any Front End worth its salt can easily pass multiple values to the BE 
>to
>create a row in some related table. This is not difficult, and if the
>related table is something so simple as State/Province, containing about 62
>rows each of which is uniquely identified by its abbreviation (i.e. SK =
>Saskatchewan, WY = Wyoming, FL = Florida), then what is gained by storing
>these as FKs derived from the States/Provinces table? Why not reduce said
>table to two columns, one being char(2) and the other being varchar(20)?
>What is gained by this strategy is that a table does not need to be joined
>in order to obtain the value that interests humans.
>
>3. The aforementioned difficulty cascades to all related tables in the 
>tree.
>Eventually you find yourself creating surrogate indexes on surrogate 
>indexes
>in some related table, and it can cascade from there. With 5 or 6 tables,
>this may not seem an issue, but try 40 tables in a tree and you will get my
>point.
>
>4. I used to be in the school of ANPKs everywhere but I have since 
>departed,
>primarily due to the effects created in rich databases (e.g. large 
>databases
>are a small number of tables with millions of rows; rich databases are a
>large number of tables with relatively few rows each; it's possible but
>seldom occurs that a db is both large and rich). In a rich database, I do
>not have the time to do 20 joins to grab the keys of the related data. I
>want it NOW, not one minute from now. Correctness of results and speed of
>retrieval are the most important tests. Disk space, long ago, was the
>paramount consideration, and caused many of us (including me) to think that
>ANPKs were the solution.
>
>Thanks to a couple of books I read, I have since then shed that illusion.
>That doesn't mean that I don't like ANPKs. I still love and treasure them.
>But I have learned that they don't belong in any  table where the number of
>rows is fewer than 500 or so. It's silly. Disk space is cheap. Take the 
>case
>of a table called State/Provinces, consisting of about 62 rows representing
>the states in USA and the provinces and territories in Canada. All of these
>items can be represented uniquely by a two-letter combination. Why on earth
>would you then introduce an ANPK into this table? It makes NO sense, IMO. 
>It
>rather exemplifies the ridiculous application of a maxim learned long ago
>and adhered to since, without further examination.
>
>Let's go further. Suppose your firm has 100 products all of which are
>uniquely identified by three alpha characters. Why would you bother 
>creating
>an ANPK on said table? IMO it's asinine. You just force me to create joins
>down the road (as in reporting), while providing me with no real gain in 
>the
>here and now. You wanna benchmark your index on an ANPK versus mine on a
>three-letter alpha column? Ok? Let's go. I'll give you half a second
>maximum, and then let's look at the rewards I get that you don't.
>
>After 30 years in the database business, I have only recently come to
>appreciate the value of compound PKs and PKs that are not necessarily 
>ANPKs.
>It's taken me a while to realize this, I confess, and I'm sure that many of
>you will disagree. But having been in the situation of 500 tables some of
>which have 50 million rows, and 20 joins are required for some query or
>other, I now understand that ANPKs are stupid in this situation. Far 
>better,
>and 10 times faster, are meaningful char PKs such as the State/Province
>example listed above. In a given table, I already have the value "CA"
>meaning California. I don't need to reference an ANPK to look up this value
>-- I already have it.
>
>This perspective may make sense only to those who work with millions of 
>rows
>in hundreds of tables. But that pretty much describes where I work. In 
>these
>circumstances, I would definitely raise red flags wherever ANPKs are
>introduced in lookup tables, while defending their place in transaction
>tables.
>
>Arthur

_________________________________________________________________
http://liveearth.msn.com




More information about the AccessD mailing list