[AccessD] Primary Key Best Practices

Wed Jul 25 09:35:27 CDT 2007

>So which you use simply doesn't matter AS LONG AS the PK meets a few
criteria.

And I forgot to add:

3) Should be efficient, which is where the autoincrement shines.  Integers
are the basic currency of the modern CPU and as such they are the fastest
possible unit of information when doing compares. 

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Wednesday, July 25, 2007 9:18 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Primary Key Best Practices

Bruce, 

>Any table that does not have a natural primary key is not a pure 
>dataset,
...

No, any table that does not have a natural primary key CANDIDATE is not a
pure dataset, ...

A candidate key is exactly that, a field or set of fields that can serve the
purpose of uniquely identifying the data record, and thus becomes a
CANDIDATE to be the PK.  In fact you can have multiple candidates.  Nothing
says you actually have to USE the candidate as the PK.  That is again the
difference between the unique index and the PK.  One of the CANDIDATES is
used as the unique index, but a surrogate can still be used as the PK.
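
A minimal sketch of that split, assuming SQLite through Python's sqlite3
module rather than Access.  The table, column and index names are made up
for illustration.  The surrogate is the PK, and two of the candidate keys
are enforced as unique indexes instead.

```python
import sqlite3

# Hypothetical schema for illustration; none of these names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate PK
        emp_number  TEXT NOT NULL,
        ssn         TEXT NOT NULL,
        last_name   TEXT NOT NULL
    )
""")
# The candidate keys are enforced as unique indexes, not as the PK.
conn.execute("CREATE UNIQUE INDEX ux_employee_ssn ON employee (ssn)")
conn.execute("CREATE UNIQUE INDEX ux_employee_number ON employee (emp_number)")

conn.execute(
    "INSERT INTO employee (emp_number, ssn, last_name) VALUES (?, ?, ?)",
    ("E100", "111-11-1111", "Smith"),
)

# A second row reusing the same SSN is rejected by the unique index,
# even though the autoincrement PK would happily hand out a new id.
try:
    conn.execute(
        "INSERT INTO employee (emp_number, ssn, last_name) VALUES (?, ?, ?)",
        ("E101", "111-11-1111", "Jones"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```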

Take an employee table.  ONE candidate key could be first / last / phone,
another could be an employee number (a quasi-surrogate key by the way),
another could be the SSN.  Thus we have (at least) THREE CANDIDATE keys.  Do
you have to use all of them as the PK?  Obviously not.  Do you have to use
ANY of them as the PK?  Obviously not.  Throw in a true surrogate key - an
autoincrement - and you can use that as well.  So which you use simply
doesn't matter AS LONG AS the PK meets a few criteria.

1) The PK should be stable (NEVER changes).  

That eliminates the first / last / phone idea because the name could change
(I got married, I got tired of my name) AND the phone number could (and
probably will) change.  The SSN fails for the same reason.  SSNs CAN
(believe it or not) change.  Alien workers steal SSNs all the time.  Now
they become a citizen...  Oooops the SSN has to change.
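
As a rough illustration of why stability matters, here is a sketch assuming
SQLite via Python's sqlite3 and invented table names (employee, timesheet).
When the child rows point at a surrogate, changing the name and the SSN is a
one-row update and the children never notice.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- stable surrogate PK
        ssn         TEXT NOT NULL UNIQUE,               -- natural candidate key
        last_name   TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        timesheet_id INTEGER PRIMARY KEY AUTOINCREMENT,
        employee_id  INTEGER NOT NULL REFERENCES employee (employee_id),
        hours        REAL NOT NULL
    );
""")

emp_id = conn.execute(
    "INSERT INTO employee (ssn, last_name) VALUES (?, ?)",
    ("111-11-1111", "Smith"),
).lastrowid
conn.executemany(
    "INSERT INTO timesheet (employee_id, hours) VALUES (?, ?)",
    [(emp_id, 8.0), (emp_id, 7.5)],
)

# Both the name and the SSN change; only the one parent row is touched.
conn.execute(
    "UPDATE employee SET ssn = ?, last_name = ? WHERE employee_id = ?",
    ("222-22-2222", "Jones", emp_id),
)

# The child rows still resolve to the same parent through the unchanged PK.
linked = conn.execute("""
    SELECT e.last_name, e.ssn, COUNT(t.timesheet_id)
    FROM employee e
    JOIN timesheet t ON t.employee_id = e.employee_id
    GROUP BY e.employee_id
""").fetchone()
```

Had the SSN been the PK, the same change would have needed a cascading
update to every timesheet row.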

2) The PK should have no meaning.  From the perspective of the CHILD record, the FK is
nothing more than a pointer back to the parent record.  As such it should be
a behind the scenes, never directly viewed artificial construct designed to
do its job as efficiently as possible.

In fact, a surrogate key can (and DOES) uniquely identify a single row of
the table.  If it did not, then it could not be the PK.  As you know, you
cannot take a simple integer, make it a value of 1 for every record, and
make that the PK of the table.  It is the very fact that you make the field
autoincrement (or whatever) that makes it possible for it to serve as the PK.
However, a surrogate cannot be used for the unique index, whose purpose is
preventing duplicates.  OTOH it does not need to be; its sole purpose in life
is to act as a pointer, placed in child records to point back to the parent.
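
A small sketch of that limitation, again assuming SQLite via Python's
sqlite3 and a made-up customer table.  With only an autoincrement PK and no
unique index on a candidate key, the same customer goes in twice without
complaint.

```python
import sqlite3

# Hypothetical table for illustration; it has a surrogate PK and nothing else.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name        TEXT NOT NULL,
        phone       TEXT NOT NULL
    )
""")

# Enter the identical real-world customer twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO customer (name, phone) VALUES (?, ?)",
        ("Pat Example", "555-0100"),
    )

# Each row got its own id, so the PK is satisfied, yet the data is duplicated.
ids = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customer ORDER BY customer_id"
)]
distinct_customers = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, phone FROM customer)"
).fetchone()[0]
```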

Again, the PK and the unique index are NOT the same structure, either in
real life or in theory.  The purpose of the PK is to uniquely identify a
record so that it can be linked to a child record.  Nothing more, nothing
less.  You can do that with a natural key, and you can do that with a
surrogate.  The purpose of the unique index is to prevent entering the same
record in the database twice (data integrity).  Different function.  The
fact that the fields used in the unique index can also serve as a PK is
irrelevant.  It does NOT HAVE TO BE the primary key.

Think about normalization for a moment.  The basic concept behind
normalization is that ONLY information about a specific object is in the
table about that object.  Only bank info is in the bank table, only customer
info is in the customer table, only check info is in the check table.  One
goal of normalization should be to minimize the fluff around lineage.  Yes a
check was drawn by a specific customer, but that does not mean that we have
to have any specific piece of the customer table in the check table, it only
means that we have to be able to uniquely identify which customer wrote the
check.  So it simply does NOT MATTER whether we use a SURROGATE KEY from the
customer table, or the customer SSN (the very WORST choice), or the first /
last / eye color / hair color / phone / zip / and whatever else you might
have decided was the CANDIDATE KEY of the customer table.  Any of them WORKS.
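
The bank / customer / check example might look like this.  It is a sketch
only, assuming SQLite via Python's sqlite3; the table name bank_check and
all column names are invented.  The check row carries nothing from the
customer table except the pointer.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate PK
        name        TEXT NOT NULL,
        ssn         TEXT NOT NULL UNIQUE                -- candidate key, not the PK
    );
    CREATE TABLE bank_check (
        check_id     INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id  INTEGER NOT NULL REFERENCES customer (customer_id),
        amount_cents INTEGER NOT NULL
    );
""")

cust_id = conn.execute(
    "INSERT INTO customer (name, ssn) VALUES (?, ?)",
    ("Pat Example", "333-33-3333"),
).lastrowid
conn.execute(
    "INSERT INTO bank_check (customer_id, amount_cents) VALUES (?, ?)",
    (cust_id, 12500),
)

# The check table holds only check info plus the FK pointer.
check_columns = [row[1] for row in conn.execute("PRAGMA table_info(bank_check)")]

# Who wrote the check is recovered by following the pointer.
writer = conn.execute("""
    SELECT c.name
    FROM bank_check k
    JOIN customer c ON c.customer_id = k.customer_id
""").fetchone()[0]

# A check pointing at a customer that does not exist is refused.
try:
    conn.execute(
        "INSERT INTO bank_check (customer_id, amount_cents) VALUES (?, ?)",
        (999, 100),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```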

What works and what is the best choice are different matters.  The
surrogate key avoids a whole slew of real life problems, and creates none
(in my experience).

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Bruce Bruen
Sent: Wednesday, July 25, 2007 8:12 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Primary Key Best Practices

On Wednesday 25 July 2007 02:52, jwcolby wrote:
> ROTFLMAO.
>
> The fit never passes.  It just subsides until the moon rises again.
>
>
> John W. Colby

Any table that does not have a natural primary key is not a pure dataset,
...

...

...

no, I'm not talking about natural v surrogate.

bruce
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
