[AccessD] 2 quick questions

jwcolby jwcolby at colbyconsulting.com
Mon May 30 20:11:39 CDT 2011


Jim,

 >There is nothing stated in any paper on the relational model that says a primary key cannot change.

First, I highly doubt that you have read every paper on the relational model but I will just give 
you that one.  No contest.

Having said that, I am going to take a show of hands and ask:

Does anyone on this list actually care about this statement?  In order for your hand to be counted 
as up, you need to respond saying something like:

Yes, papers on the relational model matter a great deal to me and I do care that nothing in any 
paper says... etc etc.

Just for the record, my hand is not up.

Just for the record, I don't remember anyone saying that it *can't* change, only that it causes 
enormous problems if it does change.  Not the *DATA* changing, but the business rules defining the 
field or fields that define the PK.  And it can cause enormous problems.  Add one field to the 
required set of fields forming a natural PK and all hell breaks loose.

Jim, you need to be having this conversation (relation model) with your academic friends.  They will 
all agree and, with nothing else to say, you can give it a rest.

I asked you for another name for the "autonumber surrogate key, better damned well not call it the 
PK around Jim" object so that we could refer to the "autonumber surrogate key, better damned well 
not call it the PK around Jim" object by whatever you desire so that we could avoid this absolutely 
silly conversation yet another time.

So please, before you leave this thread, figure out what you want to call this "autonumber surrogate 
key, better damned well not call it the PK around Jim" object and tell us.

We (*almost* everyone except yourself) call the "autonumber surrogate key, better damned well not 
call it the PK around Jim" object the PK because Microsoft calls it that.  Click on the "autonumber 
surrogate key, better damned well not call it the PK around Jim" object and click the PK little 
golden key tool and once Microsoft makes the "autonumber surrogate key, better damned well not call 
it the PK around Jim" object the PK (they called it that), they then refer to that as the PK.

We live and work in a Microsoft world.  In SQL Server the exact same thing happens.  Click the 
little golden key tool on the "autonumber surrogate key, better damned well not call it the PK 
around Jim" object and voila, MS calls it the PK.

*YOU* are the one with the problem.  *YOU* need to come up with another name for the "autonumber 
surrogate key, better damned well not call it the PK around Jim" object.

Or better yet, just accept the fact that everyone except you and your professor friends call it the 
PK, and leave it alone.

I for one am really, really *really* tired of the same old same old.

And I for one am telling you, that since Microsoft calls it the PK, so do I.

Given the nature of Wikipedia I am not a huge fan of quoting them however I am going to.

http://en.wikipedia.org/wiki/Surrogate_key

There are at least two definitions of a surrogate:

Surrogate (1) – Hall, Owlett and Codd (1976)
     A surrogate represents an entity in the outside world. The surrogate is internally generated by 
the system but is nevertheless visible to the user or application.
Surrogate (2) – Wieringa and De Jonge (1991)
     A surrogate represents an object in the database itself. The surrogate is internally generated 
by the system and is invisible to the user or application.

http://datawarehouse.ittoolbox.com/documents/surrogate-key-18121

A surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the 
table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

http://databases.about.com/cs/administration/g/primarykey.htm
http://www.agiledata.org/essays/keys.html

1. Common Key Terminology

Let's start by describing some common terminology pertaining to keys and then work through an 
example.  These terms are:

     *      Key. A key is one or more data attributes that uniquely identify an entity.  In a 
physical database a key would be formed of one or more table columns whose value(s) uniquely 
identifies a row within a relational table.
     *      Composite key.  A key that is composed of two or more attributes.
     *      Natural key.  A key that is formed of attributes that already exist in the real world. 
For example, U.S. citizens are issued a Social Security Number (SSN)  that is unique to them (this 
isn't guaranteed to be true, but it's pretty darn close in practice).  SSN could be used as a 
natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your 
organization is limited to the U.S.).
     *      Surrogate key.  A key with no business meaning.
     *      Candidate key.  An entity type in a logical data model will have zero or more candidate 
keys, also referred to simply as unique identifiers (note: some people don't believe in identifying 
candidate keys in LDMs, so there's no hard and fast rules).  For example, if we only interact with 
American citizens then SSN is one candidate key for the Person entity type and the combination of 
name and phone number (assuming the combination is unique) is potentially a second candidate key. 
Both of these keys are called candidate keys because they are candidates to be chosen as the primary 
key, an alternate key  or perhaps not even a key at all within a physical data model.
     *      Primary key.  The preferred key for an entity type.
     *      Alternate key. Also known as a secondary key, is another unique identifier of a row 
within a table.
     *      Foreign key. One or more attributes in an entity type that represents a key, either 
primary or secondary, in another entity type.

I have to tell you Jim, that what I learned in the real world is precisely the above set of terms.

Since every key uniquely identifies the row...

The PK is nothing more than the most convenient Key, or the key with the fewest potential problems, 
actually selected by the database designer to represent the record.  The word Primary (I assume) was 
selected because it is just that.

As I understand it, the surrogate key did not exist in the (mathematical) relational model, and in 
fact, since it is not a real world property of the object being modeled, is verboten in the 
relational model.

As if I care.

The point Jim is that with all due respects to academia, we live in the real world and we end up 
with a vocabulary (and tools) that often conflicts with academia.  Surrogate keys were invented 
because natural keys were clumsy and a royal PITA and it is entirely unnecessary to actually use a 
natural key as the PK (except in the mathematical relational model where surrogate keys are verboten).

You may argue till the cows come home that if it weren't for reality we wouldn't have surrogate 
keys, and I will say "reality is, and is where *I* live!".

Jim, you are tilting at windmills.  And yes I have read the book from which that metaphor came, 
though I believe it is you that pointed out I still have to call myself uneducated!

John W. Colby
www.ColbyConsulting.com

On 5/29/2011 8:18 PM, Jim Dettman wrote:
> <<Primary Key = Nothing a user can use for anything and is totally not
> dependant on data. For db system use only>>
>
>    Pure hogwash.  The very definition of a PK in the relational model is
> exactly opposite of what you describe.
>
>    John's problem is exactly what I tried to point out on this list a while
> back; there is a fundamental difference between a primary key and a tag or a
> pointer.  John's problem is related to the meaning of the data itself; how
> to determine that a row within the relation is unique.  In other words, John
> is looking for the primary key; that combination of attributes which
> identifies a specific row and ensures that the row is not duplicated.
>
>    That is not the same thing as adding a unique tag or pointer on every row
> (an auto number).  And despite popular belief, a auto number in relational
> terms is not a surrogate key.
>
>    John needs to come up with a key based on the data that will ensure that a
> row is unique.  If the field(s) used to do that cannot ensure that, then
> more fields (attributes) need to be added to the table (relation).
>
>    For *performance* reasons in the DB, he will use an auto number as a
> pointer (like all of us do), but in addition to that, he will need to create
> a unique index based on the PK.
>
>    He might take the shortcut of making that index a hash, which for
> performance reasons may be required, but it's a bad idea because it can lead
> to an update abnormalities.  If you can live with that, great and at the end
> of the day, it may be required to get the job done.
>
>    Since it's only one index on the main table, I wouldn't take that
> shortcut.
>
>    And as far as the primary key not being able to change (as in a name
> change or SS#), again total hogwash.  That's why relational DB's allow for
> cascading updates.  There is nothing stated in any paper on the relational
> model that says a primary key cannot change.
>
> Jim.



More information about the AccessD mailing list