jwcolby
jwcolby at colbyconsulting.com
Wed Jul 25 09:35:27 CDT 2007
>So which you use simply doesn't matter AS LONG AS the PK meets a few criteria. And I forgot to add: 3) Should be efficient, which is where the autoincrement shines. Integers are the basic currency of the modern CPU and as such they are the fastest possible unit of information when doing compares. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby Sent: Wednesday, July 25, 2007 9:18 AM To: 'Access Developers discussion and problem solving' Subject: Re: [AccessD] Primary Key Best Practices Bruce, >Any table that does not have a natural primary key is not a pure >dataset, ... No, Any table that does not have a natural primary key CANDIDATE is not a pure dataset, ... A candidate key is exactly that, a field or set of fields that can serve the purpose of uniquely identifying the data record, and thus becomes a CANDIDATE to be the PK. In fact you can have multiple candidates. Nothing says you actually have to USE the candidate as the PK. That is again the difference between the unique index and the PK. One of the CANDIDATES is used as the unique index, but a surrogate can still be used as the PK. Take an employee table. ONE candidate key could be first / last / phone, another could be an employee number (a quasi-surrogate key by the way), another could be the SSN. Thus we have (at least) THREE CANDIDATE keys. Do you have to use all of them as the PK? Obviously not. Do you have to use ANY of them as the PK? Obviously not. Throw in a true surrogate key - an autoincrement - and you can use that as well. So which you use simply doesn't matter AS LONG AS the PK meets a few criteria. 1) The PK should be stable (NEVER changes). That eliminates the first / last / phone idea because the name could change (I got married, I got tired of my name) AND the phone number could (and probably will) change. The SSN fails for the same reason. SSNs CAN (believe it or not) change. Alien workers steal SSNs all the time. Now they become a citizen... Oooops the SSN has to change. 2) Has no meaning. From the perspective of the CHILD record, the FK is nothing more than a pointer back to the parent record. As such it should be a behind the scenes, never directly viewed artificial construct designed to do its job as efficiently as possible. In fact, a surrogate key can (and DOES) uniquely identify a single row of the table. If it did not, then it could not be the PK. As you know, you cannot take a simple integer, make it a value of 1 for every record, and make that the PK of the table. It is the very fact that you make the field autoincrement (or whatever) is what makes it possible to serve as the PK. However a surrogate cannot be used for the unique index, the purpose of preventing duplicates. OTOH it does not need to, its sole purpose in live is to act as a pointer, placed in child records to point back to itself. Again, the PK and the unique index are NOT the same structure, either in real life or in theory. The purpose of the PK is to uniquely identify a record so that it can be linked to a child record. Nothing more, nothing less. You can do that with a natural key, and you can do that with a surrogate. The purpose of the unique index is to prevent entering the same record in the database twice (data integrity). Different function. The fact that the fields used in the unique index can also serve as a PK is irrelevant. It does NOT HAVE TO BE the primary key. Think about normalization for a moment. The basic concept behind normalization is that ONLY information about a specific object is in the table about that object. Only bank info is in the bank table, only customer info is in the customer table, only check info is in the check table. One goal of normalization should be to minimize the fluff around lineage. Yes a check was drawn by a specific customer, but that does not mean that we have to have any specific piece of the customer table in the check table, it only means that we have to be able to uniquely identify which customer wrote the check. So it simply does NOT MATTER whether we use a SURROGATE KEY from the customer table, or the customer SSN (the very WORST choice), the first /last / eye color / hair color / phone / zip / and whatever else you might have decided was the CANDIDATE KEY of the customer table. Either one WORKS. What works, and what is the best choice are different matters. The surrogate key avoids a whole slew of real life problems, and creates none (in my experience). John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Bruce Bruen Sent: Wednesday, July 25, 2007 8:12 AM To: Access Developers discussion and problem solving Subject: Re: [AccessD] Primary Key Best Practices On Wednesday 25 July 2007 02:52, jwcolby wrote: > ROTFLMAO. > > The fit never passes. It just subsides until the moon rises again. > > > John W. Colby Any table that does not have a natural primary key is not a pure dataset, ... ... ... no, I'm not talking about natural v surrogate. bruce -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com