[dba-SQLServer] How does a clustered index really work

Fri Oct 30 14:40:37 CDT 2009

I need more information about how a clustered index really works.  I work with some fairly large 
tables, in this specific case 67 million records.  The table has a PKID (Identity) as well as a PK 
from another table (foreign key).  It then has name / address fields, plus about 8 non name / 
address fields.

I have come to the conclusion that I need to update this table about once a month or so (address 
validation stuff).  The updates will affect a smallish subset of the total records, perhaps a 
million or so records, maybe less (about 2% of the total records, once a month).

I have read that using a clustered index speeds up any operations that pull data ONLY from fields in 
the clustered index because of the way the data is stored on disk.  My understanding is that those 
fields are stored contiguously, whereas other indexed fields are stored in leaf nodes in a b-tree. 
I  understand a b-tree, though the knowledge is old.

Because of some of the processes I run I am wondering whether I should take the hit of placing the 
PK/FK fields plus the name / address fields in a clustered index.  I use just those fields a LOT. 
There are some additional fields that I use sporadically.  Other than the PK/FK  and NAME fields, 
all the other fields may be updated, though as I mentioned, rarely more than about 2% of the total 
records.

My understanding is that IF the changes in the data in the clustered index cause the ORDER of the 
data to change, then the updates to those fields would be slow, but if the order never changes then 
clustered index field updates aren't particularly slow.

Am I right so far?

Because the first field in the clustered index would be the Identity field, the ORDER of the records 
will NEVER change.  Thus my read is that I can safely store any of the other fields I want without 
worrying about causing that particular performance hit on updates.

Question, because of the storage method on disk, are the updates to clustered index fields FASTER 
than they would be otherwise, IOW getting at the data is faster, the order doesn't change so the 
update proceeds faster?

I am a SQL Server wannabe.  I use it a LOT but at the same time I never see myself becoming an 
expert, never mind a guru.  Is there something I can read about this subject that addresses my 
level, that I could fairly easily get my questions about this answered?

TIA for your patient responses.  ;)

-- 
John W. Colby
www.ColbyConsulting.com