[dba-SQLServer] Clustered index

Sun Dec 9 03:50:28 CST 2007

Arthur,

>If this sort of thing is not important 

The reason I am doing this is that:

1) Once created this particular table will never change, EVER.  We don't get
updates to this one.
2) I am endlessly doing counts and the like based on WHERE clauses.  WHERE
INCOME IN ('D','E','F') etc.

>From my reading, the point of a clustered index is that:

a) Once SQL Server finds the first of a series (in this case the
FieldNameID) it now knows that all the rest of the records for that field
are contiguous
b) The data is right there in the index
c) That when it gets to the END of a given part of the index it can quit
looking - there will not be any FieldNameIDs on down the road because they
are all contiguous.

Thus to find Field Income it can scan down the index until it finds the
first integer value 111 in that field.  It has now found the values (actual
data for) the first Income field and all the rest of the Income fields are
coming next.  It starts looking for DATA values of 'D'.  Once it finds the
first DATA value of D, all the rest are coming up next.  It can start
gathering the PKIDs for FieldNameID 111 and Data values 'D'.  Once it hits
the LAST data value 'D' it knows that it can stop searching for 111 / 'D'
because all the values are contiguous and it just found the last one.  It
can now start looking for 111 / 'E'.  It is still in 111 (and they are
contiguous) so it is already in the right area etc.

>From my reading, the "contiguous" part is the key here.  It allows SQL
Server to STOP the search once the last value is found for a particular
field because it knows that there will be no more scattered around in the
remaining records.  Furthermore, the fact that the data is right there is
the next key to the efficiency.  Once 111 / 'E' is found, it can just grab
the PKID and stuff it in the result set since it doesn't have to do any more
leaf searching to find the actual data.

So it SOUNDS like a clustered index should make this thing as fast as I can
get for situations where there no updates to spoil the party, and there are
none for this specific table.  The data is entirely static.

Given that the data is now only three fields total - FieldNameID, VALUE and
PKID, a clustered covering index is doable and sounds ideal.  I get to index
every single "field" in the old "wide" table with a single covering index.

It also turns out that since I have been doing business with Stan he has
always asked for information from fields 084-233 of the table.  These fields
are physically grouped into two "vertical" tables - 084-143 and 144-233,
with about 500 million data points in the first (084-143).  The sec
(144-233) is still processing and I cant get the properties for it yet but I
suspect that it will be around twice that number of points.  Thus out of a
total of 11 Gig data points I will only have to search through 1.5 Gig
points to answer all the questions so far encountered.

If this works, I will owe you a bottle of your favorite whatever.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Arthur
Fuller
Sent: Sunday, December 09, 2007 12:17 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Clustered index

A clustered index physically re-orders the table into the specified
sequence. Given the size of your table, this may take some time.

The point of a clustered index (AFAIK) is to group objects in contiguous
proximity for the most rapid retrieval of proximate objects. For example,
given Customers and Orders, a covering index might be clustered so as to
guarantee that Customer X's Orders are available within a single page or
perhaps two, thus causing minimal disk hits.

If this sort of thing is not important, then clustered indexes are beside
the point, and create needless index-creation time.

A.

On 12/8/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> I have a table with physical fields PKID, FldNameID, Data.  I want to 
> create a clustered index on that based on the field order FldNameID, 
> Data, PKID.
> Is the index based on the physical order of the fields or the order of 
> the fields specified when you create the index?
>
> IOW I went into the widget that creates indexes, specified the fields 
> in the order I desire them in the index, and checked the Clustered 
> check box.  It certainly appears that the index will create by the 
> order I specified them in the index widget (which is what I want).
>
> But is that true?
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com