[dba-SQLServer] Clustered index

Arthur Fuller fuller.artful at gmail.com
Sun Dec 9 12:11:31 CST 2007


You're right about the nature and function of clustered indexes. I think it
will work, John, and that your search times will diminish dramatically.

A.

On 12/9/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> Arthur,
>
> >If this sort of thing is not important
>
> The reason I am doing this is that:
>
> 1) Once created this particular table will never change, EVER.  We don't
> get
> updates to this one.
> 2) I am endlessly doing counts and the like based on WHERE clauses.  WHERE
> INCOME IN ('D','E','F') etc.
>
> >From my reading, the point of a clustered index is that:
>
> a) Once SQL Server finds the first of a series (in this case the
> FieldNameID) it now knows that all the rest of the records for that field
> are contiguous
> b) The data is right there in the index
> c) That when it gets to the END of a given part of the index it can quit
> looking - there will not be any FieldNameIDs on down the road because they
> are all contiguous.
>
> Thus to find Field Income it can scan down the index until it finds the
> first integer value 111 in that field.  It has now found the values
> (actual
> data for) the first Income field and all the rest of the Income fields are
> coming next.  It starts looking for DATA values of 'D'.  Once it finds the
> first DATA value of D, all the rest are coming up next.  It can start
> gathering the PKIDs for FieldNameID 111 and Data values 'D'.  Once it hits
> the LAST data value 'D' it knows that it can stop searching for 111 / 'D'
> because all the values are contiguous and it just found the last one.  It
> can now start looking for 111 / 'E'.  It is still in 111 (and they are
> contiguous) so it is already in the right area etc.
>
> >From my reading, the "contiguous" part is the key here.  It allows SQL
> Server to STOP the search once the last value is found for a particular
> field because it knows that there will be no more scattered around in the
> remaining records.  Furthermore, the fact that the data is right there is
> the next key to the efficiency.  Once 111 / 'E' is found, it can just grab
> the PKID and stuff it in the result set since it doesn't have to do any
> more
> leaf searching to find the actual data.
>
> So it SOUNDS like a clustered index should make this thing as fast as I
> can
> get for situations where there no updates to spoil the party, and there
> are
> none for this specific table.  The data is entirely static.
>
> Given that the data is now only three fields total - FieldNameID, VALUE
> and
> PKID, a clustered covering index is doable and sounds ideal.  I get to
> index
> every single "field" in the old "wide" table with a single covering index.
>
> It also turns out that since I have been doing business with Stan he has
> always asked for information from fields 084-233 of the table.  These
> fields
> are physically grouped into two "vertical" tables - 084-143 and 144-233,
> with about 500 million data points in the first (084-143).  The sec
> (144-233) is still processing and I cant get the properties for it yet but
> I
> suspect that it will be around twice that number of points.  Thus out of a
> total of 11 Gig data points I will only have to search through 1.5 Gig
> points to answer all the questions so far encountered.
>
> If this works, I will owe you a bottle of your favorite whatever.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>



More information about the dba-SQLServer mailing list