Arthur Fuller
fuller.artful at gmail.com
Sun Dec 9 12:11:31 CST 2007
You're right about the nature and function of clustered indexes. I think it will work, John, and that your search times will diminish dramatically. A. On 12/9/07, jwcolby <jwcolby at colbyconsulting.com> wrote: > > Arthur, > > >If this sort of thing is not important > > The reason I am doing this is that: > > 1) Once created this particular table will never change, EVER. We don't > get > updates to this one. > 2) I am endlessly doing counts and the like based on WHERE clauses. WHERE > INCOME IN ('D','E','F') etc. > > >From my reading, the point of a clustered index is that: > > a) Once SQL Server finds the first of a series (in this case the > FieldNameID) it now knows that all the rest of the records for that field > are contiguous > b) The data is right there in the index > c) That when it gets to the END of a given part of the index it can quit > looking - there will not be any FieldNameIDs on down the road because they > are all contiguous. > > Thus to find Field Income it can scan down the index until it finds the > first integer value 111 in that field. It has now found the values > (actual > data for) the first Income field and all the rest of the Income fields are > coming next. It starts looking for DATA values of 'D'. Once it finds the > first DATA value of D, all the rest are coming up next. It can start > gathering the PKIDs for FieldNameID 111 and Data values 'D'. Once it hits > the LAST data value 'D' it knows that it can stop searching for 111 / 'D' > because all the values are contiguous and it just found the last one. It > can now start looking for 111 / 'E'. It is still in 111 (and they are > contiguous) so it is already in the right area etc. > > >From my reading, the "contiguous" part is the key here. It allows SQL > Server to STOP the search once the last value is found for a particular > field because it knows that there will be no more scattered around in the > remaining records. Furthermore, the fact that the data is right there is > the next key to the efficiency. Once 111 / 'E' is found, it can just grab > the PKID and stuff it in the result set since it doesn't have to do any > more > leaf searching to find the actual data. > > So it SOUNDS like a clustered index should make this thing as fast as I > can > get for situations where there no updates to spoil the party, and there > are > none for this specific table. The data is entirely static. > > Given that the data is now only three fields total - FieldNameID, VALUE > and > PKID, a clustered covering index is doable and sounds ideal. I get to > index > every single "field" in the old "wide" table with a single covering index. > > It also turns out that since I have been doing business with Stan he has > always asked for information from fields 084-233 of the table. These > fields > are physically grouped into two "vertical" tables - 084-143 and 144-233, > with about 500 million data points in the first (084-143). The sec > (144-233) is still processing and I cant get the properties for it yet but > I > suspect that it will be around twice that number of points. Thus out of a > total of 11 Gig data points I will only have to search through 1.5 Gig > points to answer all the questions so far encountered. > > If this works, I will owe you a bottle of your favorite whatever. > > John W. Colby > Colby Consulting > www.ColbyConsulting.com >