jwcolby
jwcolby at colbyconsulting.com
Sun Dec 9 03:50:28 CST 2007
Arthur, >If this sort of thing is not important The reason I am doing this is that: 1) Once created this particular table will never change, EVER. We don't get updates to this one. 2) I am endlessly doing counts and the like based on WHERE clauses. WHERE INCOME IN ('D','E','F') etc. >From my reading, the point of a clustered index is that: a) Once SQL Server finds the first of a series (in this case the FieldNameID) it now knows that all the rest of the records for that field are contiguous b) The data is right there in the index c) That when it gets to the END of a given part of the index it can quit looking - there will not be any FieldNameIDs on down the road because they are all contiguous. Thus to find Field Income it can scan down the index until it finds the first integer value 111 in that field. It has now found the values (actual data for) the first Income field and all the rest of the Income fields are coming next. It starts looking for DATA values of 'D'. Once it finds the first DATA value of D, all the rest are coming up next. It can start gathering the PKIDs for FieldNameID 111 and Data values 'D'. Once it hits the LAST data value 'D' it knows that it can stop searching for 111 / 'D' because all the values are contiguous and it just found the last one. It can now start looking for 111 / 'E'. It is still in 111 (and they are contiguous) so it is already in the right area etc. >From my reading, the "contiguous" part is the key here. It allows SQL Server to STOP the search once the last value is found for a particular field because it knows that there will be no more scattered around in the remaining records. Furthermore, the fact that the data is right there is the next key to the efficiency. Once 111 / 'E' is found, it can just grab the PKID and stuff it in the result set since it doesn't have to do any more leaf searching to find the actual data. So it SOUNDS like a clustered index should make this thing as fast as I can get for situations where there no updates to spoil the party, and there are none for this specific table. The data is entirely static. Given that the data is now only three fields total - FieldNameID, VALUE and PKID, a clustered covering index is doable and sounds ideal. I get to index every single "field" in the old "wide" table with a single covering index. It also turns out that since I have been doing business with Stan he has always asked for information from fields 084-233 of the table. These fields are physically grouped into two "vertical" tables - 084-143 and 144-233, with about 500 million data points in the first (084-143). The sec (144-233) is still processing and I cant get the properties for it yet but I suspect that it will be around twice that number of points. Thus out of a total of 11 Gig data points I will only have to search through 1.5 Gig points to answer all the questions so far encountered. If this works, I will owe you a bottle of your favorite whatever. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Arthur Fuller Sent: Sunday, December 09, 2007 12:17 AM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Clustered index A clustered index physically re-orders the table into the specified sequence. Given the size of your table, this may take some time. The point of a clustered index (AFAIK) is to group objects in contiguous proximity for the most rapid retrieval of proximate objects. For example, given Customers and Orders, a covering index might be clustered so as to guarantee that Customer X's Orders are available within a single page or perhaps two, thus causing minimal disk hits. If this sort of thing is not important, then clustered indexes are beside the point, and create needless index-creation time. A. On 12/8/07, jwcolby <jwcolby at colbyconsulting.com> wrote: > > I have a table with physical fields PKID, FldNameID, Data. I want to > create a clustered index on that based on the field order FldNameID, > Data, PKID. > Is the index based on the physical order of the fields or the order of > the fields specified when you create the index? > > IOW I went into the widget that creates indexes, specified the fields > in the order I desire them in the index, and checked the Clustered > check box. It certainly appears that the index will create by the > order I specified them in the index widget (which is what I want). > > But is that true? > > John W. Colby > Colby Consulting > www.ColbyConsulting.com