Michael Maddison
michael at ddisolutions.com.au
Wed Jun 13 02:20:39 CDT 2007
Hi John, Having a slow day so I decided to test your problem. In my test scenario with 3mill rows the query select count(*) from [testtable] was minutes faster on a heap, which I wouldn't have expected... It seemed to be the only query that was faster though. There is a quick way to get the rowcount on a CI table though. Just a note of caution this method does not give 'guarenteed' results. SELECT rowcnt AS [RowCount] FROM sysindexes WHERE indid IN (1,0) AND OBJECTPROPERTY(id, 'IsUserTable') = 1 AND OBJECT_NAME(id) = 'testtable' cheers Michael M Interesting but also confusing. It sounds like a CI is always preferable but apparently not or there would not be a "religious" aspect to it. But the author does not describe any downsides, or at least in sufficient extent to allow me to discern it as such. If a nci is a separate object somewhere, with pointers to the real object (which describes my understanding if the nci) then is it faster to scan, looking for matches (because it is not a table scan)? IOW, my understanding of a table scan was that each row of data is stored (more less) together on the disk. Thus a "table scan" has to wade through 99 million reads of the entire record, whereas a NCI is a separate structure and thus the data read while "scanning" the index is just the data (an integer in this case) plus the leaf pointer (probably another integer) - less data to read off the disk. In this case, the data in an autonumber integer PK has no meaning, it is nothing more than a pointer to the data, thus all of the arguments for using a ci are moot. Sorting by the autonumber integer PKID is meaningless. Even if the data was originally stored in some order (last name / first name or something) the integer PKID does not reflect that meaning in any way. Defrag? Compact? Are we saying that without a ci the data store with rows missing cannot be physically rearranged on the disk to fill in the holes left buy the deleted records? I find that difficult to believe but what do I know? So in the end, I am left with a slightly better understanding of the issue but not enough to see whether I need one or not. I suspect not. Sigh. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael Maddison Sent: Tuesday, June 12, 2007 11:37 PM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time Hi John, Heres a great explanation I saved from another list.... //starts here// Just wanted to throw in some bits and pieces to the puzzle.. =;o) First off, there's no absolutes here. Like most other stuff, it depends. Though many times one or the other may be considered 'best', it's not the same as 'is always so'. Just wanted to make that clear. Also, there's sometimes a religious aspect to the debate on indexes. Not so much as other topics (eg nulls - to be or not to be), but sometimes there is a hint of it. It doesn't matter, it's just different opinions. Again, there is no absolute truth that applies to all, only those we ourselves choose to like. So.. my point is that some stuff is 'recommended', that is true. But, most important is that we try to make our own choices, because we know our own situations best. To be able to choose wisely, we need to have information. So, it helps to know a bit about how this stuff works, what it does, and sometimes what it doesn't... Consider a lone table, with no indexes at all. This is also known as a 'heap'. Add a nonclustered index (nci), or several nci's. We still have a heap. A 'heap' is basically (in my mind anyway) something that is unordered. What a heap cannot do, is to order itself, or 'defrag' or 'compact' itself, because there is no spoon (..err order) nci's in a heap looks like this - the actual data from the column(s) that is covered by the nci, and a pointer that points to the datapage where the rest of the row is stored. This is important to know. If we add a clustered index (ci) the table is no longer a heap. It's is know ordered by the ci. So there is a physical aspect to the table. Now, we may defrag, resort, reorg or compact the table - that is one thing that a ci is the tool for. A ci looks like this - it contains the actual data from the column(s) that is covered by the ci in it's indexpages - but the leafnode is the entire row. There is no pointer, because none is needed, you're already 'home'. This also makes a 'ci scan' the same as a 'table scan'. Since the leafnode is also the already existing data, a ci takes about 90% less space as an index, compared to a nci. Another aspect is that a ci must always be unique. It's not the same as we can only place one on unique data, we are allowed to place a ci on non-unique data, but... when we do that, an extra uniqifier is created for us, so that internally, the data becomes unique. Thus, a ci is always unique. This is good to know, because if data has to be 'uniquefied', it requires extra space, and the ci then becomes a little wider than if it was created on unique data. Now, suppose we've done the above, what about the nci's we started out with? Well, stuff has happened since the ci came into play. nci's *not* in a heap looks like this - the actual data from the column(s) that is covered by the nci, but the pointer that earlier pointed to the datapage where the rest of the row was stored, is now replaced by the *entire ci* - sort of 'key in key'. This is also important to know. This also tells us that the coice of ci in terms of how wide it is, also affects all our nci's, since they will each and everyone inherit the ci into themselves. In the end this transforms to space, and space transforms to performance. Anyone confused yet? =;o) Is any of this relevant, worth remembering or not? I think it is. It helps us decide what to prioritize when balancing on the edge of 'index-magics'. There's no absolute rights or wrongs here, it's all a tradeoff between 'good' and 'bad' aspects of index-types, coverings, behaviours etc. But, the more we know about the bits, the more educated guesses we can make, and maybe more importantly, we can also provide our intentions and reasoning behind those choices. ...anyways, that's how I understand it... =;o) /Kenneth //to here// cheers Michael M Michael, Thanks for the reply. Let me expand a little here. This table is the 99 million record data set that I imported the other day. The data came with some sort of PK field but it was nothing I created and had no meaning to me. It still exists. After the import, I created a PKID field, integer, autoincrement etc. Somehow I created a primary key on that new integer field which used a clustered index. I did not intend to. I can in fact, if I so desire (and can cause it to happen) delete this PKID field entirely and start over. I created it, it is not integral to my data in any way, it is just a method of setting up a unique record identifier (PK) based on an autoincrement integer. One of the problems that dogs me still is that I cannot do particular things in SQL Server from the "wizards", things like run a view of any complexity. They simply time out (after about 1 minute or so). I can copy and paste the identical SQL from the view into a "new query" in the same database and it will run just fine, even if the query takes 30 minutes to run. It will not time out. I have yet to figure that one out. The impact of that issue however is that if I try to do things like delete a field in a large table (for example), it takes so long that the operation times out, again after about 1 minute. So, I try to delete this clustered index - timeout. Try to delete the PK symbol from the field - timeout. Try to create a new index - works fine. So, I have a new non-clustered index on the PKID field, as well as the clustered index which I cannot delete. It seems redundant to have both, and I certainly do not need a clustered index AFAIK. I really do not understand the clustered thing (sad, I know), I have read several explanations and I just haven't yet wrapped my mind around it. Anyway, other than the fact that I do not need the clustered index (none of my other such huge tables have one, all are non-clustered) and it is taking up space, it seems to not cause me any harm. I was just looking to get rid of it because AFAIK I do not need it (and it takes up room, or so I thought). John W. Colby Colby Consulting www.ColbyConsulting.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com