Michael Maddison
michael at ddisolutions.com.au
Tue Jun 12 22:37:24 CDT 2007
Hi John, Heres a great explanation I saved from another list.... //starts here// Just wanted to throw in some bits and pieces to the puzzle.. =;o) First off, there's no absolutes here. Like most other stuff, it depends. Though many times one or the other may be considered 'best', it's not the same as 'is always so'. Just wanted to make that clear. Also, there's sometimes a religious aspect to the debate on indexes. Not so much as other topics (eg nulls - to be or not to be), but sometimes there is a hint of it. It doesn't matter, it's just different opinions. Again, there is no absolute truth that applies to all, only those we ourselves choose to like. So.. my point is that some stuff is 'recommended', that is true. But, most important is that we try to make our own choices, because we know our own situations best. To be able to choose wisely, we need to have information. So, it helps to know a bit about how this stuff works, what it does, and sometimes what it doesn't... Consider a lone table, with no indexes at all. This is also known as a 'heap'. Add a nonclustered index (nci), or several nci's. We still have a heap. A 'heap' is basically (in my mind anyway) something that is unordered. What a heap cannot do, is to order itself, or 'defrag' or 'compact' itself, because there is no spoon (..err order) nci's in a heap looks like this - the actual data from the column(s) that is covered by the nci, and a pointer that points to the datapage where the rest of the row is stored. This is important to know. If we add a clustered index (ci) the table is no longer a heap. It's is know ordered by the ci. So there is a physical aspect to the table. Now, we may defrag, resort, reorg or compact the table - that is one thing that a ci is the tool for. A ci looks like this - it contains the actual data from the column(s) that is covered by the ci in it's indexpages - but the leafnode is the entire row. There is no pointer, because none is needed, you're already 'home'. This also makes a 'ci scan' the same as a 'table scan'. Since the leafnode is also the already existing data, a ci takes about 90% less space as an index, compared to a nci. Another aspect is that a ci must always be unique. It's not the same as we can only place one on unique data, we are allowed to place a ci on non-unique data, but... when we do that, an extra uniqifier is created for us, so that internally, the data becomes unique. Thus, a ci is always unique. This is good to know, because if data has to be 'uniquefied', it requires extra space, and the ci then becomes a little wider than if it was created on unique data. Now, suppose we've done the above, what about the nci's we started out with? Well, stuff has happened since the ci came into play. nci's *not* in a heap looks like this - the actual data from the column(s) that is covered by the nci, but the pointer that earlier pointed to the datapage where the rest of the row was stored, is now replaced by the *entire ci* - sort of 'key in key'. This is also important to know. This also tells us that the coice of ci in terms of how wide it is, also affects all our nci's, since they will each and everyone inherit the ci into themselves. In the end this transforms to space, and space transforms to performance. Anyone confused yet? =;o) Is any of this relevant, worth remembering or not? I think it is. It helps us decide what to prioritize when balancing on the edge of 'index-magics'. There's no absolute rights or wrongs here, it's all a tradeoff between 'good' and 'bad' aspects of index-types, coverings, behaviours etc. But, the more we know about the bits, the more educated guesses we can make, and maybe more importantly, we can also provide our intentions and reasoning behind those choices. ...anyways, that's how I understand it... =;o) /Kenneth //to here// cheers Michael M Michael, Thanks for the reply. Let me expand a little here. This table is the 99 million record data set that I imported the other day. The data came with some sort of PK field but it was nothing I created and had no meaning to me. It still exists. After the import, I created a PKID field, integer, autoincrement etc. Somehow I created a primary key on that new integer field which used a clustered index. I did not intend to. I can in fact, if I so desire (and can cause it to happen) delete this PKID field entirely and start over. I created it, it is not integral to my data in any way, it is just a method of setting up a unique record identifier (PK) based on an autoincrement integer. One of the problems that dogs me still is that I cannot do particular things in SQL Server from the "wizards", things like run a view of any complexity. They simply time out (after about 1 minute or so). I can copy and paste the identical SQL from the view into a "new query" in the same database and it will run just fine, even if the query takes 30 minutes to run. It will not time out. I have yet to figure that one out. The impact of that issue however is that if I try to do things like delete a field in a large table (for example), it takes so long that the operation times out, again after about 1 minute. So, I try to delete this clustered index - timeout. Try to delete the PK symbol from the field - timeout. Try to create a new index - works fine. So, I have a new non-clustered index on the PKID field, as well as the clustered index which I cannot delete. It seems redundant to have both, and I certainly do not need a clustered index AFAIK. I really do not understand the clustered thing (sad, I know), I have read several explanations and I just haven't yet wrapped my mind around it. Anyway, other than the fact that I do not need the clustered index (none of my other such huge tables have one, all are non-clustered) and it is taking up space, it seems to not cause me any harm. I was just looking to get rid of it because AFAIK I do not need it (and it takes up room, or so I thought). John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael Maddison Sent: Tuesday, June 12, 2007 10:51 PM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time John, I don't know why the heap index was quicker but I would seriously hesitate to remove the clustered index. Maybe the rsult of the 1st query was cached when you ran the 2nd query? As the physical data is stored in clustered index order (a clustered index is basically the data) removing it turns the table into a heap, literally. This can have serious consequences for performance particularly on large tables. It can also cause fragmentation problems which again can have performance issues. AFAIK clustered indexes are preferred for all tables with more then a dozen or so rows, of course knowing which column/s to index is a grey area. I recommend you research slow count queries before making such a radical change. cheers Michael M _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com