[dba-SQLServer] SQL Server 2005 - long count time

jwcolby jwcolby at colbyconsulting.com
Tue Jun 12 23:02:22 CDT 2007


Interesting but also confusing.  It sounds like a CI is always preferable
but apparently not or there would not be a "religious" aspect to it.  But
the author does not describe any downsides, or at least in sufficient extent
to allow me to discern it as such. 

If a nci is a separate object somewhere, with pointers to the real object
(which describes my understanding if the nci) then is it faster to scan,
looking for matches (because it is not a table scan)?  IOW, my understanding
of a table scan was that each row of data is stored (more less) together on
the disk.  Thus a "table scan" has to wade through 99 million reads of the
entire record, whereas a NCI is a separate structure and thus the data read
while "scanning" the index is just the data (an integer in this case) plus
the leaf pointer (probably another integer) - less data to read off the
disk.

In this case, the data in an autonumber integer PK has no meaning, it is
nothing more than a pointer to the data, thus all of the arguments for using
a ci are moot.  Sorting by the autonumber integer PKID is meaningless.  Even
if the data was originally stored in some order (last name / first name or
something) the integer PKID does not reflect that meaning in any way.

Defrag?  Compact?  Are we saying that without a ci the data store with rows
missing cannot be physically rearranged on the disk to fill in the holes
left buy the deleted records?  I find that difficult to believe but what do
I know?

So in the end, I am left with a slightly better understanding of the issue
but not enough to see whether I need one or not.  I suspect not.

Sigh.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael
Maddison
Sent: Tuesday, June 12, 2007 11:37 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

Hi John,

Heres a great explanation I saved from another list....

//starts here//
Just wanted to throw in some bits and pieces to the puzzle.. =;o)
 
First off, there's no absolutes here. Like most other stuff, it depends.
Though many times one or the other may be considered 'best', it's not the
same as 'is always so'. Just wanted to make that clear. 
 
Also, there's sometimes a religious aspect to the debate on indexes. Not so
much as other topics (eg nulls - to be or not to be), but sometimes there is
a hint of it. It doesn't matter, it's just different opinions.
Again, there is no absolute truth that applies to all, only those we
ourselves choose to like. 
 
So..  my point is that some stuff is 'recommended', that is true. But, most
important is that we try to make our own choices, because we know our own
situations best. To be able to choose wisely, we need to have information.
So, it helps to know a bit about how this stuff works, what it does, and
sometimes what it doesn't... 
 
Consider a lone table, with no indexes at all. This is also known as a
'heap'.
Add a nonclustered index (nci), or several nci's. We still have a heap.
A 'heap' is basically (in my mind anyway) something that is unordered.
What a heap cannot do, is to order itself, or 'defrag' or 'compact'
itself, because there is no spoon (..err order)
 
nci's in a heap looks like this - the actual data from the column(s) that is
covered by the nci, and a pointer that points to the datapage where the rest
of the row is stored. This is important to know.
 
If we add a clustered index (ci) the table is no longer a heap. It's is know
ordered by the ci. So there is a physical aspect to the table.
Now, we may defrag, resort, reorg or compact the table - that is one thing
that a ci is the tool for.
 
A ci looks like this - it contains the actual data from the column(s) that
is covered by the ci in it's indexpages - but the leafnode is the entire
row. There is no pointer, because none is needed, you're already 'home'. 
This also makes a 'ci scan' the same as a 'table scan'.
Since the leafnode is also the already existing data, a ci takes about 90%
less space as an index, compared to a nci.
 
Another aspect is that a ci must always be unique. It's not the same as we
can only place one on unique data, we are allowed to place a ci on
non-unique data, but... when we do that, an extra uniqifier is created for
us, so that internally, the data becomes unique. Thus, a ci is always
unique. This is good to know, because if data has to be 'uniquefied', it
requires extra space, and the ci then becomes a little wider than if it was
created on unique data. 

Now, suppose we've done the above, what about the nci's we started out with?

Well, stuff has happened since the ci came into play.
 
nci's *not* in a heap looks like this - the actual data from the
column(s) that is covered by the nci, but the pointer that earlier pointed
to the datapage where the rest of the row was stored, is now replaced by the
*entire ci* - sort of 'key in key'. This is also important to know. 
 
This also tells us that the coice of ci in terms of how wide it is, also
affects all our nci's, since they will each and everyone inherit the ci into
themselves. In the end this transforms to space, and space transforms to
performance. 
 
Anyone confused yet? =;o)
Is any of this relevant, worth remembering or not?
I think it is. It helps us decide what to prioritize when balancing on the
edge of 'index-magics'.
There's no absolute rights or wrongs here, it's all a tradeoff between
'good' and 'bad' aspects of index-types, coverings, behaviours etc.
But, the more we know about the bits, the more educated guesses we can make,
and maybe more importantly, we can also provide our intentions and reasoning
behind those choices.
 
...anyways, that's how I understand it...
 
=;o)
/Kenneth 

//to here//

cheers

Michael M




Michael,

Thanks for the reply.  Let me expand a little here.

This table is the 99 million record data set that I imported the other day.
The data came with some sort of PK field but it was nothing I created and
had no meaning to me.  It still exists.  

After the import, I created a PKID field, integer, autoincrement etc.
Somehow I created a primary key on that new integer field which used a
clustered index.  I did not intend to.  I can in fact, if I so desire (and
can cause it to happen) delete this PKID field entirely and start over.  I
created it, it is not integral to my data in any way, it is just a method of
setting up a unique record identifier (PK) based on an autoincrement
integer.  

One of the problems that dogs me still is that I cannot do particular things
in SQL Server from the "wizards", things like run a view of any complexity.
They simply time out (after about 1 minute or so).  I can copy and paste the
identical SQL from the view into a "new query" in the same database and it
will run just fine, even if the query takes 30 minutes to run.
It will not time out.  I have yet to figure that one out.  The impact of
that issue however is that if I try to do things like delete a field in a
large table (for example), it takes so long that the operation times out,
again after about 1 minute.

So, I try to delete this clustered index - timeout.  Try to delete the PK
symbol from the field - timeout.  Try to create a new index - works fine.

So, I have a new non-clustered index on the PKID field, as well as the
clustered index which I cannot delete.  It seems redundant to have both, and
I certainly do not need a clustered index AFAIK.  I really do not understand
the clustered thing (sad, I know), I have read several explanations and I
just haven't yet wrapped my mind around it.

Anyway, other than the fact that I do not need the clustered index (none of
my other such huge tables have one, all are non-clustered) and it is taking
up space, it seems to not cause me any harm.  I was just looking to get rid
of it because AFAIK I do not need it (and it takes up room, or so I
thought).

John W. Colby
Colby Consulting
www.ColbyConsulting.com




More information about the dba-SQLServer mailing list