[dba-SQLServer] SQL Server 2005 - long count time

Tue Jun 12 22:37:24 CDT 2007

Hi John,

Heres a great explanation I saved from another list....

//starts here//
Just wanted to throw in some bits and pieces to the puzzle.. =;o)

First off, there's no absolutes here. Like most other stuff, it depends.
Though many times one or the other may be considered 'best', it's not
the same as 'is always so'. Just wanted to make that clear. 

Also, there's sometimes a religious aspect to the debate on indexes. Not
so much as other topics (eg nulls - to be or not to be), but sometimes
there is a hint of it. It doesn't matter, it's just different opinions.
Again, there is no absolute truth that applies to all, only those we
ourselves choose to like. 

So..  my point is that some stuff is 'recommended', that is true. But,
most important is that we try to make our own choices, because we know
our own situations best. To be able to choose wisely, we need to have
information. So, it helps to know a bit about how this stuff works, what
it does, and sometimes what it doesn't... 

Consider a lone table, with no indexes at all. This is also known as a
'heap'.
Add a nonclustered index (nci), or several nci's. We still have a heap.
A 'heap' is basically (in my mind anyway) something that is unordered.
What a heap cannot do, is to order itself, or 'defrag' or 'compact'
itself, because there is no spoon (..err order)

nci's in a heap looks like this - the actual data from the column(s)
that is covered by the nci, and a pointer that points to the datapage
where the rest of the row is stored. This is important to know.

If we add a clustered index (ci) the table is no longer a heap. It's is
know ordered by the ci. So there is a physical aspect to the table.
Now, we may defrag, resort, reorg or compact the table - that is one
thing that a ci is the tool for.

A ci looks like this - it contains the actual data from the column(s)
that is covered by the ci in it's indexpages - but the leafnode is the
entire row. There is no pointer, because none is needed, you're already
'home'. 
This also makes a 'ci scan' the same as a 'table scan'.
Since the leafnode is also the already existing data, a ci takes about
90% less space as an index, compared to a nci.

Another aspect is that a ci must always be unique. It's not the same as
we can only place one on unique data, we are allowed to place a ci on
non-unique data, but... when we do that, an extra uniqifier is created
for us, so that internally, the data becomes unique. Thus, a ci is
always unique. This is good to know, because if data has to be
'uniquefied', it requires extra space, and the ci then becomes a little
wider than if it was created on unique data. 

Now, suppose we've done the above, what about the nci's we started out
with? 
Well, stuff has happened since the ci came into play.

nci's *not* in a heap looks like this - the actual data from the
column(s) that is covered by the nci, but the pointer that earlier
pointed to the datapage where the rest of the row was stored, is now
replaced by the *entire ci* - sort of 'key in key'. This is also
important to know. 

This also tells us that the coice of ci in terms of how wide it is, also
affects all our nci's, since they will each and everyone inherit the ci
into themselves. In the end this transforms to space, and space
transforms to performance. 

Anyone confused yet? =;o)
Is any of this relevant, worth remembering or not?
I think it is. It helps us decide what to prioritize when balancing on
the edge of 'index-magics'.
There's no absolute rights or wrongs here, it's all a tradeoff between
'good' and 'bad' aspects of index-types, coverings, behaviours etc.
But, the more we know about the bits, the more educated guesses we can
make, and maybe more importantly, we can also provide our intentions and
reasoning behind those choices.

...anyways, that's how I understand it...

=;o)
/Kenneth 

//to here//

cheers

Michael M

Michael,

Thanks for the reply.  Let me expand a little here.

This table is the 99 million record data set that I imported the other
day.
The data came with some sort of PK field but it was nothing I created
and had no meaning to me.  It still exists.  

After the import, I created a PKID field, integer, autoincrement etc.
Somehow I created a primary key on that new integer field which used a
clustered index.  I did not intend to.  I can in fact, if I so desire
(and can cause it to happen) delete this PKID field entirely and start
over.  I created it, it is not integral to my data in any way, it is
just a method of setting up a unique record identifier (PK) based on an
autoincrement integer.  

One of the problems that dogs me still is that I cannot do particular
things in SQL Server from the "wizards", things like run a view of any
complexity.
They simply time out (after about 1 minute or so).  I can copy and paste
the identical SQL from the view into a "new query" in the same database
and it will run just fine, even if the query takes 30 minutes to run.
It will not time out.  I have yet to figure that one out.  The impact of
that issue however is that if I try to do things like delete a field in
a large table (for example), it takes so long that the operation times
out, again after about 1 minute.

So, I try to delete this clustered index - timeout.  Try to delete the
PK symbol from the field - timeout.  Try to create a new index - works
fine.

So, I have a new non-clustered index on the PKID field, as well as the
clustered index which I cannot delete.  It seems redundant to have both,
and I certainly do not need a clustered index AFAIK.  I really do not
understand the clustered thing (sad, I know), I have read several
explanations and I just haven't yet wrapped my mind around it.

Anyway, other than the fact that I do not need the clustered index (none
of my other such huge tables have one, all are non-clustered) and it is
taking up space, it seems to not cause me any harm.  I was just looking
to get rid of it because AFAIK I do not need it (and it takes up room,
or so I thought).

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael
Maddison
Sent: Tuesday, June 12, 2007 10:51 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

John,

I don't know why the heap index was quicker but I would seriously
hesitate to remove the clustered index.  
Maybe the rsult of the 1st query was cached when you ran the 2nd query?

As the physical data is stored in clustered index order (a clustered
index is basically the data) removing it turns the table into a heap,
literally.
This can have serious consequences for performance particularly on large
tables.  It can also cause fragmentation problems which again can have
performance issues.  AFAIK clustered indexes are preferred for all
tables with more then a dozen or so rows, of course knowing which
column/s to index is a grey area.
I recommend you research slow count queries before making such a radical
change.

cheers

Michael M

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com