[dba-SQLServer] SQL Server 2005 - long count time

Michael Maddison michael at ddisolutions.com.au
Wed Jun 13 02:20:39 CDT 2007


Hi John,

Having a slow day so I decided to test your problem.
In my test scenario with 3mill rows the query select count(*) from
[testtable]
was minutes faster on a heap, which I wouldn't have expected...
It seemed to be the only query that was faster though.

There is a quick way to get the rowcount on a CI table though.  Just a
note of caution this method does not give 'guarenteed' results.

    SELECT rowcnt AS [RowCount]
    FROM sysindexes
    WHERE indid IN (1,0)
        AND OBJECTPROPERTY(id, 'IsUserTable') = 1
        AND OBJECT_NAME(id) = 'testtable' 


cheers


Michael M

Interesting but also confusing.  It sounds like a CI is always
preferable but apparently not or there would not be a "religious" aspect
to it.  But the author does not describe any downsides, or at least in
sufficient extent to allow me to discern it as such. 

If a nci is a separate object somewhere, with pointers to the real
object (which describes my understanding if the nci) then is it faster
to scan, looking for matches (because it is not a table scan)?  IOW, my
understanding of a table scan was that each row of data is stored (more
less) together on the disk.  Thus a "table scan" has to wade through 99
million reads of the entire record, whereas a NCI is a separate
structure and thus the data read while "scanning" the index is just the
data (an integer in this case) plus the leaf pointer (probably another
integer) - less data to read off the disk.

In this case, the data in an autonumber integer PK has no meaning, it is
nothing more than a pointer to the data, thus all of the arguments for
using a ci are moot.  Sorting by the autonumber integer PKID is
meaningless.  Even if the data was originally stored in some order (last
name / first name or
something) the integer PKID does not reflect that meaning in any way.

Defrag?  Compact?  Are we saying that without a ci the data store with
rows missing cannot be physically rearranged on the disk to fill in the
holes left buy the deleted records?  I find that difficult to believe
but what do I know?

So in the end, I am left with a slightly better understanding of the
issue but not enough to see whether I need one or not.  I suspect not.

Sigh.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael
Maddison
Sent: Tuesday, June 12, 2007 11:37 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

Hi John,

Heres a great explanation I saved from another list....

//starts here//
Just wanted to throw in some bits and pieces to the puzzle.. =;o)
 
First off, there's no absolutes here. Like most other stuff, it depends.
Though many times one or the other may be considered 'best', it's not
the same as 'is always so'. Just wanted to make that clear. 
 
Also, there's sometimes a religious aspect to the debate on indexes. Not
so much as other topics (eg nulls - to be or not to be), but sometimes
there is a hint of it. It doesn't matter, it's just different opinions.
Again, there is no absolute truth that applies to all, only those we
ourselves choose to like. 
 
So..  my point is that some stuff is 'recommended', that is true. But,
most important is that we try to make our own choices, because we know
our own situations best. To be able to choose wisely, we need to have
information.
So, it helps to know a bit about how this stuff works, what it does, and
sometimes what it doesn't... 
 
Consider a lone table, with no indexes at all. This is also known as a
'heap'.
Add a nonclustered index (nci), or several nci's. We still have a heap.
A 'heap' is basically (in my mind anyway) something that is unordered.
What a heap cannot do, is to order itself, or 'defrag' or 'compact'
itself, because there is no spoon (..err order)
 
nci's in a heap looks like this - the actual data from the column(s)
that is covered by the nci, and a pointer that points to the datapage
where the rest of the row is stored. This is important to know.
 
If we add a clustered index (ci) the table is no longer a heap. It's is
know ordered by the ci. So there is a physical aspect to the table.
Now, we may defrag, resort, reorg or compact the table - that is one
thing that a ci is the tool for.
 
A ci looks like this - it contains the actual data from the column(s)
that is covered by the ci in it's indexpages - but the leafnode is the
entire row. There is no pointer, because none is needed, you're already
'home'. 
This also makes a 'ci scan' the same as a 'table scan'.
Since the leafnode is also the already existing data, a ci takes about
90% less space as an index, compared to a nci.
 
Another aspect is that a ci must always be unique. It's not the same as
we can only place one on unique data, we are allowed to place a ci on
non-unique data, but... when we do that, an extra uniqifier is created
for us, so that internally, the data becomes unique. Thus, a ci is
always unique. This is good to know, because if data has to be
'uniquefied', it requires extra space, and the ci then becomes a little
wider than if it was created on unique data. 

Now, suppose we've done the above, what about the nci's we started out
with?

Well, stuff has happened since the ci came into play.
 
nci's *not* in a heap looks like this - the actual data from the
column(s) that is covered by the nci, but the pointer that earlier
pointed to the datapage where the rest of the row was stored, is now
replaced by the *entire ci* - sort of 'key in key'. This is also
important to know. 
 
This also tells us that the coice of ci in terms of how wide it is, also
affects all our nci's, since they will each and everyone inherit the ci
into themselves. In the end this transforms to space, and space
transforms to performance. 
 
Anyone confused yet? =;o)
Is any of this relevant, worth remembering or not?
I think it is. It helps us decide what to prioritize when balancing on
the edge of 'index-magics'.
There's no absolute rights or wrongs here, it's all a tradeoff between
'good' and 'bad' aspects of index-types, coverings, behaviours etc.
But, the more we know about the bits, the more educated guesses we can
make, and maybe more importantly, we can also provide our intentions and
reasoning behind those choices.
 
...anyways, that's how I understand it...
 
=;o)
/Kenneth 

//to here//

cheers

Michael M




Michael,

Thanks for the reply.  Let me expand a little here.

This table is the 99 million record data set that I imported the other
day.
The data came with some sort of PK field but it was nothing I created
and had no meaning to me.  It still exists.  

After the import, I created a PKID field, integer, autoincrement etc.
Somehow I created a primary key on that new integer field which used a
clustered index.  I did not intend to.  I can in fact, if I so desire
(and can cause it to happen) delete this PKID field entirely and start
over.  I created it, it is not integral to my data in any way, it is
just a method of setting up a unique record identifier (PK) based on an
autoincrement integer.  

One of the problems that dogs me still is that I cannot do particular
things in SQL Server from the "wizards", things like run a view of any
complexity.
They simply time out (after about 1 minute or so).  I can copy and paste
the identical SQL from the view into a "new query" in the same database
and it will run just fine, even if the query takes 30 minutes to run.
It will not time out.  I have yet to figure that one out.  The impact of
that issue however is that if I try to do things like delete a field in
a large table (for example), it takes so long that the operation times
out, again after about 1 minute.

So, I try to delete this clustered index - timeout.  Try to delete the
PK symbol from the field - timeout.  Try to create a new index - works
fine.

So, I have a new non-clustered index on the PKID field, as well as the
clustered index which I cannot delete.  It seems redundant to have both,
and I certainly do not need a clustered index AFAIK.  I really do not
understand the clustered thing (sad, I know), I have read several
explanations and I just haven't yet wrapped my mind around it.

Anyway, other than the fact that I do not need the clustered index (none
of my other such huge tables have one, all are non-clustered) and it is
taking up space, it seems to not cause me any harm.  I was just looking
to get rid of it because AFAIK I do not need it (and it takes up room,
or so I thought).

John W. Colby
Colby Consulting
www.ColbyConsulting.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com





More information about the dba-SQLServer mailing list