[dba-SQLServer] SQL Server 2005 - long count time

Michael Maddison michael at ddisolutions.com.au
Wed Jun 13 08:55:33 CDT 2007


' Now that I understand what the CI indexes do, I will probably go back
and add both a CI and a NCI to each table, on the PKID, in order to
allow SQL Server to decide which is more efficient in any given case and
use whichever it needs in each situation.'

>>>Also as inserts and updates aren't an issue makes sure that both
sides of any join fields have indexes as well.
Also any columns in the where clause.  

cheers

MM

Michael,

>In my test scenario with 3mill rows the query select count(*) from
[testtable] was minutes faster on a heap, which I wouldn't have
expected...
>It seemed to be the only query that was faster though.


Now take into account that it was minutes faster on a 3 million row
table, how much faster on a 90 million row table.

This leaves us to wonder if perhaps both would not be beneficial?  The
nci for doing counts (and I do a LOT of them!!!) and a clustered index
for actual joins, leaving it up to SQL Server as to which it uses.  The
thing to understand here is that ATM I have use a system where I have
two actual tables, a 90 million "original data" table and a matching N
million "address validated" table, which contains ONLY the address
fields, NOT any additional demographics fields.  The reason that the
address validated table is "N million" is that I delete non valid
addresses from that table in order to pare down the size.  So the
"address validated table" has the same PKID values as the "original
data" table.  

When I process data, I use the original "data table" in where clauses
because that table contains the demographics - age, income, children,
preferences.  The "address validated" table has address information, but
it also has information that the address is valid just in the PK alone.
In other words, the fact that it is in the valid address table in and of
itself means that the address is valid (because I deleted all non-valid
records from THAT table.

So my client calls me and says "give me a count of all the households
(another subject) in these zips".  I can do that without ever going back
to the original data table.  In other cases he says "give me a count of
all the addresses in these zips where age=x, income is between y and z
and has female children.  That query needs to go back to the original
data table for the demographics information.

So you can see that these PK indexes play an absolutely huge role in my
business, both from the perspective of joins between the two tables as
well as from the perspective of raw counts when demographics are not
needed.

Now, I can hear the protestations already:

1) Why not merge the valid address back in to the demographics table.  

Because this is but one of the demographics tables I deal with, there
will be literally dozens of them.  At some point in time the valid
address data from each demographics table has to be merged into a single
valid address table with an even more complex m-m table saying this
valid address is linked to THESE (multiple) demographics tables.

2) OK then why not delete the non-valid addresses from the demographics
table?

This is a "should be done" but in the end it makes only a small (but
still
significant) dent in the total number of records in the demographics
table, about 20% maximum (so far).  What I really need to do is go back
and remove ALL address data from the demographics table.  Basically,
someone DID merge all of the address validation data back in to the
demographics table (back before I got ahold of it) and so the
demographics table now has probably 40 (out of 700) fields that are old,
out of date address validation info.

Even if I do that, I STILL need the PKs in both tables in order to join
demographics with valid address, and I need them to be FAST.  Now that I
understand what the CI indexes do, I will probably go back and add both
a CI and a NCI to each table, on the PKID, in order to allow SQL Server
to decide which is more efficient in any given case and use whichever it
needs in each situation.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Michael
Maddison
Sent: Wednesday, June 13, 2007 3:21 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] SQL Server 2005 - long count time

Hi John,

Having a slow day so I decided to test your problem.
In my test scenario with 3mill rows the query select count(*) from
[testtable] was minutes faster on a heap, which I wouldn't have
expected...
It seemed to be the only query that was faster though.

There is a quick way to get the rowcount on a CI table though.  Just a
note of caution this method does not give 'guarenteed' results.

    SELECT rowcnt AS [RowCount]
    FROM sysindexes
    WHERE indid IN (1,0)
        AND OBJECTPROPERTY(id, 'IsUserTable') = 1
        AND OBJECT_NAME(id) = 'testtable' 


cheers


Michael M

Interesting but also confusing.  It sounds like a CI is always
preferable but apparently not or there would not be a "religious" aspect
to it.  But the author does not describe any downsides, or at least in
sufficient extent to allow me to discern it as such. 

If a nci is a separate object somewhere, with pointers to the real
object (which describes my understanding if the nci) then is it faster
to scan, looking for matches (because it is not a table scan)?  IOW, my
understanding of a table scan was that each row of data is stored (more
less) together on the disk.  Thus a "table scan" has to wade through 99
million reads of the entire record, whereas a NCI is a separate
structure and thus the data read while "scanning" the index is just the
data (an integer in this case) plus the leaf pointer (probably another
integer) - less data to read off the disk.

In this case, the data in an autonumber integer PK has no meaning, it is
nothing more than a pointer to the data, thus all of the arguments for
using a ci are moot.  Sorting by the autonumber integer PKID is
meaningless.  Even if the data was originally stored in some order (last
name / first name or
something) the integer PKID does not reflect that meaning in any way.

Defrag?  Compact?  Are we saying that without a ci the data store with
rows missing cannot be physically rearranged on the disk to fill in the
holes left buy the deleted records?  I find that difficult to believe
but what do I know?

So in the end, I am left with a slightly better understanding of the
issue but not enough to see whether I need one or not.  I suspect not.

Sigh.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com





More information about the dba-SQLServer mailing list