[dba-SQLServer] Indexing on a small number of values in a column

Fri Dec 22 09:25:42 CST 2006

Arthur,

This would tend to suggest that in the system you and I were discussing,
placing all of the data values in one big table would be inefficient until
you got the number of "From Table" data up above 20.  You would have to do a
full table scan on hundreds of millions of values.  

Using the alternate method of breaking HSID down into individual tables
representing the data "groups" and only filling the tables with records
where that data was present would mean you would have many different source
tables, but each would have smallish numbers of records, anywhere from
hundreds of thousands to low millions (5 million).  Remember that many
fields have essentially "binary" values such as 'True' / 'False'.  Even
these small tables a table scan will be required on fields like these (if
they are even used in a filter) but at least you are scanning a table with a
small number of records.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Stuart
McLachlan
Sent: Friday, December 15, 2006 12:47 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Indexing on a small number of values in a
column

On 14 Dec 2006 at 21:19, artful at rogers.com wrote:

> I have heard or read that creating an index on a column containing a 
> small number of values is pointless.
> 
> a) Given a column Sex with a constraint "M/F", is there any point 
> indexing this column? b) Given the same column defined as a bit, you 
> can't index it (afaik). c) Assuming that it is pointless to index a 
> column containng < n distinct values, what is the value of n? Assume 
> 100M rows in the table of interest?

See:
http://msdn2.microsoft.com/en-us/library/aa224773(sql.80).aspx
"My tests show that a table scan often starts to perform better than a
nonclustered index access when at least 10 percent of the rows are selected.
I also found that the optimizer switches from nonclustered index access to
table scan prematurely (i.e., when nonclustered index access still shows
better response time than the corresponding table scan). In many cases, the
optimizer forces a table scan for queries with result sets of approximately
5 percent, although the table scan becomes more efficient than index access
at selectivities of 8 to 10 percent."

also
http://www.sql-server-performance.com/am_indexing_low_sel_cols.asp
for some real life tests and ideas on how to measure the effectiveness of an
index.

-- 
Stuart

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com