[dba-SQLServer] SPAM-LOW: Re: How to construct queries

Asger Blond ab-mi at post3.tele.dk
Sat Dec 12 13:36:25 CST 2009


John
I just reread my previous posting and - stupid me - I totally forgot that
you have OR operators in your queries...
When you have OR operators the query optimizer will never use a composite
index for searches no matter how selective the where condition is - it will
only use the composite index for fetches in an index scan. Why? The query
optimizer could use the condition referencing the first field defined in the
index for a search operation, but then it has to do an index scan anyhow for
the rest of the condition referencing the other fields - and of cause it
decides to just make an index scan.
In this situation you could as well use an index with includes columns. 

If you really want the optimizer to use index *searches* then you have to
build *a separate index for each field*. The optimizer can use these indexes
for searches and then join the results. This kind of execution is also
"covering" since SQL Server can find all values needed in the indexes and
don't have to go to the actual data pages. But in this case it certainly
matters how selective your condition is.
I just made a test on a table with 4 fields and 1 million records having
lots of duplicate values which gives a poor selectivity. If the where clause
only contained two conditions with an OR then the optimizer used a join of
two seeks. But if it contained tree or more conditions the optimizer
reverted to a table scan.
I then changed the values to have much less duplicates (giving better
selectivity) and now the optimizer would use several index seeks and join
the results.
Finally I tested performance for this set of records (having less
duplicates) on 1) separate indexes on each field and 2) a composite index of
all fields. There was a substantial difference: 1) had a much faster
execution time than 2): 35 ms versus 1193 ms...
If you want to test this on your own tables you can use a SSMS-query window
and use Query | Include client statistics. This will give a result tab with
statistical information after running a query - look especially for the
value of "Total execution time".

Asger 
-----Oprindelig meddelelse-----
Fra: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby
Sendt: 12. december 2009 18:37
Til: Discussion concerning MS SQL Server
Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries

Asger,

 >But if your table gets repopulated with new records

At least in this case the table is entirely static, never updated in any
way.

John W. Colby
www.ColbyConsulting.com


Asger Blond wrote:
>> ..the where clause does in fact reference (use) the first field in both
>> indexes (it uses every field in both indexes).
>> Does this mean that these indexes are then useful for the search?
> 
> Yes in principle. But as said SQL Server will evaluate if using the index
> for its search (traversing the index-tree) or using the index for a scan
> (running through the leaf level of the index) is the most efficient. If
your
> where-condition has a low selectivity (high amount of matching values in
the
> field) then an index scan is more efficient than an index seek. That's the
> beauty of SQL as opposed to procedural languages: in SQL you don't tell
how
> to do the job - you just tell what you want and then the query optimizer
> will figure out the best way.
> 
> Your execution plan shows two parallel index scan and a join. This means
> that the query optimizer is in fact using your indexes as covering indexes
> (if it was not then you would see either table scans or lookup operations
in
> the plan). But it also tell that the optimizer do not use the indexes for
> its search operations (in that case you would see index seeks).
> If you constantly notice that the execution plan is using index scans
rather
> than index seeks then you might consider using indexes with included
columns
> instead of composite indexes. But if your table gets repopulated with new
> records having less amount of matching values for the search field then
you
> might notice that the optimizer now will find your composite index useful
> for its searches. In that case I would stick to the composite index.
> 
> Asger
> -----Oprindelig meddelelse-----
> Fra: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby
> Sendt: 12. december 2009 03:26
> Til: Discussion concerning MS SQL Server
> Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries
> 
> Wow Asger.  That is a lot of information and ...
> 
> All of my indexes were created with the index wizard and as far as I can
> tell they are composite 
> indexes.  In fact I "script index as create" and they are in fact
composite
> indexes.  Having seen 
> your code and my scripts I can see how I would create "included columns"
> indexes but UI don't see 
> how that would be done with the wizard.
> 
> What I think I am hearing is that none of these indexes are useful for
where
> clauses with multiple 
> fields in the where (virtually all of my queries), but it is useful for
> returning the data (which I 
> do in the ORDERS, though I do not need to in the COUNTS.
> 
> In fact, rereading this a bit...
> 
>  >SQL Server can only use the index for a *search* operation if your
> where-condition references a 
> field which is the *first* field defined in the index.
> 
> Are you saying that the where clause simply has to include the first field
> in the index somewhere in 
> the where?  Because in fact I custom create these indexes. and that is the
> case.
> 
> Except, in this last count I had more than 16 fields in the where (a bunch
> of ORs) and so I had to 
> break the index into two parts, but again the where clause does in fact
> reference (use) the first 
> field in both indexes (it uses every field in both indexes).
> 
> Does this mean that these indexes are then useful for the search?
> 
> BTW I looked at the execution plan and it shows two parallel index scans
> each accounting for about 
> 5% of the cost, followed by (in each stream) logical operation repartition
> stream (11% in each 
> stream, and immediately after that an "inner join" accounting for 64% of
the
> cost.
> 
> Thus it appears that it is in fact using the cover indexes maybe?
> 
> Pretty complex stuff though (for me).
> 
> Thanks again though for that explanation, it is useful.
> 
> John W. Colby
> www.ColbyConsulting.com
> 
> 
> Asger Blond wrote:
>> John,
>> It will be useful for you to explore the execution plans for your
queries.
>> You do so by highlighting the query in SSMS and pressing CTRL+L.
>> The Execution Plan window may be overwhelming but you just have to focus
> on
>> the rightmost ikon which is indicating either "Table scan" or "Index
Scan"
>> or "Index seek".
>> Now for your question:
>> To understand what's going on you have to distinguish *fetch operation*
> and
>> *search operation*. You also have to distinguish a *composite index* and
> an
>> *index with included columns*.
>> A covering index means that SQL Server can *fetch* the records you are
>> selecting in your query from the leaf level (=bottom level) of your
>> index-tree, not having to make a lookup to the data pages.
>> But a covering index doesn't mean that SQL Server can use your index for
> its
>> *search* operations. SQL Server can only use the index for a *search*
>> operation if your where-condition references a field which is the *first*
>> field defined in the index.
>> And to make things more complicated: A covering index can be accomplished
> in
>> to ways: as a *composite index* or as an *index with included columns*.
>> A composite index is created this way:
>> CREATE INDEX MyCompositeIndex ON MyTable (FieldA, FieldB, FieldC, FieldD)
>> An index with included columns is created this way:
>> CREATE INDEX MyIncludedIndex ON MyTable (FieldA) INCLUDE (FieldB, FieldC,
>> FieldD)
>> A *composite index* records all specified columns in every level of the
>> index-tree, and it is very useful for multi-field *sorting* operations.
> But
>> it is not useful for multi-field *search" operations, because SQL Server
>> will only consider using this index for a query referencing the *first*
>> field (here FieldA) in the where-condition. It is however very useful for
>> *fetch* operations, since SQL Server then can do with an "index scan" on
> the
>> leaf level of the index not having to make a lookup the actual
data-pages.
>> This is only true however if you are selecting fields which are present
in
>> the index: if your query selects fields not defined in an index then SQL
>> Server has to use a *table scan* which is not efficient. And now you may
>> ask: What about a SELECT COUNT(*) - isn't this referencing all fields and
>> shouldn't this per definition prevent SQL Server from using the index?
> Don't
>> worry: COUNT(*) is a special case where SQL Server will use the index if
> it
>> can make the execution more efficient.
>> An *index with included columns* only records the included columns in the
>> leaf level of the index. For this reason it is neither useful for
>> multi-field *sorting* operations nor for *search* operations on the
> included
>> columns. But as for *composite indexes* it is very useful for *fetch*
>> operations.
>> Bottom line:
>> A *composite index* is good for multi-columns sorting and covering
> queries,
>> but it doesn't help *search* operations and it imposes an overhead
forcing
>> SQL Server to record all fields in every level of the index-tree. An
> *index
>> with included column* it good for covering queries too, but it doesn't
> help
>> multi-columns sorting and it doesn't help *search* operations either -
>> however it imposes fare less overhead cause the included fields are only
>> recorded in the leaf-level of the index.
>>
>> I suggest you play whith the different options using CTRL+L - pretty soon
>> you will notice this:
>>
>> Composite index on FieldA, FieldB, FieldC
>> Composite index on FieldD, FieldE, FieldF
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek
>> SELECT COUNT (*) FROM MyTable WHERE FieldB="Y" --> Index Scan
>> SELECT COUNT (*) FROM MyTable WHERE FieldD="Y" --> Index Seek
>> SELECT COUNT (*) FROM MyTable WHERE FieldE="Y" --> Index Scan
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR
FieldC="Y"
>> --> Index Scan
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldD="Y" --> Perhaps
> two
>> index seeks merged
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldE="Y" --> Probably
> a
>> table scan
>>
>> Index on FieldA including FieldB and FieldC
>> Index on FieldA including FieldD and FieldF
>> --> exactly same results
>>
>> Another option is to use a separate index on each field. Then you will
>> notice:
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" --> Perhaps
> two
>> index seeks merged
>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR
FieldC="Y"
>> --> Probably a table scan
>>
>> Mind that the results will depend on the actual values present in your
>> table. SQL Server maintains statistics of the indexed columns and if
there
>> is a low selectivity for you query (= a high rate of matching values in
> the
>> indexed search-field) then chance is that SQL Server don't want to use
the
>> index and will prefer using a table scan instead.
>>
>>  Asger
> 
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
> 
> 
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
> 
> 
_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com





More information about the dba-SQLServer mailing list