[dba-SQLServer] SPAM-LOW: Re: How to construct queries

jwcolby jwcolby at colbyconsulting.com
Sun Dec 13 03:08:06 CST 2009


I think I will give this a try (soon).  I could set up code to programmatically build an index on 
every field, then set it running.  Come back in a week and I would be done indexing forever.  If it 
made that much time difference it would definitely be worth doing!

I had always worried about whether SQL Server could have enough indexes on a single table to do this 
(I have ~550 fields) but I read just yesterday that it can have that many indexes so I am good there..

John W. Colby
www.ColbyConsulting.com


Asger Blond wrote:
> John
> I just reread my previous posting and - stupid me - I totally forgot that
> you have OR operators in your queries...
> When you have OR operators the query optimizer will never use a composite
> index for searches no matter how selective the where condition is - it will
> only use the composite index for fetches in an index scan. Why? The query
> optimizer could use the condition referencing the first field defined in the
> index for a search operation, but then it has to do an index scan anyhow for
> the rest of the condition referencing the other fields - and of cause it
> decides to just make an index scan.
> In this situation you could as well use an index with includes columns. 
> 
> If you really want the optimizer to use index *searches* then you have to
> build *a separate index for each field*. The optimizer can use these indexes
> for searches and then join the results. This kind of execution is also
> "covering" since SQL Server can find all values needed in the indexes and
> don't have to go to the actual data pages. But in this case it certainly
> matters how selective your condition is.
> I just made a test on a table with 4 fields and 1 million records having
> lots of duplicate values which gives a poor selectivity. If the where clause
> only contained two conditions with an OR then the optimizer used a join of
> two seeks. But if it contained tree or more conditions the optimizer
> reverted to a table scan.
> I then changed the values to have much less duplicates (giving better
> selectivity) and now the optimizer would use several index seeks and join
> the results.
> Finally I tested performance for this set of records (having less
> duplicates) on 1) separate indexes on each field and 2) a composite index of
> all fields. There was a substantial difference: 1) had a much faster
> execution time than 2): 35 ms versus 1193 ms...
> If you want to test this on your own tables you can use a SSMS-query window
> and use Query | Include client statistics. This will give a result tab with
> statistical information after running a query - look especially for the
> value of "Total execution time".
> 
> Asger 
> -----Oprindelig meddelelse-----
> Fra: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby
> Sendt: 12. december 2009 18:37
> Til: Discussion concerning MS SQL Server
> Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries
> 
> Asger,
> 
>  >But if your table gets repopulated with new records
> 
> At least in this case the table is entirely static, never updated in any
> way.
> 
> John W. Colby
> www.ColbyConsulting.com
> 
> 
> Asger Blond wrote:
>>> ..the where clause does in fact reference (use) the first field in both
>>> indexes (it uses every field in both indexes).
>>> Does this mean that these indexes are then useful for the search?
>> Yes in principle. But as said SQL Server will evaluate if using the index
>> for its search (traversing the index-tree) or using the index for a scan
>> (running through the leaf level of the index) is the most efficient. If
> your
>> where-condition has a low selectivity (high amount of matching values in
> the
>> field) then an index scan is more efficient than an index seek. That's the
>> beauty of SQL as opposed to procedural languages: in SQL you don't tell
> how
>> to do the job - you just tell what you want and then the query optimizer
>> will figure out the best way.
>>
>> Your execution plan shows two parallel index scan and a join. This means
>> that the query optimizer is in fact using your indexes as covering indexes
>> (if it was not then you would see either table scans or lookup operations
> in
>> the plan). But it also tell that the optimizer do not use the indexes for
>> its search operations (in that case you would see index seeks).
>> If you constantly notice that the execution plan is using index scans
> rather
>> than index seeks then you might consider using indexes with included
> columns
>> instead of composite indexes. But if your table gets repopulated with new
>> records having less amount of matching values for the search field then
> you
>> might notice that the optimizer now will find your composite index useful
>> for its searches. In that case I would stick to the composite index.
>>
>> Asger
>> -----Oprindelig meddelelse-----
>> Fra: dba-sqlserver-bounces at databaseadvisors.com
>> [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby
>> Sendt: 12. december 2009 03:26
>> Til: Discussion concerning MS SQL Server
>> Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries
>>
>> Wow Asger.  That is a lot of information and ...
>>
>> All of my indexes were created with the index wizard and as far as I can
>> tell they are composite 
>> indexes.  In fact I "script index as create" and they are in fact
> composite
>> indexes.  Having seen 
>> your code and my scripts I can see how I would create "included columns"
>> indexes but UI don't see 
>> how that would be done with the wizard.
>>
>> What I think I am hearing is that none of these indexes are useful for
> where
>> clauses with multiple 
>> fields in the where (virtually all of my queries), but it is useful for
>> returning the data (which I 
>> do in the ORDERS, though I do not need to in the COUNTS.
>>
>> In fact, rereading this a bit...
>>
>>  >SQL Server can only use the index for a *search* operation if your
>> where-condition references a 
>> field which is the *first* field defined in the index.
>>
>> Are you saying that the where clause simply has to include the first field
>> in the index somewhere in 
>> the where?  Because in fact I custom create these indexes. and that is the
>> case.
>>
>> Except, in this last count I had more than 16 fields in the where (a bunch
>> of ORs) and so I had to 
>> break the index into two parts, but again the where clause does in fact
>> reference (use) the first 
>> field in both indexes (it uses every field in both indexes).
>>
>> Does this mean that these indexes are then useful for the search?
>>
>> BTW I looked at the execution plan and it shows two parallel index scans
>> each accounting for about 
>> 5% of the cost, followed by (in each stream) logical operation repartition
>> stream (11% in each 
>> stream, and immediately after that an "inner join" accounting for 64% of
> the
>> cost.
>>
>> Thus it appears that it is in fact using the cover indexes maybe?
>>
>> Pretty complex stuff though (for me).
>>
>> Thanks again though for that explanation, it is useful.
>>
>> John W. Colby
>> www.ColbyConsulting.com
>>
>>
>> Asger Blond wrote:
>>> John,
>>> It will be useful for you to explore the execution plans for your
> queries.
>>> You do so by highlighting the query in SSMS and pressing CTRL+L.
>>> The Execution Plan window may be overwhelming but you just have to focus
>> on
>>> the rightmost ikon which is indicating either "Table scan" or "Index
> Scan"
>>> or "Index seek".
>>> Now for your question:
>>> To understand what's going on you have to distinguish *fetch operation*
>> and
>>> *search operation*. You also have to distinguish a *composite index* and
>> an
>>> *index with included columns*.
>>> A covering index means that SQL Server can *fetch* the records you are
>>> selecting in your query from the leaf level (=bottom level) of your
>>> index-tree, not having to make a lookup to the data pages.
>>> But a covering index doesn't mean that SQL Server can use your index for
>> its
>>> *search* operations. SQL Server can only use the index for a *search*
>>> operation if your where-condition references a field which is the *first*
>>> field defined in the index.
>>> And to make things more complicated: A covering index can be accomplished
>> in
>>> to ways: as a *composite index* or as an *index with included columns*.
>>> A composite index is created this way:
>>> CREATE INDEX MyCompositeIndex ON MyTable (FieldA, FieldB, FieldC, FieldD)
>>> An index with included columns is created this way:
>>> CREATE INDEX MyIncludedIndex ON MyTable (FieldA) INCLUDE (FieldB, FieldC,
>>> FieldD)
>>> A *composite index* records all specified columns in every level of the
>>> index-tree, and it is very useful for multi-field *sorting* operations.
>> But
>>> it is not useful for multi-field *search" operations, because SQL Server
>>> will only consider using this index for a query referencing the *first*
>>> field (here FieldA) in the where-condition. It is however very useful for
>>> *fetch* operations, since SQL Server then can do with an "index scan" on
>> the
>>> leaf level of the index not having to make a lookup the actual
> data-pages.
>>> This is only true however if you are selecting fields which are present
> in
>>> the index: if your query selects fields not defined in an index then SQL
>>> Server has to use a *table scan* which is not efficient. And now you may
>>> ask: What about a SELECT COUNT(*) - isn't this referencing all fields and
>>> shouldn't this per definition prevent SQL Server from using the index?
>> Don't
>>> worry: COUNT(*) is a special case where SQL Server will use the index if
>> it
>>> can make the execution more efficient.
>>> An *index with included columns* only records the included columns in the
>>> leaf level of the index. For this reason it is neither useful for
>>> multi-field *sorting* operations nor for *search* operations on the
>> included
>>> columns. But as for *composite indexes* it is very useful for *fetch*
>>> operations.
>>> Bottom line:
>>> A *composite index* is good for multi-columns sorting and covering
>> queries,
>>> but it doesn't help *search* operations and it imposes an overhead
> forcing
>>> SQL Server to record all fields in every level of the index-tree. An
>> *index
>>> with included column* it good for covering queries too, but it doesn't
>> help
>>> multi-columns sorting and it doesn't help *search* operations either -
>>> however it imposes fare less overhead cause the included fields are only
>>> recorded in the leaf-level of the index.
>>>
>>> I suggest you play whith the different options using CTRL+L - pretty soon
>>> you will notice this:
>>>
>>> Composite index on FieldA, FieldB, FieldC
>>> Composite index on FieldD, FieldE, FieldF
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek
>>> SELECT COUNT (*) FROM MyTable WHERE FieldB="Y" --> Index Scan
>>> SELECT COUNT (*) FROM MyTable WHERE FieldD="Y" --> Index Seek
>>> SELECT COUNT (*) FROM MyTable WHERE FieldE="Y" --> Index Scan
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR
> FieldC="Y"
>>> --> Index Scan
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldD="Y" --> Perhaps
>> two
>>> index seeks merged
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldE="Y" --> Probably
>> a
>>> table scan
>>>
>>> Index on FieldA including FieldB and FieldC
>>> Index on FieldA including FieldD and FieldF
>>> --> exactly same results
>>>
>>> Another option is to use a separate index on each field. Then you will
>>> notice:
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" --> Perhaps
>> two
>>> index seeks merged
>>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR
> FieldC="Y"
>>> --> Probably a table scan
>>>
>>> Mind that the results will depend on the actual values present in your
>>> table. SQL Server maintains statistics of the indexed columns and if
> there
>>> is a low selectivity for you query (= a high rate of matching values in
>> the
>>> indexed search-field) then chance is that SQL Server don't want to use
> the
>>> index and will prefer using a table scan instead.
>>>
>>>  Asger
>> _______________________________________________
>> dba-SQLServer mailing list
>> dba-SQLServer at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>> http://www.databaseadvisors.com
>>
>>
>> _______________________________________________
>> dba-SQLServer mailing list
>> dba-SQLServer at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>> http://www.databaseadvisors.com
>>
>>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
> 
> 
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
> 
> 



More information about the dba-SQLServer mailing list