jwcolby
jwcolby at colbyconsulting.com
Sun Dec 13 03:08:06 CST 2009
I think I will give this a try (soon). I could set up code to programmatically build an index on every field, then set it running. Come back in a week and I would be done indexing forever. If it made that much time difference it would definitely be worth doing! I had always worried about whether SQL Server could have enough indexes on a single table to do this (I have ~550 fields) but I read just yesterday that it can have that many indexes so I am good there.. John W. Colby www.ColbyConsulting.com Asger Blond wrote: > John > I just reread my previous posting and - stupid me - I totally forgot that > you have OR operators in your queries... > When you have OR operators the query optimizer will never use a composite > index for searches no matter how selective the where condition is - it will > only use the composite index for fetches in an index scan. Why? The query > optimizer could use the condition referencing the first field defined in the > index for a search operation, but then it has to do an index scan anyhow for > the rest of the condition referencing the other fields - and of cause it > decides to just make an index scan. > In this situation you could as well use an index with includes columns. > > If you really want the optimizer to use index *searches* then you have to > build *a separate index for each field*. The optimizer can use these indexes > for searches and then join the results. This kind of execution is also > "covering" since SQL Server can find all values needed in the indexes and > don't have to go to the actual data pages. But in this case it certainly > matters how selective your condition is. > I just made a test on a table with 4 fields and 1 million records having > lots of duplicate values which gives a poor selectivity. If the where clause > only contained two conditions with an OR then the optimizer used a join of > two seeks. But if it contained tree or more conditions the optimizer > reverted to a table scan. > I then changed the values to have much less duplicates (giving better > selectivity) and now the optimizer would use several index seeks and join > the results. > Finally I tested performance for this set of records (having less > duplicates) on 1) separate indexes on each field and 2) a composite index of > all fields. There was a substantial difference: 1) had a much faster > execution time than 2): 35 ms versus 1193 ms... > If you want to test this on your own tables you can use a SSMS-query window > and use Query | Include client statistics. This will give a result tab with > statistical information after running a query - look especially for the > value of "Total execution time". > > Asger > -----Oprindelig meddelelse----- > Fra: dba-sqlserver-bounces at databaseadvisors.com > [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby > Sendt: 12. december 2009 18:37 > Til: Discussion concerning MS SQL Server > Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries > > Asger, > > >But if your table gets repopulated with new records > > At least in this case the table is entirely static, never updated in any > way. > > John W. Colby > www.ColbyConsulting.com > > > Asger Blond wrote: >>> ..the where clause does in fact reference (use) the first field in both >>> indexes (it uses every field in both indexes). >>> Does this mean that these indexes are then useful for the search? >> Yes in principle. But as said SQL Server will evaluate if using the index >> for its search (traversing the index-tree) or using the index for a scan >> (running through the leaf level of the index) is the most efficient. If > your >> where-condition has a low selectivity (high amount of matching values in > the >> field) then an index scan is more efficient than an index seek. That's the >> beauty of SQL as opposed to procedural languages: in SQL you don't tell > how >> to do the job - you just tell what you want and then the query optimizer >> will figure out the best way. >> >> Your execution plan shows two parallel index scan and a join. This means >> that the query optimizer is in fact using your indexes as covering indexes >> (if it was not then you would see either table scans or lookup operations > in >> the plan). But it also tell that the optimizer do not use the indexes for >> its search operations (in that case you would see index seeks). >> If you constantly notice that the execution plan is using index scans > rather >> than index seeks then you might consider using indexes with included > columns >> instead of composite indexes. But if your table gets repopulated with new >> records having less amount of matching values for the search field then > you >> might notice that the optimizer now will find your composite index useful >> for its searches. In that case I would stick to the composite index. >> >> Asger >> -----Oprindelig meddelelse----- >> Fra: dba-sqlserver-bounces at databaseadvisors.com >> [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby >> Sendt: 12. december 2009 03:26 >> Til: Discussion concerning MS SQL Server >> Emne: Re: [dba-SQLServer] SPAM-LOW: Re: How to construct queries >> >> Wow Asger. That is a lot of information and ... >> >> All of my indexes were created with the index wizard and as far as I can >> tell they are composite >> indexes. In fact I "script index as create" and they are in fact > composite >> indexes. Having seen >> your code and my scripts I can see how I would create "included columns" >> indexes but UI don't see >> how that would be done with the wizard. >> >> What I think I am hearing is that none of these indexes are useful for > where >> clauses with multiple >> fields in the where (virtually all of my queries), but it is useful for >> returning the data (which I >> do in the ORDERS, though I do not need to in the COUNTS. >> >> In fact, rereading this a bit... >> >> >SQL Server can only use the index for a *search* operation if your >> where-condition references a >> field which is the *first* field defined in the index. >> >> Are you saying that the where clause simply has to include the first field >> in the index somewhere in >> the where? Because in fact I custom create these indexes. and that is the >> case. >> >> Except, in this last count I had more than 16 fields in the where (a bunch >> of ORs) and so I had to >> break the index into two parts, but again the where clause does in fact >> reference (use) the first >> field in both indexes (it uses every field in both indexes). >> >> Does this mean that these indexes are then useful for the search? >> >> BTW I looked at the execution plan and it shows two parallel index scans >> each accounting for about >> 5% of the cost, followed by (in each stream) logical operation repartition >> stream (11% in each >> stream, and immediately after that an "inner join" accounting for 64% of > the >> cost. >> >> Thus it appears that it is in fact using the cover indexes maybe? >> >> Pretty complex stuff though (for me). >> >> Thanks again though for that explanation, it is useful. >> >> John W. Colby >> www.ColbyConsulting.com >> >> >> Asger Blond wrote: >>> John, >>> It will be useful for you to explore the execution plans for your > queries. >>> You do so by highlighting the query in SSMS and pressing CTRL+L. >>> The Execution Plan window may be overwhelming but you just have to focus >> on >>> the rightmost ikon which is indicating either "Table scan" or "Index > Scan" >>> or "Index seek". >>> Now for your question: >>> To understand what's going on you have to distinguish *fetch operation* >> and >>> *search operation*. You also have to distinguish a *composite index* and >> an >>> *index with included columns*. >>> A covering index means that SQL Server can *fetch* the records you are >>> selecting in your query from the leaf level (=bottom level) of your >>> index-tree, not having to make a lookup to the data pages. >>> But a covering index doesn't mean that SQL Server can use your index for >> its >>> *search* operations. SQL Server can only use the index for a *search* >>> operation if your where-condition references a field which is the *first* >>> field defined in the index. >>> And to make things more complicated: A covering index can be accomplished >> in >>> to ways: as a *composite index* or as an *index with included columns*. >>> A composite index is created this way: >>> CREATE INDEX MyCompositeIndex ON MyTable (FieldA, FieldB, FieldC, FieldD) >>> An index with included columns is created this way: >>> CREATE INDEX MyIncludedIndex ON MyTable (FieldA) INCLUDE (FieldB, FieldC, >>> FieldD) >>> A *composite index* records all specified columns in every level of the >>> index-tree, and it is very useful for multi-field *sorting* operations. >> But >>> it is not useful for multi-field *search" operations, because SQL Server >>> will only consider using this index for a query referencing the *first* >>> field (here FieldA) in the where-condition. It is however very useful for >>> *fetch* operations, since SQL Server then can do with an "index scan" on >> the >>> leaf level of the index not having to make a lookup the actual > data-pages. >>> This is only true however if you are selecting fields which are present > in >>> the index: if your query selects fields not defined in an index then SQL >>> Server has to use a *table scan* which is not efficient. And now you may >>> ask: What about a SELECT COUNT(*) - isn't this referencing all fields and >>> shouldn't this per definition prevent SQL Server from using the index? >> Don't >>> worry: COUNT(*) is a special case where SQL Server will use the index if >> it >>> can make the execution more efficient. >>> An *index with included columns* only records the included columns in the >>> leaf level of the index. For this reason it is neither useful for >>> multi-field *sorting* operations nor for *search* operations on the >> included >>> columns. But as for *composite indexes* it is very useful for *fetch* >>> operations. >>> Bottom line: >>> A *composite index* is good for multi-columns sorting and covering >> queries, >>> but it doesn't help *search* operations and it imposes an overhead > forcing >>> SQL Server to record all fields in every level of the index-tree. An >> *index >>> with included column* it good for covering queries too, but it doesn't >> help >>> multi-columns sorting and it doesn't help *search* operations either - >>> however it imposes fare less overhead cause the included fields are only >>> recorded in the leaf-level of the index. >>> >>> I suggest you play whith the different options using CTRL+L - pretty soon >>> you will notice this: >>> >>> Composite index on FieldA, FieldB, FieldC >>> Composite index on FieldD, FieldE, FieldF >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek >>> SELECT COUNT (*) FROM MyTable WHERE FieldB="Y" --> Index Scan >>> SELECT COUNT (*) FROM MyTable WHERE FieldD="Y" --> Index Seek >>> SELECT COUNT (*) FROM MyTable WHERE FieldE="Y" --> Index Scan >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR > FieldC="Y" >>> --> Index Scan >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldD="Y" --> Perhaps >> two >>> index seeks merged >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldE="Y" --> Probably >> a >>> table scan >>> >>> Index on FieldA including FieldB and FieldC >>> Index on FieldA including FieldD and FieldF >>> --> exactly same results >>> >>> Another option is to use a separate index on each field. Then you will >>> notice: >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" --> Index Seek >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" --> Perhaps >> two >>> index seeks merged >>> SELECT COUNT (*) FROM MyTable WHERE FieldA="Y" OR FieldB="Y" OR > FieldC="Y" >>> --> Probably a table scan >>> >>> Mind that the results will depend on the actual values present in your >>> table. SQL Server maintains statistics of the indexed columns and if > there >>> is a low selectivity for you query (= a high rate of matching values in >> the >>> indexed search-field) then chance is that SQL Server don't want to use > the >>> index and will prefer using a table scan instead. >>> >>> Asger >> _______________________________________________ >> dba-SQLServer mailing list >> dba-SQLServer at databaseadvisors.com >> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >> http://www.databaseadvisors.com >> >> >> _______________________________________________ >> dba-SQLServer mailing list >> dba-SQLServer at databaseadvisors.com >> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >> http://www.databaseadvisors.com >> >> > _______________________________________________ > dba-SQLServer mailing list > dba-SQLServer at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-sqlserver > http://www.databaseadvisors.com > > > _______________________________________________ > dba-SQLServer mailing list > dba-SQLServer at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-sqlserver > http://www.databaseadvisors.com > >