[dba-SQLServer] Processing diverse where clauses

Jim Lawrence accessd at shaw.ca
Wed May 16 10:23:57 CDT 2007


Hi John:

The latest features of SQL 2005 that might help you are:

You can use a SQL query or nested query as if you are using a table. I am
not sure how many levels you can go but Oracle is 16. This could replace the
IN (1,2,3...)

Select o1.test1, o2.test2...
>From (Select * from table2
	Where ...) as o2
Where o1 = "test" 

(The processes from the inner most select out.)
 
By default MS SQL 2005 has Optimization statistics on. This means that the
first time you perform a possible BAD query it will run slow but eventually
the same query will start running faster. This is because 'hints' internally
tell the query routine which is the best method, whether fullscan or index
and which indexes to use. Oracle had needed to use manual 'hints' but MS SQL
automated this process since 2000. 

This means you can get away with bad queries, just do the same bad query
enough time and remember to use 'variables' in the SPs

Select o1.test1.... from table1 where o2.test = @test

And not...

Select o1.test1.... from table1 where o2.test = 'Filbert'

HTH
Jim 

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Wednesday, May 16, 2007 5:45 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Processing diverse where clauses

Don't you hate it when you lose a hard drive!!!  My new servers are being
built with Raid6! So that I can lose TWO drives and soldier on.  All of my
critical stuff is now being placed out on volumes on those raid arrays.  It
is probably not something I would do without a client need, but the total
cost is now under $1000.  $500 for the dedicate raid controller I selected:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816131004

And at least 4 hard drives - two for the parity and two for the data.  This
card will support 8 drives.  In my first server I used the Seagate 320gb
(300 real gb) drives:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822148140

In the second server I am using the 500gb drives:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822148136

With the selected controller you can start with raid 5 and as you add drives
later, just tell the controller to upgrade to raid 6.  It "just works" and
works well.  And is blazing fast.

As for the IN() clause...

Interestingly, I ran a test of just the IN() clause part and I can pull
counts from 65 million records in just a couple of seconds - Count(PK) as
RecCnt from tblXXX where IncomeRange in('4','5'...).  I did index this field
since it had so many values.  I also ended up using a NOT IN() because there
were fewer values to enumerate.  I did not test whether that was faster or
not.

I ended up breaking the query down into three queries

IncomeRange	- 1 field in where using IN()
FemaleWithChildren - 8 fields in where
RecsInZips - 1 join on Zip5

And then built up a query joining the PKs pulled from all of those separate
queries.  The entire count using all three subqueries runs in 23 minutes.  I
think it is having to do table scans for the FemaleWithChildren subquery
simply because the fields only have a few values in them.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Arthur
Fuller
Sent: Wednesday, May 16, 2007 8:10 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Processing diverse where clauses

INs are bad policy, typically, unless you have a very small number of
targets. I wrote about this problem at TechRepublic a while ago. I may not
have the article handy, but you can probably find it there. Go to
www.techrepublic.com and search for "Arthur Fuller IN()". That might work.
Else just search for my name and then browse.

Anyway, the article concerns a better way to do the IN part. The problem
with IN is that it almost always forces a table-scan, which in your case
will give you enough time to move to another country and back before it's
done. The article shows how to construct a temp-table from the IN clause and
then do a join to the actual table using the temp table. Especially when
your IN clause could have lots of values, this approach is way faster.

I'll dig now for the article, but I'm fairly certain that it was one of the
things wiped out when I lost a hard disk a month ago or so. If I find it
I'll send it off-list.

Arthur


On 5/15/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> I am processing a query where a set of N fields have any value, AND 
> one specific field has an IN() clause, i.e. a ton of codes possible.
>
> Is it more efficient to build up a pair of queries, SELECT PKID from 
> tblX WHERE FldA in('A','B'...), and another query where the "OR" 
> fields are gathered?  Or just use one big query?  I hate to even press 
> the button...  I am having to test with a TOP 100 kind of thing as it 
> is.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>
>
_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com




More information about the dba-SQLServer mailing list