[dba-SQLServer] Apples to oranges - take 2

jwcolby jwcolby at colbyconsulting.com
Wed Oct 6 21:20:14 CDT 2010


OK, so I have two databases, each with a single table.  BTW, these are two of my main tables, used 
all of the time in orders.

HSID - otherwise known as the database from hell, 51157068 records, ~560 fields.  HSIDAllAdults is a 
database where up to three adult names were lifted out of fields in HSID and placed in a table with 
a PK_HSID field pointing back to the HSID record from which the information came.  Thus 
HSIDAllAdults is child to HSID in a manner of speaking (has a FK back to the PKID from HSID). 
HSIDAllAdults has about 67564677 million records, 27 fields.

Each table has a PKID which is autonumber and a unique clustered index on the PKID.

HSID is demographic information such as income bracket, ChildInAgeGroup_XXX, HasDog etc. and has a 
handful of indexes on the most commonly used demographics fields.

HSIAllAdults has name / address / gender / position (in HSID) fields and has three indexes on it - 
NameAddr, Hash and one other.

So...

I have a copy of each of these databases on rotating media.  I then backed up the rotating media 
file and restored on the SSD, so I have a copy of each database in two places.  I do this BTW 
because the SSD is a Raid 0 as well as SSD, and I am worried that if I do too much writing on the 
SSD I will wear it out - as in hot spot updates due to index updates etc.  Thus I will be doing 
maintenance on the rotating media and just copying the resulting db out to SSD for every day use.

Anyway, this allows me to do A/B comparisons of common queries.  For the purpose of this test / 
email, I joined HSID to HSIDAllAdults on the FK in HSIDAllAdults, then did a count of the PK in 
HSIDAllAdults Group By MOB (mail order buyer, one of the demographics fields in HSID).

So the SSD query looks like:

SELECT     _DataHSID.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt
FROM         dbo.tblAllAdultNameAddr INNER JOIN
                       _DataHSID.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID = 
_DataHSID.dbo.tblHSID.PKID
GROUP BY _DataHSID.dbo.tblHSID.Mail_Order_BUYER

And runs in 30 seconds, producing the following results:

NULL	19702461
1	19422841
2	28439375

The rotating media query looks as follows:

SELECT     _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt
FROM         dbo.tblAllAdultNameAddr INNER JOIN
                       _DataHSID_OffLine.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID = 
_DataHSID_OffLine.dbo.tblHSID.PKID
GROUP BY _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER

And runs in 1:50, producing the following results:

NULL	19702461
1	19422841
2	28439375

The resulting count is identical (as expected), with rotating media taking almost 4 times as long to 
complete as the SSD.

I will be storing these two queries in their respective databases (rotating / SSD) so that I can use 
them to test again when I add the second physical CPU chip and additional memory.

BTW this was a pretty simple query as things go.  A more normal query is to pull Name / address and 
a ValidAddress field out of HSIDAllAdults, filter the ValidAddr using something like In('V','E'), 
joining that to HSID and pulling out typically 4 to 6 fields from HSID to use in where clauses.

So I am typically joining two tables of 50 million and 65 million records and then filtering on 4-8 
fields, then actually capturing the resulting names / addresses and writing these into an order 
table.  Depending on the criteria, I will pull anywhere from a 100 K or so up to 5 million or more 
names into the order table.  The order table is created on the fly in an order database created just 
for that order.

The order table will be on rotating media.  Or I might eventually go buy another SSD to use 
specifically for building these order databases.  Then if the SSD wears out I can just replace it 
with another.

I currently spend a lot of time, hours at a time building these orders, running the queries, and 
manipulating the results to get the final export file.  I am hoping to radically reduce my time 
twiddling my thumbs waiting for SQL Server.

-- 
John W. Colby
www.ColbyConsulting.com



More information about the dba-SQLServer mailing list