jwcolby
jwcolby at colbyconsulting.com
Wed Oct 6 21:20:14 CDT 2010
OK, so I have two databases, each with a single table. BTW, these are two of my main tables, used
all of the time in orders.
HSID - otherwise known as the database from hell, 51157068 records, ~560 fields. HSIDAllAdults is a
database where up to three adult names were lifted out of fields in HSID and placed in a table with
a PK_HSID field pointing back to the HSID record from which the information came. Thus
HSIDAllAdults is child to HSID in a manner of speaking (has a FK back to the PKID from HSID).
HSIDAllAdults has about 67564677 million records, 27 fields.
Each table has a PKID which is autonumber and a unique clustered index on the PKID.
HSID is demographic information such as income bracket, ChildInAgeGroup_XXX, HasDog etc. and has a
handful of indexes on the most commonly used demographics fields.
HSIAllAdults has name / address / gender / position (in HSID) fields and has three indexes on it -
NameAddr, Hash and one other.
So...
I have a copy of each of these databases on rotating media. I then backed up the rotating media
file and restored on the SSD, so I have a copy of each database in two places. I do this BTW
because the SSD is a Raid 0 as well as SSD, and I am worried that if I do too much writing on the
SSD I will wear it out - as in hot spot updates due to index updates etc. Thus I will be doing
maintenance on the rotating media and just copying the resulting db out to SSD for every day use.
Anyway, this allows me to do A/B comparisons of common queries. For the purpose of this test /
email, I joined HSID to HSIDAllAdults on the FK in HSIDAllAdults, then did a count of the PK in
HSIDAllAdults Group By MOB (mail order buyer, one of the demographics fields in HSID).
So the SSD query looks like:
SELECT _DataHSID.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt
FROM dbo.tblAllAdultNameAddr INNER JOIN
_DataHSID.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID =
_DataHSID.dbo.tblHSID.PKID
GROUP BY _DataHSID.dbo.tblHSID.Mail_Order_BUYER
And runs in 30 seconds, producing the following results:
NULL 19702461
1 19422841
2 28439375
The rotating media query looks as follows:
SELECT _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt
FROM dbo.tblAllAdultNameAddr INNER JOIN
_DataHSID_OffLine.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID =
_DataHSID_OffLine.dbo.tblHSID.PKID
GROUP BY _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER
And runs in 1:50, producing the following results:
NULL 19702461
1 19422841
2 28439375
The resulting count is identical (as expected), with rotating media taking almost 4 times as long to
complete as the SSD.
I will be storing these two queries in their respective databases (rotating / SSD) so that I can use
them to test again when I add the second physical CPU chip and additional memory.
BTW this was a pretty simple query as things go. A more normal query is to pull Name / address and
a ValidAddress field out of HSIDAllAdults, filter the ValidAddr using something like In('V','E'),
joining that to HSID and pulling out typically 4 to 6 fields from HSID to use in where clauses.
So I am typically joining two tables of 50 million and 65 million records and then filtering on 4-8
fields, then actually capturing the resulting names / addresses and writing these into an order
table. Depending on the criteria, I will pull anywhere from a 100 K or so up to 5 million or more
names into the order table. The order table is created on the fly in an order database created just
for that order.
The order table will be on rotating media. Or I might eventually go buy another SSD to use
specifically for building these order databases. Then if the SSD wears out I can just replace it
with another.
I currently spend a lot of time, hours at a time building these orders, running the queries, and
manipulating the results to get the final export file. I am hoping to radically reduce my time
twiddling my thumbs waiting for SQL Server.
--
John W. Colby
www.ColbyConsulting.com