jwcolby
jwcolby at colbyconsulting.com
Wed Oct 6 21:20:14 CDT 2010
OK, so I have two databases, each with a single table. BTW, these are two of my main tables, used all of the time in orders. HSID - otherwise known as the database from hell, 51157068 records, ~560 fields. HSIDAllAdults is a database where up to three adult names were lifted out of fields in HSID and placed in a table with a PK_HSID field pointing back to the HSID record from which the information came. Thus HSIDAllAdults is child to HSID in a manner of speaking (has a FK back to the PKID from HSID). HSIDAllAdults has about 67564677 million records, 27 fields. Each table has a PKID which is autonumber and a unique clustered index on the PKID. HSID is demographic information such as income bracket, ChildInAgeGroup_XXX, HasDog etc. and has a handful of indexes on the most commonly used demographics fields. HSIAllAdults has name / address / gender / position (in HSID) fields and has three indexes on it - NameAddr, Hash and one other. So... I have a copy of each of these databases on rotating media. I then backed up the rotating media file and restored on the SSD, so I have a copy of each database in two places. I do this BTW because the SSD is a Raid 0 as well as SSD, and I am worried that if I do too much writing on the SSD I will wear it out - as in hot spot updates due to index updates etc. Thus I will be doing maintenance on the rotating media and just copying the resulting db out to SSD for every day use. Anyway, this allows me to do A/B comparisons of common queries. For the purpose of this test / email, I joined HSID to HSIDAllAdults on the FK in HSIDAllAdults, then did a count of the PK in HSIDAllAdults Group By MOB (mail order buyer, one of the demographics fields in HSID). So the SSD query looks like: SELECT _DataHSID.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt FROM dbo.tblAllAdultNameAddr INNER JOIN _DataHSID.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID = _DataHSID.dbo.tblHSID.PKID GROUP BY _DataHSID.dbo.tblHSID.Mail_Order_BUYER And runs in 30 seconds, producing the following results: NULL 19702461 1 19422841 2 28439375 The rotating media query looks as follows: SELECT _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER, COUNT(dbo.tblAllAdultNameAddr.PK) AS Cnt FROM dbo.tblAllAdultNameAddr INNER JOIN _DataHSID_OffLine.dbo.tblHSID ON dbo.tblAllAdultNameAddr.PKHSID = _DataHSID_OffLine.dbo.tblHSID.PKID GROUP BY _DataHSID_OffLine.dbo.tblHSID.Mail_Order_BUYER And runs in 1:50, producing the following results: NULL 19702461 1 19422841 2 28439375 The resulting count is identical (as expected), with rotating media taking almost 4 times as long to complete as the SSD. I will be storing these two queries in their respective databases (rotating / SSD) so that I can use them to test again when I add the second physical CPU chip and additional memory. BTW this was a pretty simple query as things go. A more normal query is to pull Name / address and a ValidAddress field out of HSIDAllAdults, filter the ValidAddr using something like In('V','E'), joining that to HSID and pulling out typically 4 to 6 fields from HSID to use in where clauses. So I am typically joining two tables of 50 million and 65 million records and then filtering on 4-8 fields, then actually capturing the resulting names / addresses and writing these into an order table. Depending on the criteria, I will pull anywhere from a 100 K or so up to 5 million or more names into the order table. The order table is created on the fly in an order database created just for that order. The order table will be on rotating media. Or I might eventually go buy another SSD to use specifically for building these order databases. Then if the SSD wears out I can just replace it with another. I currently spend a lot of time, hours at a time building these orders, running the queries, and manipulating the results to get the final export file. I am hoping to radically reduce my time twiddling my thumbs waiting for SQL Server. -- John W. Colby www.ColbyConsulting.com