jwcolby
jwcolby at colbyconsulting.com
Sat Jan 5 20:13:14 CST 2008
OK, I have the x64 system up and running. I am doing tests on the same four databases. I have: HSID which contains data used in selection criteria, with a PK. 50 million records. AZHSID contains address validated records of name / address fields originally in HSID. With a PK matching the PKs in HSID. 50 million records ZipCodeWorld contains about 80K records of valid zip codes. It has information that allows me to exclude prisons, military FPOs etc. PSM11132 contains all of the queries required for me to fill an actual order of addresses (112000) going out to a client. The order asks for: Source File FIELD Select Codes Description Geography HSID/INF 074 2, 3, 6 OR Female 35 44 HSID/INF 073 2, 3, 6 Female 45-54 AND HSID/INF 091 7-9, A-T INC $50,000+ AND HSID/INF 282 Y Tennis HSID/INF 283 Y OR Golf HSID/INF 284 Y OR Snow Skiing HSID/INF 288 Y OR Bicycling HSID/INF 292 Y OR Running HSID/INF 294 Y OR Horseback Riding HSID/INF 298 Y OR Power Boating HSID/INF 299 Y Sailing AND Zip Code World 015 16980 CBSA Chicago Zip Code World 015 19820 OR CBSA Detroit Zip Code World 015 12580 OR CBSA Baltimore Zip Code World 015 31100 OR CBSA Los Angeles Zip Code World 015 35620 OR CBSA New York Zip Code World 015 37980 OR CBSA Philadelphia Zip Code World 015 47900 CBSA Washington DC As you can see basically they want women 35 to 54, income > 50K who participate in various hobbies, and live in selected big metropolitan areas. In order to prep the order I had to build cover indexes (field and PK) on the hobby fields. I already had cover indexes on the income and PK, and a single cover index on ALL of the age fields plus the PK. I then built FOUR different views. One for Age/Sex, one for income, and one for hobbies. Those three are ANDS so by pulling the PKs for each of those views I could do an inner join to get the AND. From these queries I actually only pulled the PK field itself to minimize data pulled out of the table. I did not need the select data itself, just the PKs for the joins. I then built a fourth view of Zip Code World (ZCW) using the CBSA code that selected ZIPS in specific metropolitan areas. From this view I pulled ONLY the zip field. I built a FIFTH master view that pulled in the four views above, PLUS the AZHSID table. AZHSID contains the PK to match the HSID select views, plus address validated name / address information. That information will be actually extracted and written into a flat file on disk, zipped and sent to the client. So I have age/sex, income, hobbies and AZHSID inner joined. The data pulled is just the PKID and name / address which includes Zip. I then inner join the zip from the ZCW view to limit the addresses to those in the right zip codes. Whew. OK, so the thing I am going to time is the COUNT of the PKs in this big query (how many people match all the criteria and live in the right zips). I am going to time the count on my original system that I actually used to run this order, and then on the new system. The original system (Stonehenge) is a AMD "3800" X2 with 4 gigs ram, Windows 2003 x32, SQL Server 2005 x32, with a dedicated raid controller card with an 8 drive raid 6 array. The new system (W2k3x64) is (currently) an AMD "3800" x2 with 4 gigs ram, Windows 2003 x64, SQL Server 2005 x64, with a pair of 650 gig IDE 100 hard disks. No raid. The databases reside on one of the 650 gig drives and the OS resides on the other. Stonehenge runs the count query in 5:32 (5 minutes, 32 seconds) and counts 630,025 valid addresses with the criteria above, in the zips specified. W2k3x64 runs the query in 9:26 (9 minutes and 32 seconds) and counts 630,025 records (just a check that we are counting the same thing). So the first thing that we see is that the new system is slower. The difference is likely caused by the fact that the Raid array in Stonehenge STREAMS data at 400 mbytes / sec whereas the single disk in W2k3x64 streams MUCH slower, probably around 50 mbytes / second. We do not KNOW that is the reason for the difference at this point, but it is a pretty good bet. I "sacrificed" the video controller from Azul to build the new system (yea, poor planning, forgot the damned video card) so I will probably move the raid array into the new system, or more likely move the new system (and boot disk) into the cabinet that Azul currently occupies. Moving a raid array with 8 disks is not something I want to do if I can avoid it. Next, I am going to pull the other 4 gigs (two 2 gig sticks) out of Azul and move it into W2k3x64. I have a video card on order, will be here next week but for now Azul is dead anyway so I might as well scrounge the memory as well. I did not take CPU and memory usage readings with just the 4 gigs, but with 8 in there both processors are pegging (100%) and the memory usage is maximum (only about 200 megs available) and it is using about 8 gigs of page file as well. W2k3x64 runs the same query in 9:11 after adding two additional sticks of memory and boosting the total to 8 gigs. Oddly, SQL Server has a little control for telling it how much it can have max, and it is maxed at ~2 gig bytes. I was under the impression that it would now allow SQL Server to natively use more ram. Sigh. there is an unchecked box for "Use AWE to allocate memory" which I just checked and am now rerunning the query a third time. With AWE memory enabled, the query ran in 8:27. A slight boost but certainly not stellar. So why is SQL Server not using more memory natively? What I really need to do is create a dual boot with Windows 2003 x32 and SQL Server 2003 x32, and then run the count query again for that. I suspect though that the Raid card and array would do more than anything to bump the speed back up. I am going to swap the quad core processor from Azul and see what that does for me. I think I will swap the view card and memory back in to Azul and try running the same database on Azul. It is virtually identical to Stonehenge. It has a quad core AMD, 4 gigs of memory and an 8 disk raid array on the same controller card that Stonehenge has. I could also TRY to do a dual boot to W2K3x64 but it currently boots off the raid which makes installing the new OS difficult (adds the issue of x64 raid drivers at install time). F6, floppy and all that. I hate that stuff. Well, more later. John W. Colby Colby Consulting www.ColbyConsulting.com