JWColby
jwcolby at colbyconsulting.com
Tue Oct 31 10:20:43 CST 2006
I thought you guys might find this interesting. I have a database that I imported a couple of years ago. On a single-processor 3 ghz AMD64 with 2 gbytes of memory, using four individual 250 gb IDE hard drives (no raid), the system would import ~1000 rows per second into SQL Server. Each text file was ~10 gbytes and consisted of ~700 fields and 3 million records. Each field was originally padded on the right with spaces (comma delimited, but fixed width).

This time around, I built an Access (really just VBA) preprocessor to open each file, read it line by line, strip all of the padding off the left and right sides of each field (there was some left padding as well) and write it back out to another file. This dropped the average text file size to ~6.5 gbytes, which means padding made up about 35% of the original files on average. It also left the data unpadded after importing into SQL Server, which makes sorts, searches and indexes possible.

Anyway, after stripping all of this padding and building new files, I am now importing these into my new server, which is an AMD64 X2 dual processor 3.8 ghz with 2 gbytes of ram. The disk subsystem is now a pair of volumes hosted on a raid 6 array, 1 tbyte for the main data store and ~400 gb for the temp databases.

The new system imports the new (stripped) data files at about 3000 records per second. I have to run 3 imports at a time to keep both cores above 90% usage. Running 3 imports at a time, the imports happen at roughly 2k records / second FOR EACH IMPORT, or about 6k records / second in total. Oddly, if I run more than 4 imports at a time, the processor usage drops back to ~70% for some reason, and in fact each import slows to ~500 records / second. This may have to do with the limits of disk streaming off of the machine that holds the source text files. The source files come from a second machine, all the files on the same disk / directory, over a 1 gbit network (switch).
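For anyone who wants to try the same thing, the padding-stripping step described above could be sketched roughly as follows. This is a Python illustration, not the actual Access / VBA code. The names strip_line and strip_file are made up for the example, and it assumes that no field contains an embedded comma or quote and that the padding consists of spaces only.

```python
def strip_line(line, delimiter=","):
    """Trim leading and trailing spaces from every field in one record.

    Assumption: fields never contain the delimiter or quote characters,
    so a plain split on the delimiter is safe.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    return delimiter.join(field.strip(" ") for field in fields)


def strip_file(src_path, dst_path, delimiter=","):
    """Stream src_path to dst_path one line at a time, trimming each field.

    Only one line is held in memory at a time, so multi-gigabyte files
    are fine. Returns the number of records written.
    """
    count = 0
    # latin-1 maps every byte to a character, so unknown encodings
    # round-trip unchanged (an assumption about the source files).
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(strip_line(line, delimiter) + "\n")
            count += 1
    return count
```

For example, strip_line("  John   ,Smith     ,  42 ") returns "John,Smith,42". Reading and writing line by line keeps memory use flat regardless of file size, which matters with ~10 gbyte inputs.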

I am happy to say though that the new dual processor server appears to be able to perform this specific task ~3 to 6 times as fast, which is a huge and much needed performance boost. The other advantage to this configuration is that I am no longer playing games splitting the database up into smaller files residing on individual hard drives, and of course, the whole thing is using raid 6, which provides much needed security.

John W. Colby
Colby Consulting
www.ColbyConsulting.com