John W. Colby
jwcolby at colbyconsulting.com
Tue Sep 14 11:45:32 CDT 2004
Just to let you know some numbers on the database as it currently stands.

First, I have two almost identical machines which I built to hold and manipulate the database. Both machines use an MSI K8N Neo motherboard with built-in 4-port SATA, dual-channel IDE, and gigabit LAN. The processor is the 754-pin Athlon 64 at 3 GHz. The server currently has Windows 2000 and 3 GB of RAM installed. Apparently Win2K can use up to 4 GB of RAM whereas XP is limited to 2 GB. Unfortunately I cannot persuade SQL Server to use more than 2 GB of RAM, so I am not sure that more memory than that is really useful in this context. The server then has (4) 250 GB Maxtor SATA drives holding the data files and (1) 250 GB Maxtor IDE drive holding the log file. The second machine is currently running XP Pro. Since the two new machines have gigabit NICs built into the motherboard, I bought an 8-port gigabit switch so they could talk at full speed. In general I have found that I can run operations from the second machine against the server over gigabit LAN at close to full speed, i.e. the LAN is no longer a severe bottleneck (it definitely was at 100 Mbit).

The db is 164 million records, 660 fields, ~3 KB per record. I have split the db into 5 files on (5) 250 GB hard disks. I turned on bulk logging for the import, and when the raw data finished importing, the data files were about 40 GB apiece after compacting and the log file about 100 MB. Turning on bulk logging is what enabled the process to complete in a timely manner.

After the import I added a PK field, a long autoincrement. Doing that locked up the machine for over 24 hours but finished with the new field created as expected. When it finished, the log file was sitting at its 200 GB maximum size and the four data files were sitting at about 80 GB each, but they shrank back to about 40-60 GB. I then added an index on the state field and ended up with two data files at 40 GB and two at 80 GB.
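To illustrate the shape of that step, here is a tiny sqlite3 sketch (table and field names are hypothetical; on SQL Server the real statements would be along the lines of ALTER TABLE ... ADD PK int IDENTITY(1,1) followed by CREATE INDEX). SQLite cannot add an autoincrement column in place, so the sketch rebuilds the table, which is roughly the kind of full-table rewrite that ran overnight on 164 million rows:

```python
import sqlite3

# In-memory stand-in for the real database; names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblData (state TEXT, city TEXT)")
con.executemany("INSERT INTO tblData VALUES (?, ?)",
                [("NC", "Raleigh"), ("NC", "Durham"), ("VA", "Richmond")])

# SQLite cannot ADD COLUMN ... AUTOINCREMENT, so rebuild the table --
# a full-row rewrite, much like the server-side operation that took 24+ hours.
con.execute("""CREATE TABLE tblData2 (
                   PK INTEGER PRIMARY KEY AUTOINCREMENT,
                   state TEXT,
                   city TEXT)""")
con.execute("""INSERT INTO tblData2 (state, city)
               SELECT state, city FROM tblData ORDER BY rowid""")
con.execute("DROP TABLE tblData")
con.execute("ALTER TABLE tblData2 RENAME TO tblData")

# The index on the state field.
con.execute("CREATE INDEX ix_state ON tblData (state)")

rows = con.execute("SELECT PK, state FROM tblData ORDER BY PK").fetchall()
print(rows)
```

The rebuild-then-rename pattern is only how SQLite forces you to do it; SQL Server does the equivalent rewrite internally when the IDENTITY column is added.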
Adding the PK field and its index, plus the index on the state field, apparently added about 80 GB of data to the 160 GB used for the data itself. I have to apologize, but I didn't keep written logs of the file sizes after each step. Each of these operations also took overnight. After each operation I used shrink to compact the files back down. Strangely, the shrink operation took a rather long time on two of the data files (as much as an hour) but only about a minute on the other two and on the log file.

I have turned to WinZip for backup. Zipping any of the database or log files gives me about a 13-to-1 compression ratio, so I can zip the files and store a 43 GB file in a 3.3 GB zip. Of course it does take a while. I have the server zip one file and the other NEO-based machine zip another. The second NEO finishes about 10% behind the server (time-wise), which is pretty darned good IMHO. Someday I will time the zip operation.

I have run a count of records grouped by state. That operation took a mere 1 minute and clearly demonstrates the power of the index! I also built a view to pull an "every hundredth record" subset of the data using a "WHERE (PK % 100) = 0" clause. This process takes about 25 minutes to create the dataset. My intent is to use that view to create a table containing that sub-dataset, then use that smaller table to run the various analyses the client is asking for. I still have to run what I fear will be a very time-consuming process to strip trailing spaces from various fields. A lot of work remains, but at least I have it all in SQL Server and can produce results now.

John W. Colby
www.ColbyConsulting.com
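P.S. A minimal sketch of the grouped count and the every-hundredth-record view, using Python's sqlite3 as a stand-in (table and view names are hypothetical; on SQL Server the GROUP BY query and CREATE VIEW are essentially the same, and the smaller table would be made with SELECT ... INTO from the view):

```python
import sqlite3

# Hypothetical miniature of the real table: 1000 rows split across two states.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblData (PK INTEGER PRIMARY KEY, state TEXT)")
con.executemany("INSERT INTO tblData (PK, state) VALUES (?, ?)",
                [(i, "NC" if i % 2 else "VA") for i in range(1, 1001)])

# Count of records grouped by state -- the query the state index makes fast.
counts = con.execute("""SELECT state, COUNT(*) FROM tblData
                        GROUP BY state ORDER BY state""").fetchall()

# The "every hundredth record" view, using the WHERE (PK % 100) = 0 clause.
con.execute("""CREATE VIEW vwSample AS
               SELECT * FROM tblData WHERE (PK % 100) = 0""")

# Materialize the view into a smaller table for the client's analyses
# (SQLite's CREATE TABLE ... AS SELECT; T-SQL would use SELECT ... INTO).
con.execute("CREATE TABLE tblSample AS SELECT * FROM vwSample")
sample_count = con.execute("SELECT COUNT(*) FROM tblSample").fetchone()[0]
print(counts, sample_count)
```

Note that PK % 100 picks every hundredth PK value, so gaps in the PK sequence (from deletes) would make the sample slightly smaller than 1%.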
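P.P.S. The trailing-space cleanup could look like this sqlite3 sketch (RTRIM exists in both SQLite and T-SQL; restricting the UPDATE to rows that actually have trailing spaces is my assumption about how to keep the pass from rewriting every row):

```python
import sqlite3

# Hypothetical sample with some trailing-space pollution.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblData (PK INTEGER PRIMARY KEY, city TEXT)")
con.executemany("INSERT INTO tblData (PK, city) VALUES (?, ?)",
                [(1, "Raleigh   "), (2, "Durham"), (3, "Boone  ")])

# Strip trailing spaces in place; the WHERE clause skips rows that are
# already clean, so only the dirty fraction of the table gets rewritten.
con.execute("UPDATE tblData SET city = RTRIM(city) WHERE city <> RTRIM(city)")

cities = con.execute("SELECT city FROM tblData ORDER BY PK").fetchall()
print(cities)
```

On the real table this would be one UPDATE per affected field, and with bulk-logged recovery off it will be fully logged, which is presumably why the step looks time-consuming.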