[AccessD] VLDBs, the saga - Ramblings only

John W. Colby jwcolby at colbyconsulting.com
Sat Aug 28 06:56:11 CDT 2004


I have been working on getting a rather large database into SQL Server.
This thing is ~65 million names plus demographics info and will be used for
bulk mailing analysis.  I have been struggling for weeks to make this
happen.  Having no experience with a database this big, I had no idea how
big the final db would be.  I tried to get it in to a 160g drive but it
rapidly became obvious that wouldn't hold it.  I then purchased two 200g
drives and used Raid0 to make a big 400 g drive.  I thought I turned on
compression but after getting well into the extraction process I discovered
this wasn't the case.  I then started trying to figure out how to get the
drive compressed.

Long story short, a NTFS drive can be compressed, even a raid array such as
this, however...  There is a control that allows you to select the sector
size.  I had selected the "compress" check box but then selected a sector
size of 64K.  As I started investigating why the drive wasn't compressed it
turns out that only sector sizes of 512 to 4K bytes allow compression.
Anything larger causes the "compress drive" check box to gray out and the
drive ends up uncompressed.

By this time I had already spent several days extracting zipped files of
data and BCPing them into SQL Server so I had a MDF file of over 300gb and
no place to put it!

Sigh.

Out of desperation I decided to try zipping the database file.  I started it
PK zipping last night onto an empty 160g partition.  This morning I had a
10gb zipped file that supposedly contains the MDF file!

I then deleted the partition on the 400gb Raid array and started playing
with the compression / block size which is when I discovered the >4K sector
size gotcha.  I set the sector size to 4K and quick formatted, then started
unzipping the MDF file to the (compressed) 400gb raid array.  

We shall see.  The unzip is not finished, in fact has several hours to go
yet.  If this works I will celebrate.

This whole project has been a challenge.  It looks like the database will be
around 600g for JUST the data, never mind any indexes.  I simply don't have
the money to build a raid 5 array to up the uncompressed drive size.  Even
if I did, IDE drives larger than 250gb are expensive and AFAICT only
available in 5200 RPM.  Plus the overhead of Raid5 is "One Drive" which
means I'd need (4) 300g drives to build a 900g usable space raid5 array.
Raid1 (which I am currently using) tops out at 600g using (2) 300g drives
(uncompressed).  So far my (2) drive Raid1 array using 200g drives has cost
me $240 plus a controller I already had.  A Raid5 solution using 300g drives
would cost about $1200 just for the new controller and 4 drives!

With any luck, given the massive compression PKZip managed to attain, I will
be able to shoehorn the 600g.  

Update 8-(

As I write this I just got a "delayed write failed" message from Windows
saying it lost data trying to write to the disk.  I have tried to turn off
write caching but can't seem to find the magic button to cause Windows to
quit using "Delayed write".

BIG sigh!

If I can't get the db back out of the zip file I will be facing a weekend of
"starting from scratch" on getting the data out of the raw text files and
back in to SQL Server!

And I thought this would be fairly easy.

John W. Colby
www.ColbyConsulting.com 





More information about the AccessD mailing list