[dba-SQLServer] Dual cores running 100%

Sat Nov 4 22:42:31 CST 2006

LOL, well I have nirvana.  Well... sorta.  I am in the process of CASS
validating my 65 million record database using Accuzip software, a rather
kludgy process which uses FoxPro to validate ~3 million record chunks.  I
have two instances running on two different files.  As you guys probably
know by now, my system is now running Windows 2003 SBS on a dual core AMD X2
at 2.8 GHz with 2 GB of RAM and a RAID 6 array with two largish volumes.

Running just one instance of Accuzip, the system processes about 6 million
records per hour.  Starting the second instance leaves the first still
processing about 6 million records per hour, but the second processes only
about 2.4 million records per hour.  I find the inequality odd.  My
experience in the past (using XP Pro) is that it slowed down the first
instance and both instances would run at roughly the same speed.  Whatever.

So basically I get roughly 40% additional throughput from the second instance
of Accuzip.  Boy would it be nice to have a 16x processor array right now,
but this is usable.  I have to process ~22 files, and each file would take
~30 minutes if processed by itself.  Once a file is finished, I delete
the undeliverables and pack the FoxPro database, then start the next file
importing.
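
The arithmetic behind those numbers can be sketched as below.  This is only a
back-of-the-envelope estimate: the two rates are the ones quoted above, the
file count and chunk size are the approximate figures from this post, and the
names (`combined_rate`, `hours_to_process`) are made up for illustration.  It
ignores the time spent deleting undeliverables, packing, and importing.

```python
# Rates quoted in the post, in millions of records per hour.
FIRST_INSTANCE_RATE = 6.0
SECOND_INSTANCE_RATE = 2.4

# Approximate figures from the post: ~22 files of ~3 million records each.
FILE_COUNT = 22
RECORDS_PER_FILE = 3.0  # millions


def combined_rate(first: float, second: float) -> float:
    """Total throughput when both instances run at the same time."""
    return first + second


def hours_to_process(total_records: float, rate: float) -> float:
    """Hours to validate total_records (millions) at rate (millions/hour)."""
    return total_records / rate


total = FILE_COUNT * RECORDS_PER_FILE
both = combined_rate(FIRST_INSTANCE_RATE, SECOND_INSTANCE_RATE)
gain = both / FIRST_INSTANCE_RATE - 1.0  # fractional gain from instance two

one_instance_hours = hours_to_process(total, FIRST_INSTANCE_RATE)
two_instance_hours = hours_to_process(total, both)

print(f"combined rate: {both:.1f} million records/hour")
print(f"gain from second instance: {gain:.0%}")
print(f"one instance: {one_instance_hours:.1f} h")
print(f"two instances: {two_instance_hours:.1f} h")
```

Under these assumptions the validation pass alone drops from about 11 hours
with one instance to a bit under 8 hours with two.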

Once all 22 files are processed, I then have to re-import them into SQL
Server and process them further there.  After the import I will have only
deliverable addresses, because I am deleting all undeliverables.  Since
the exported / reimported records have a PK that matches my original (big)
table, I will then have the ability to create a master list of completely
valid deliverable ADDRESSES (not necessarily people at those addresses).  It
appears so far the non-deliverables are running around 14%, which means I
will end up with 86% of 64 million addresses in the deliverable master
address table.
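
A minimal sketch of the PK-based step described above, using Python's built-in
sqlite3 as a stand-in for SQL Server so it runs anywhere.  The table names,
column names, and sample rows are all hypothetical; the real schema is not
shown in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: big_table is the original table, validated holds the
# re-imported rows with a flag saying whether the address is deliverable.
cur.execute(
    "CREATE TABLE big_table (pk INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
cur.execute(
    "CREATE TABLE validated "
    "(pk INTEGER PRIMARY KEY, address TEXT, deliverable INTEGER)"
)

# Made-up sample rows, for illustration only.
cur.executemany(
    "INSERT INTO big_table VALUES (?, ?, ?)",
    [(1, "A Smith", "12 main st"),
     (2, "B Jones", "99 nowhere rd"),
     (3, "C Brown", "4 elm ave")],
)
cur.executemany(
    "INSERT INTO validated VALUES (?, ?, ?)",
    [(1, "12 MAIN ST", 1),
     (2, "99 NOWHERE RD", 0),
     (3, "4 ELM AVE", 1)],
)

# Drop the undeliverable rows from the re-imported data.
cur.execute("DELETE FROM validated WHERE deliverable = 0")

# The shared PK ties each surviving address back to the original big table.
cur.execute(
    """
    CREATE TABLE master_address AS
    SELECT b.pk, v.address
    FROM big_table AS b
    JOIN validated AS v ON v.pk = b.pk
    """
)

master_rows = cur.execute(
    "SELECT pk, address FROM master_address ORDER BY pk"
).fetchall()
print(master_rows)
conn.close()
```

At the rate quoted above, 86% of 64 million works out to roughly 55 million
rows in the master address table.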

This is only the address validation side of the picture.  The fun begins
when I normalize the big table into a bunch of smaller tables as discussed
in an earlier email.  I have actually done that for one specific job (a
boating database) in order to do matching on a small (50K) database of names
of yacht owners that they got from a client.  The biggest problem is that in
order to match names (or even addresses) you have to compare apples to
apples, and neither database was in CASSed form, nor did either provide
"match codes" to do the comparison on.  It took a lot of work in a hurry to
make this happen.
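
To illustrate what a match code buys you, here is a rough sketch.  The scheme
(first five letters of the last name, house number, 5-digit ZIP) is invented
for this example and is not the format used by CASS, Accuzip, or the job
described above.  The records are made up too.

```python
import re


def match_code(last_name: str, address: str, zip_code: str) -> str:
    """Build a hypothetical match code: name prefix | house number | ZIP5."""
    # Keep letters only, so "O'Brien" and "OBRIEN" produce the same prefix.
    name_part = re.sub(r"[^A-Z]", "", last_name.upper())[:5]
    # Use the leading house number, ignoring how the street is spelled.
    house = re.match(r"\s*(\d+)", address)
    house_part = house.group(1) if house else ""
    # Reduce ZIP+4 to the first five digits.
    zip_part = re.sub(r"\D", "", zip_code)[:5]
    return f"{name_part}|{house_part}|{zip_part}"


def find_matches(small, big):
    """Return (small_record, [big_records]) pairs that share a match code."""
    index = {}
    for rec in big:
        index.setdefault(match_code(*rec), []).append(rec)
    return [
        (rec, index[match_code(*rec)])
        for rec in small
        if match_code(*rec) in index
    ]


# Made-up records: (last_name, address, zip_code).
big = [
    ("O'Brien", "12 Main St.", "28601-1234"),
    ("Smith", "4 Elm Ave", "28602"),
]
small = [("OBRIEN", "12 MAIN STREET", "28601")]

matches = find_matches(small, big)
for client_rec, candidates in matches:
    print(client_rec, "->", candidates)
```

Indexing the big table by code once means each of the ~50K client names is a
dictionary lookup instead of a scan.  A crude code like this one will produce
false matches, so the candidates would still need a closer comparison.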

John W. Colby
Colby Consulting
www.ColbyConsulting.com