[dba-VB] New server

Shamil Salakhetdinov shamil at smsconsulting.spb.ru
Sat Sep 25 04:09:59 CDT 2010


Hi John --

<<<
Well guys, I got the OK today from the client 
to purchase the new server. It will use the 
AMD Magny-Cours Opteron to build out a 
16 core machine with 64 gigs of memory.
>>>
Very impressive!
Envy is immense here - the white kind, of course :)...

<<<
One of the guys I do business with called the other day.  
Said he was talking with a guy who claimed 
to have two BILLION names / addresses in a bunch 
of lists, and he said that was probably low, 
perhaps more like 6 to 8 billion.  I'm thinking 
I might be able to actually handle that business 
(not that I will necessarily get it).  Two years 
ago it would have been impossible.
>>>
Yes, John, but you'd better use streaming together with multi-threading -
that wouldn't require such high-end hardware, I suppose. Only a supposition,
though - I could be wrong; I have never had the opportunity to handle that
much data.

<<<
I am pretty proud of my Frankenstein.  
It has been 5 years in the making but it is 
finally coming together.
>>>
This is what real passion - yours - and good money(?) can do...
You're a hero! 
Everybody here in AccessD is proud of having you as a colleague, and of
having the opportunity to "recharge your passion" during the "bound vs.
unbound", "PK vs. natural keys", "Access v. <= 2003 vs. Access v. >= 2007"
etc. battles :)

<<<
My dream is to be sitting on the beach, with a 
laptop at my side, remote desktop into my servers, 
watching my servers make money for me.  We shall see.
>>>
Yes, that is a good dream, and it has a good chance of coming true, I
suppose.

I'm currently preparing my laptop and mobile modem to use them in the
Peterhof park (http://en.wikipedia.org/wiki/Petergof ) for a "live
broadcast" of my family walking there this Saturday. Well, maybe not live
yet - this is just the first try. But we have 4G here, so a live broadcast
isn't a problem at all. I currently have a 3G mobile modem for my laptop,
and my mobile ISP has a strict limit on monthly traffic - just 2GB, after
which it becomes very expensive - while 4G should be unlimited, I
suppose...

Have a nice weekend.

-- Shamil

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Saturday, September 25, 2010 9:10 AM
To: VBA
Subject: [dba-VB] New server

Well guys, I got the OK today from the client to purchase the new server.
It will use the AMD 
Magny-Cours Opteron to build out a 16 core machine with 64 gigs of memory.

It has been a long time coming.

When I started this business in August 2004 I did not know a thing about SQL
Server.  I started with 
a single core 2800 AMD with 4 gigs of ram running Windows 2003 x32 and SQL
Server 2000.

I moved to 4 core and 16 gigs a couple of years ago, still Windows 2003
(x64) but SQL Server 2005 
and now 2008, and have since accumulated several more large lists and done a
ton of development in 
C# to get my application clicking.

For you .Net folks, I am now writing in C#, and have hired a programmer part
time to help me out as 
well.  We have designed a system to process my name / address lists.  The
idea is that every list 
needs to be sent through address validation every 30 days.

My first list had 65 million rows and came to me on a set of DVDs with Zip
files.  Fixed width 
format.  I started learning SQL Server, imported the list and did every
single thing manually.  It 
took me two weeks to get the first list imported, trailing spaces stripped
off the fields and set up 
for use.  It took hours to do much of anything!  Hell, it took DAYS to do
much of anything.

Today I have two lists with 100 million names, one with 64 million (split
off from the original list), the "database from hell" with 50 million
records (the original list), an email list with 23 million, a children list
with 22 million, a "dogs and cats" list with 11 million, and smoker lists
with 5 million and 3 million.  I just got an additional email list with 57
million (not yet fully processed).

9 lists with about 420 million names and addresses when aggregated.  I need
to address-validate all of those names every month.

I have developed a "table driven" system where I write supervisor records
that hold the name of the 
database and some other flag fields (written to by the process).  I have
broken the process down 
into 4 stages.

1) Stage 1 BCPs the data out of the table to a disk directory as
comma-delimited files, in two-million-record chunks per file.

2) Stage 2 moves the files from the output staging directory to a virtual
machine.

The virtual machine runs a third party application that performs the address
validation on a chunk 
(CSV file).  That application is written in FoxPro for Windows, thus the
size limitation.  The third 
party application continuously scans an input directory looking for files,
processes the files and 
places the results and a CASS report (PDF) into an output directory.

The virtual machine is and will continue to be my bottleneck, as it can
only process a 2 million record chunk approximately every 50 minutes, about
half of which is spent out on a server in California.  However, I can buy
additional licenses to set up more VMs.
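Some rough arithmetic on that bottleneck, using the figures above (2 million records per chunk, roughly 50 minutes per chunk), sketched in Python:

```python
# Back-of-envelope throughput for the validation VM bottleneck,
# using the figures above: 2-million-record chunks, ~50 minutes each.
CHUNK_SIZE = 2_000_000
MINUTES_PER_CHUNK = 50

def vm_hours_for(total_records, vms=1):
    """Hours of wall-clock VM time to validate total_records."""
    chunks = -(-total_records // CHUNK_SIZE)   # ceiling division
    return chunks * MINUTES_PER_CHUNK / 60 / vms

hours = vm_hours_for(420_000_000)   # the ~420 million monthly names
days = hours / 24
```

At roughly 175 hours (a bit over a week) of continuous VM time per month on one license, the 30-day cycle is feasible, and a second VM would roughly halve it.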

3) My stage three watches that output directory for files popping out of the
third party application 
and moves them off the virtual machine back to an input staging directory.
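Stages 2 and 3 are essentially directory watchers. A minimal polling sketch in Python (the file names and throwaway directory are illustrative; the real code is C#):

```python
import tempfile
from pathlib import Path

def poll_new_files(directory, seen):
    """One polling pass: return not-yet-seen CSV files, oldest name first."""
    new = sorted(p for p in directory.glob("*.csv") if p.name not in seen)
    seen.update(p.name for p in new)
    return new

# A throwaway directory stands in for the VM's output directory:
outdir = Path(tempfile.mkdtemp())
(outdir / "chunk001.csv").touch()
seen = set()
first_pass = poll_new_files(outdir, seen)   # picks up chunk001
(outdir / "chunk002.csv").touch()
second_pass = poll_new_files(outdir, seen)  # picks up only chunk002
```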

4) Stage four imports them back into a temp database in SQL Server, builds
a "valid address" flag field and three SHA1 hash fields, and then moves
only the move addresses (records whose address changed) back into the live
database.  Before it brings the move addresses back in, it archives the old
address (where they lived before they moved).
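A Python sketch of how such SHA1 hash fields can be computed; the field choices and normalization here are my assumptions, not necessarily what the system does:

```python
import hashlib

def sha1_key(*fields):
    """Hypothetical match key: trim, uppercase, join, and SHA1 the fields."""
    normalized = "|".join(f.strip().upper() for f in fields)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Hypothetical keys: same person, before and after a move.
name_addr_key = sha1_key("John", "Colby", "123 Main St", "28711")
moved_key = sha1_key("John", "Colby", "456 Oak Ave", "28711")
```

Comparing one fixed-length digest is cheaper than comparing many columns, which is the usual motivation for precomputed hash fields like these.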

Each stage runs in a thread and is asynchronous to the other stages.  IOW
stage one can be 
outputting files from TableX, while stage 2 is moving files from a
previously exported TableY out to 
the virtual machine.  Stage three is always watching for files from the
third party app (on the VM) 
and moving them to the input staging directory where they belong, and stage
four is watching for 
files to appear in its input staging directory and importing them back in to
SQL Server.
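The asynchronous-stages idea can be sketched with threads and queues; here in Python with stand-in work functions (the real system is C# and the stages do real file moves):

```python
import queue
import threading

def stage(inbox, outbox, work):
    """Generic pipeline stage: pull an item, process it, pass it downstream."""
    while True:
        item = inbox.get()
        if item is None:            # poison pill: propagate and shut down
            outbox.put(None)
            break
        outbox.put(work(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
# Stand-ins for the real work: one stage "moves" a chunk, the next "validates" it.
threads = [
    threading.Thread(target=stage, args=(q1, q2, lambda f: f + ".moved")),
    threading.Thread(target=stage, args=(q2, q3, lambda f: f + ".validated")),
]
for t in threads:
    t.start()

for chunk in ["chunk001.csv", "chunk002.csv"]:
    q1.put(chunk)
q1.put(None)                        # no more work

results = []
while (item := q3.get()) is not None:
    results.append(item)
for t in threads:
    t.join()
```

Each stage blocks on its own input queue, so a slow stage never stalls the ones ahead of it - the same property the four file-based stages rely on.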

Stage one kind of drives the process since it writes a record for each 2
million record chunk file 
exported out of SQL Server.  That record (child to the supervisor record
that "orders" the address 
validation for a table) defines the staging out / in paths for the later
stages to work with, and 
provides fields for all the stages to log completion dates and status on a
per file basis, as well 
as RecordsAffected (where appropriate) and start / stop times for each file
for each stage.

Each table has its own staging in / out directory on disk, created by stage
one, and the paths are logged in the supervisor table.  After the files are
processed, stage one cleans up by zipping up the
files in the output staging directory.  After stage four successfully
imports files back into SQL 
Server it cleans up by zipping up the files that have been imported.  Stage
four creates a brand new 
supervisor record set for 30 days out from the completion of stage four.
IOW, "do it again in 30 days".
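The per-file child record and the "do it again in 30 days" rescheduling might look like this in miniature (Python, with hypothetical field names):

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class ChunkRecord:
    """Hypothetical child record: one per 2-million-record export file."""
    file_name: str
    completed: dict = field(default_factory=dict)   # stage number -> date

def reschedule(completed_on):
    """Stage four's 'do it again in 30 days': the next supervisor run date."""
    return completed_on + timedelta(days=30)

rec = ChunkRecord("chunk001.csv")
rec.completed[4] = date(2010, 9, 25)
next_run = reschedule(rec.completed[4])
```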

All of this was originally written in stored procedures in SQL Server but
that was clumsy at best.
I pretty much had paper lists of stored procedures to run in specific order
to do this processing. 
Then stored procedures running the paper lists of stored procedures.  It
was getting out of hand!!!  I tried to automate it using VBA and Access
(because that is what I knew), but with a single thread for everything,
that didn't work well at all.

Last fall I took a C# class at the local community college and started
writing the system described 
above, using C# to automate the system, executing the stored procedures
previously written over the 
years that did all this stuff.  That actually worked quite well, but now the
source was split 
between SQL Server stored procedures which are (relatively) hard to write /
debug / maintain and C# 
/ .Net (a dream to maintain).

When I hired Paul in May, we went into high gear and have just finished
completely replacing the old 
stored procedures with equivalent SQL issued directly from C#.  Basically we took 
the stored procedures, 
analyzed the dynamic SQL in each stored procedure, pulled that SQL inside of
C# and used a command 
object to send the SQL out to SQL Server.
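The pattern - SQL built and held in application code, sent through a parameterized command object - looks like this in miniature; Python's sqlite3 stands in for SQL Server and the C# command object here:

```python
import sqlite3

# sqlite3 stands in for SQL Server; the real system uses a C# command
# object, but the shape is the same: the SQL lives in application code
# and goes out through a parameterized command.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_name TEXT, status TEXT)")
conn.execute("INSERT INTO chunks VALUES (?, ?)", ("chunk001.csv", "pending"))

def mark_done(file_name):
    """Flag a processed chunk file, the way a stage logs completion."""
    conn.execute("UPDATE chunks SET status = ? WHERE file_name = ?",
                 ("done", file_name))
    conn.commit()

mark_done("chunk001.csv")
status = conn.execute("SELECT status FROM chunks").fetchone()[0]
```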

Forms / combos to select databases to process, combos to select tables etc.
"Wizard" interfaces to 
do all of this stuff.  Far far away from the paper lists of stored
procedures of yesteryear.

Table driven, with logged status and NLogger for errors.  VisualSVN for
source control.

Oh, and I have C# code to do backup / restores of these databases.  SQL
Server 2008 has compression 
built in and with SMO it is a breeze to write this kind of code in C#.

And the system also processes the orders that the client sends which uses
these tables.

Needless to say, I need more horsepower to keep up with the system as it
exists today. Four cores 
and 16 gigs of memory just can't handle the task(s) any more.

The new server will have 16 cores and 64 gigs of ram, and I am using 8 gig
ECC registered sticks so 
that if I need to I can drop in another 64 gigs later.  Next year when
Bulldozer comes out I will 
likely upgrade to 32 cores (if required) and 128 gigs of RAM.  With luck
that will be a simple 
processor swap.

I will eventually be placing the main databases (for use) on Flash drives to
speed up the reads. 
Most of the actual use (as opposed to the address updates) is read only,
i.e. I never write to the 
name / address tables (other than monthly address updates).  Flash drives
provide smoking fast reads 
and insane IOPS, and I already have a coprocessor raid controller to drive
that.  Coupled with 16 
cores and 64 gigs of memory this *should* give me a very responsive system.

So there ya have it, the list processing system dreamed up by an old Access
guy.  Available today 
(well, the new server is on the way) at a home office somewhere in the
foothills of the Smoky 
Mountains of NC.

I am pretty proud of my Frankenstein.  It has been 5 years in the making but
it is finally coming 
together.

One of the guys I do business with called the other day.  Said he was
talking with a guy who claimed 
to have two BILLION names / addresses in a bunch of lists, and he said that
was probably low, 
perhaps more like 6 to 8 billion.  I'm thinking I might be able to actually
handle that business 
(not that I will necessarily get it).  Two years ago it would have been
impossible.

My dream is to be sitting on the beach, with a laptop at my side, remote
desktop into my servers, 
watching my servers make money for me.  We shall see.

-- 
John W. Colby
www.ColbyConsulting.com