[dba-VB] New server

jwcolby jwcolby at colbyconsulting.com
Sat Sep 25 06:17:04 CDT 2010


Shamil,

The high-end hardware is used mostly for SQL Server. The clients in this business want to join lists 
together, and joins on two lists with 60 million names each can get pretty expensive in processing 
power.

The other day I was doing an address update on 64 million names.  I am not sure what happened, but it 
appears that trying to update a field on that many records caused the system to start 
hitting the page file; it "ran out of memory".  When that happens you can imagine the results.  The 
disk lights flashed for 12 hours solid and the update never finished.
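In hindsight, chunking that update into bounded batches would likely have kept the working set and transaction log manageable. A rough sketch of the pattern in C# (table and column names here are made up; each batch statement would be sent through a command object in a loop until it reports fewer rows affected than the batch size):

```csharp
using System;

public class BatchedUpdate
{
    // Build one batch of a chunked UPDATE.  Issuing this repeatedly
    // (via a SqlCommand) until it affects fewer rows than batchSize
    // keeps each transaction bounded instead of one 64-million-row hit.
    // Table and column names are hypothetical.
    public static string BuildBatch(string table, string set, string notDoneFilter, int batchSize)
    {
        return "UPDATE TOP (" + batchSize + ") " + table +
               " SET " + set +
               " WHERE " + notDoneFilter + ";";
    }

    public static void Main()
    {
        string sql = BuildBatch("dbo.Names64M", "AddrValid = 1", "AddrValid IS NULL", 100000);
        Console.WriteLine(sql);
    }
}
```

Each batch commits on its own, so SQL Server never has to hold the whole update in memory or in the log at once.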

 > I'm currently preparing my laptop and mobile modem to use them in the Peterhof park 
(http://en.wikipedia.org/wiki/Petergof ) for "live streaming" of my family walking there this 
Saturday.

Very cool technology.  I just got a Droid for my wife.  It claims to act as a hot spot for up to 5 
devices, using its data link over the phone network.  That would allow using the computer anywhere we 
get cell coverage.  The world is finally getting connected.  Just like the movies.

It looks like I'll get the new server running in the next couple of weeks.  I'll write more about 
whether and how all that server power affects my ability to get work done.  I can tell you that 
these large lists were bringing my current server to its knees.

John W. Colby
www.ColbyConsulting.com

On 9/25/2010 5:09 AM, Shamil Salakhetdinov wrote:
> Hi John --
>
> <<<
> Well guys, I got the OK today from the client
> to purchase the new server. It will use the
> AMD Magny-Cours Opteron to build out a
> 16 core machine with 64 gigs of memory.
>>>>
> Very impressive!
> Envy is immense here, the good kind :)...
>
> <<<
> One of the guys I do business with called the other day.
> Said he was talking with a guy who claimed
> to have two BILLION names / addresses in a bunch
> of lists, and he said that was probably low,
> perhaps more like 6 to 8 billion.  I'm thinking
> I might be able to actually handle that business
> (not that I will necessarily get it).  Two years
> ago it would have been impossible.
>>>>
> Yes, John, but you'd better use streaming together with multi-threading -
> that wouldn't require such high-end hardware, I suppose. I only suppose; I
> could be wrong, as I have never had the opportunity to handle that much data.
>
> <<<
> I am pretty proud of my Frankenstein.
> It has been 5 years in the making but it is
> finally coming together.
>>>>
> This is what real passion - yours - and good money(?) can do...
> You're a hero!
> Everybody here in AccessD is proud of having you as our colleague, and of
> having the opportunity to "recharge your passion" during the "bound vs. unbound", "PK
> vs. natural keys", "Access v.<= 2003 vs. Access v.>= 2007" etc. battles :)
>
> <<<
> My dream is to be sitting on the beach, with a
> laptop at my side, remote desktop into my servers,
> watching my servers make money for me.  We shall see.
>>>>
> Yes, that is a good dream, and it has a good chance of coming true, I
> suppose.
>
> I'm currently preparing my laptop and mobile modem to use them in the
> Peterhof park (http://en.wikipedia.org/wiki/Petergof ) for "live
> streaming" of my family walking there this Saturday. Well, maybe not live
> yet - this is just the first try. But we have 4G here, so live
> streaming isn't a problem at all. I currently have a 3G mobile modem for my
> laptop, and my mobile ISP has a strict limit on monthly traffic - just
> 2GB - after that it becomes very expensive; 4G should be unlimited, I
> suppose...
>
> Have nice weekend.
>
> -- Shamil
>
> -----Original Message-----
> From: dba-vb-bounces at databaseadvisors.com
> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
> Sent: Saturday, September 25, 2010 9:10 AM
> To: VBA
> Subject: [dba-VB] New server
>
> Well guys, I got the OK today from the client to purchase the new server.
> It will use the AMD
> Magny-Cours Opteron to build out a 16 core machine with 64 gigs of memory.
>
> It has been a long time coming.
>
> When I started this business in August 2004 I did not know a thing about SQL
> Server.  I started with
> a single core 2800 AMD with 4 gigs of ram running Windows 2003 x32 and SQL
> Server 2000.
>
> I moved to 4 core and 16 gigs a couple of years ago, still Windows 2003
> (x64) but SQL Server 2005
> and now 2008, and have since accumulated several more large lists and done a
> ton of development in
> C# to get my application clicking.
>
> For you .Net folks, I am now writing in C#, and have hired a programmer part
> time to help me out as
> well.  We have designed a system to process my name / address lists.  The
> idea is that every list
> needs to be sent through address validation every 30 days.
>
> My first list had 65 million rows and came to me on a set of DVDs with Zip
> files.  Fixed width
> format.  I started learning SQL Server, imported the list and did every
> single thing manually.  It
> took me two weeks to get the first list imported, trailing spaces stripped
> off the fields and set up
> for use.  It took hours to do much of anything!  Hell, it took DAYS to do
> much of anything.
>
> Today I have two lists with 100 million names, one with 64 million (split
> off from the original list), the "database from hell" with 50 million
> records (the original list), an email list with 23 million, a children's
> list with 22 million, a "dogs and cats" list with 11 million, and two
> smoker lists with 5 million and 3 million.  I just got an additional email
> list with 57 million (not yet fully processed).
>
> 9 lists with about 420 million names and addresses when aggregated.  I need
> to address-validate all of those names every month.
>
> I have developed a "table driven" system where I write supervisor records
> that hold the name of the
> database and some other flag fields (written to by the process).  I have
> broken the process down
> into 4 stages.
>
> 1) Stage 1 BCPs the data out of the table to a disk directory as
> comma-delimited files, in two-million-record chunks per file.
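The chunking arithmetic for stage 1 is simple enough to sketch. The 2-million-record split is the real one; the file-name pattern below is a hypothetical stand-in for whatever the staging directory actually uses:

```csharp
using System;
using System.Collections.Generic;

public class StageOneChunks
{
    public const int ChunkSize = 2000000; // 2 million records per CSV file

    // How many chunk files a table of rowCount rows exports to (ceiling division).
    public static int ChunkCount(long rowCount)
    {
        return (int)((rowCount + ChunkSize - 1) / ChunkSize);
    }

    // Hypothetical file-name pattern for the output staging directory.
    public static List<string> ChunkFiles(string tableName, long rowCount)
    {
        var files = new List<string>();
        int n = ChunkCount(rowCount);
        for (int i = 1; i <= n; i++)
            files.Add(tableName + "_" + i.ToString("D3") + ".csv");
        return files;
    }

    public static void Main()
    {
        Console.WriteLine(ChunkCount(64000000)); // the 64M list yields 32 chunk files
    }
}
```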
>
> 2) Stage 2 moves the files from the output staging directory to a virtual
> machine
>
> The virtual machine runs a third party application that performs the address
> validation on a chunk
> (CSV file).  That application is written in Foxpro for Windows, thus the
> size limitation.  The third
> party application continuously scans an input directory looking for files,
> processes the files and
> places the results and a CASS report (PDF) into an output directory.
>
> The virtual machine is and will continue to be my bottleneck, as it can
> only process a 2 million record chunk approximately every 50 minutes, about
> half of which time is spent out on a server in California.  However, I can
> buy additional licenses to set up more VMs.
>
> 3) My stage three watches that output directory for files popping out of the
> third party application
> and moves them off the virtual machine back to an input staging directory.
>
> 4) Stage four imports them back into a temp database in SQL Server, builds
> a "valid address" flag field and three SHA1 hash fields, and then moves
> only the moved addresses back into the live database.  Before it brings the
> moved addresses back in, it archives the old addresses (where people lived
> before they moved).
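The SHA1 hash fields are just hashes over a concatenated name/address key, handy for fast matching later. A minimal sketch of the hashing itself; which fields go into the key, and the separator, are illustrative assumptions:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public class AddressHash
{
    // SHA1 over a normalized name/address string, returned as lowercase hex.
    public static string Sha1Hex(string input)
    {
        using (SHA1 sha = SHA1.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // Hypothetical key: last name + address line + ZIP, upper-cased.
        string key = "COLBY|123 MAIN ST|28700";
        Console.WriteLine(Sha1Hex(key));
    }
}
```

A fixed-width hash column like this indexes and compares much faster than the raw multi-field key.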
>
> Each stage runs in a thread and is asynchronous to the other stages.  IOW
> stage one can be
> outputting files from TableX, while stage 2 is moving files from a
> previously exported TableY out to
> the virtual machine.  Stage three is always watching for files from the
> third party app (on the VM)
> and moving them to the input staging directory where they belong, and stage
> four is watching for
> files to appear in its input staging directory and importing them back into
> SQL Server.
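The asynchronous hand-off between the stages is a classic producer/consumer pipeline. A toy sketch with .NET 4's BlockingCollection shows the shape of it; the real stages watch disk directories rather than an in-memory queue:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class Pipeline
{
    // One stage drops "chunk file" names into a queue; the next stage
    // consumes them as they appear, never waiting for the whole batch -
    // the same pattern the directory-watching stages use.
    public static int Run(int fileCount)
    {
        var queue = new BlockingCollection<string>();
        int processed = 0;

        var consumer = new Thread(() =>
        {
            // Blocks until items arrive; exits when CompleteAdding is called.
            foreach (string file in queue.GetConsumingEnumerable())
                Interlocked.Increment(ref processed); // stand-in for import work
        });
        consumer.Start();

        for (int i = 1; i <= fileCount; i++)
            queue.Add("chunk_" + i + ".csv"); // stand-in for BCP export
        queue.CompleteAdding();

        consumer.Join();
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(Run(32)); // prints 32
    }
}
```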
>
> Stage one kind of drives the process since it writes a record for each 2
> million record chunk file
> exported out of SQL Server.  That record (child to the supervisor record
> that "orders" the address
> validation for a table) defines the staging out / in paths for the later
> stages to work with, and
> provides fields for all the stages to log completion dates and status on a
> per-file basis, as well as RecordsAffected (where appropriate) and start /
> stop times for each file for each stage.
>
> Each table has its own staging in / out directory on disk, created by stage
> one and the paths are
> logged in the supervisor table.  After the files are processed Stage one
> cleans up by zipping up the
> files in the output staging directory.  After stage four successfully
> imports files back into SQL
> Server it cleans up by zipping up the files that have been imported.  Stage
> four creates a brand new
> supervisor record set for 30 days out from the completion of stage four.
> IOW, "do it again in 30 days".
>
> All of this was originally written in stored procedures in SQL Server but
> that was clumsy at best.
> I pretty much had paper lists of stored procedures to run in specific order
> to do this processing.
> Then I had stored procedures running the paper lists of stored procedures.
> It was getting out of hand!!!  I tried to automate it using VBA and Access
> (because that is what I knew), but with a single thread for everything,
> that didn't work well at all.
>
> Last fall I took a C# class at the local community college and started
> writing the system described
> above, using C# to automate the system, executing the stored procedures
> previously written over the
> years that did all this stuff.  That actually worked quite well, but now the
> source was split
> between SQL Server stored procedures which are (relatively) hard to write /
> debug / maintain and C#
> / .Net (a dream to maintain).
>
> When I hired Paul in May, we went into high gear and have just finished
> completely replacing the old stored procedures with equivalent SQL issued
> directly from C#.  Basically we took the stored procedures,
> the stored procedures,
> analyzed the dynamic SQL in each stored procedure, pulled that SQL inside of
> C# and used a command
> object to send the SQL out to SQL Server.
>
> Forms / combos to select databases to process, combos to select tables etc.
> "Wizard" interfaces to
> do all of this stuff.  Far far away from the paper lists of stored
> procedures of yesteryear.
>
> Table driven, logged status and NLogger for errors.  Visual SVN for source
> control.
>
> Oh, and I have C# code to do backup / restores of these databases.  SQL
> Server 2008 has compression
> built in and with SMO it is a breeze to write this kind of code in C#.
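Under the covers, a compressed SMO backup amounts to a single T-SQL statement; a sketch that builds it (database name and backup path below are hypothetical):

```csharp
using System;

public class BackupSql
{
    // The T-SQL a compressed SQL Server 2008 backup boils down to.
    // SMO's Backup object generates something equivalent when you set
    // CompressionOption; names and paths here are made up.
    public static string Build(string database, string file)
    {
        return "BACKUP DATABASE [" + database + "] TO DISK = N'" + file +
               "' WITH COMPRESSION, INIT;";
    }

    public static void Main()
    {
        Console.WriteLine(Build("TheListDB", @"E:\Backups\TheListDB.bak"));
    }
}
```

With SMO you get progress events and error handling on top, which is why driving it from C# beats scheduling raw scripts.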
>
> And the system also processes the orders that the client sends, which use
> these tables.
>
> Needless to say, I need more horsepower to keep up with the system as it
> exists today. Four cores
> and 16 gigs of memory just can't handle the task(s) any more.
>
> The new server will have 16 cores and 64 gigs of ram, and I am using 8 gig
> ECC registered sticks so
> that if I need to I can drop in another 64 gigs later.  Next year when
> Bulldozer comes out I will
> likely upgrade to 32 cores (if required) and 128 gigs of RAM.  With luck
> that will be a simple
> processor swap.
>
> I will eventually be placing the main databases (for use) on Flash drives to
> speed up the reads.
> Most of the actual use (as opposed to the address updates) is read only,
> i.e. I never write to the
> name / address tables (other than monthly address updates).  Flash drives
> provide smoking fast reads
> and insane IOPS, and I already have a coprocessor raid controller to drive
> that.  Coupled with 16
> cores and 64 gigs of memory this *should* give me a very responsive system.
>
> So there ya have it, the list processing system dreamed up by an old Access
> guy.  Available today
> (well, the new server is on the way) at a home office somewhere in the
> foothills of the Smoky
> Mountains of NC.
>
> I am pretty proud of my Frankenstein.  It has been 5 years in the making but
> it is finally coming
> together.
>
> One of the guys I do business with called the other day.  Said he was
> talking with a guy who claimed
> to have two BILLION names / addresses in a bunch of lists, and he said that
> was probably low,
> perhaps more like 6 to 8 billion.  I'm thinking I might be able to actually
> handle that business
> (not that I will necessarily get it).  Two years ago it would have been
> impossible.
>
> My dream is to be sitting on the beach, with a laptop at my side, remote
> desktop into my servers,
> watching my servers make money for me.  We shall see.
>
