[dba-VB] Run state class

jwcolby jwcolby at colbyconsulting.com
Sat Oct 29 11:56:58 CDT 2011


> Question: is the application taking advantage of the new "parallelism" presented in VS 2010 or is
> it just using multi-threading?


LOL.  "Just" multi-threading.  Believe me I am thrilled with that.  Threading and communications 
between processes running in threads can be a major PITA.

The RunState class itself is a smallish class, not really a lot there.  It never occurred to me to 
try to market the list cleaning system, and I really think it is too complex for that.  If I could 
get enough bandwidth I could market list cleaning as a service, using the system.

What the program has really done is turn a completely manual process into an almost entirely 
automated process.

I took on this client in late 2004.  At that time I was entirely into Access and had never even 
touched SQL Server.  The data was originally a single list of 65 million names.  I built a server 
(single core, 4 gigs RAM, Windows 2003 X32, SQL Server 2000 X32) and researched the vendor that I 
still use to do the address validation.  I installed that software (Accuzip) and started learning 
how to do address validation.  I had to process those 65 million addresses, and the first time I did 
it, it took me about two weeks, 80 hours.  That meant manually exporting 60 million names in 2 million 
record chunks to CSV files, placing those files into an input directory of Accuzip, then taking the 
finished files from the output directory and importing them back into SQL Server.

Understand that I had no tools other than SQL Server itself and whatever it provides.  I also had 
vastly underpowered servers.

As I learned SQL Server I started building stored procedures out in SQL Server to automate the bcp 
out and bcp in.  Then I built up stored procedures to loop and export and import the 2 million 
record chunks.  It just slowly grew into an "automated process" but it was entirely in SQL Server 
which is not particularly "user friendly", either as a programming environment or as an end user 
interface.  But it worked.
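
As an illustration only (this is not the stored procedure code, which is not shown in this post), the chunking loop boils down to walking the row range in fixed-size steps.  The sketch below is in Python rather than T-SQL, and the function name `chunk_ranges` and the use of 1-based inclusive row numbers are assumptions made for the example.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_row, last_row) pairs covering rows 1..total_rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    first = 1
    while first <= total_rows:
        # The last chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, total_rows)
        yield first, last
        first = last + 1


if __name__ == "__main__":
    ranges = list(chunk_ranges(65_000_000, 2_000_000))
    print(len(ranges))   # 33: 32 full chunks plus a 1 million row remainder
    print(ranges[0])     # (1, 2000000)
    print(ranges[-1])    # (64000001, 65000000)
```

Each pair would drive one export and one matching import, so a 65 million row list at 2 million rows per chunk means 33 round trips through the validation software.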

I eventually started trying to use Access to execute the existing stored procedures but that never 
really worked well.  Access is single threaded and these processes could take a loooooong time with 
the hardware of the day.  So Access would "lock up" as it waited for the stored procedures to 
finish.  Remember the vastly underpowered servers!
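
The "lock up" is what any single-threaded front end does when it makes a blocking call.  A minimal sketch of the alternative, assuming a hypothetical `long_running_call` that simply sleeps in place of a real stored procedure: the blocking work goes on a worker thread and the calling thread only checks in periodically.  This is Python for illustration, not the Access or C# code discussed here.

```python
import threading
import time


def long_running_call(seconds: float, result: dict) -> None:
    """Hypothetical stand-in for a stored procedure call that blocks until done."""
    time.sleep(seconds)
    result["status"] = "finished"


def run_in_background(seconds: float) -> list:
    """Start the blocking call on a worker thread and record what the caller sees."""
    result = {"status": "running"}
    worker = threading.Thread(target=long_running_call, args=(seconds, result))
    worker.start()
    seen = [result["status"]]      # the caller is not blocked: still "running"
    while worker.is_alive():
        worker.join(timeout=0.05)  # wake up regularly; a UI could repaint here
    seen.append(result["status"])
    return seen


if __name__ == "__main__":
    print(run_in_background(0.2))
```

The caller regains control right after `start()`, which is exactly what a single-threaded host cannot offer while it waits on the server.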

I really had an automation breakthrough when I went to the local community college in the fall 
semester 2009 and took a C# class, fall and spring semester.  By the end of December I had started 
using C# to execute the existing stored procedures; there were somewhere around 30 of them.  I 
started with getting the stored procedures in SQL Server to run in exactly the same order as when I 
manually executed them.  Then I added reporting using NLog.  Then I developed a status class to 
display status into a list control on a form.  As pieces came together it became easier to refine 
the system.
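
A rough sketch of that first step, running a fixed sequence of steps in order with logging and a status callback.  It is written in Python for illustration, not the C# from the actual system; `run_steps`, the step names, and the plain list standing in for the form's list control are all assumptions, and Python's standard `logging` module stands in for NLog.

```python
import logging
from typing import Callable, List, Sequence, Tuple

log = logging.getLogger("pipeline")

# A step is a display name plus a callable that does the work.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step], report: Callable[[str], None]) -> bool:
    """Run steps in the given order; stop at the first failure.

    Returns True only if every step finished without raising.
    """
    for name, action in steps:
        report(f"{name}: started")
        try:
            action()
        except Exception:
            log.exception("step %s failed", name)
            report(f"{name}: failed")
            return False
        log.info("step %s finished", name)
        report(f"{name}: finished")
    return True


if __name__ == "__main__":
    status_lines: List[str] = []   # stands in for the list control on the form
    steps = [("export_chunk", lambda: None), ("import_chunk", lambda: None)]
    run_steps(steps, status_lines.append)
    print(status_lines)
```

Keeping the order in one list means the program runs the steps the same way every time, which is the property the manual process could not guarantee.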

I also hired Paul as a part timer helping me write the code.

From that point it was really something similar to the spiral development model.  Take something 
that works and refine it, get it working, then refine it again etc.  We didn't get to the manager / 
supervisor / threaded implementation until about 6 months ago, and we just went through another 
major cycle last week.

For a long time the system worked but we would have problems where I had to intervene manually to keep 
it running.  I just had to have patience and remember where I came from.  I have expanded from one 
list of 65 million addresses to about 8 lists with about 300 million addresses.  Without the 
automation that the program provides I wouldn't be able to process this many addresses.  As it is, 
even with the "manual interventions" I spend a few hours of actual labor to process these 8 lists.

With this latest rewrite it is finally getting to a truly stable state where it just runs.  I had 
communication issues between stages where if anything went wrong the process would stop processing a 
stage.  The latest rewrite really simplified that inter-stage communication and allows each stage to 
reliably see when the previous stage finishes.
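
The RunState class itself is not shown in this post, so the following is only a sketch of one common way to get that kind of reliable handoff, in Python rather than C#.  The `Stage` class, its `done` event and `ok` flag, and the `run_pipeline` helper are all hypothetical names.  The idea is that a stage always signals completion, even when it fails, so the next stage never waits forever.

```python
import threading
from typing import Callable, List, Optional


class Stage:
    """One pipeline stage that signals completion whether it succeeds or fails."""

    def __init__(self, name: str, work: Callable[[], None],
                 previous: Optional["Stage"] = None) -> None:
        self.name = name
        self.work = work
        self.previous = previous
        self.done = threading.Event()
        self.ok = False

    def run(self, log: List[str]) -> None:
        try:
            if self.previous is not None:
                self.previous.done.wait()      # block until upstream signals
                if not self.previous.ok:
                    log.append(f"{self.name}: skipped")
                    return
            self.work()
            self.ok = True
            log.append(f"{self.name}: ok")
        except Exception:
            log.append(f"{self.name}: failed")
        finally:
            self.done.set()                    # always signal, even on failure


def run_pipeline(works: List[Callable[[], None]]) -> List[str]:
    """Run each work item as its own threaded stage, chained in list order."""
    log: List[str] = []
    stages: List[Stage] = []
    for i, work in enumerate(works):
        stages.append(Stage(f"stage{i + 1}", work, stages[-1] if stages else None))
    threads = [threading.Thread(target=s.run, args=(log,)) for s in stages]
    for t in reversed(threads):    # start downstream first; ordering still holds
        t.start()
    for t in threads:
        t.join()
    return log
```

Setting the event in `finally` is the design choice that matters here: a failed stage still reports that it is finished, and the stages after it can decide to skip rather than hang.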

In order to be in this business I am legally obligated to process all the addresses every 30 days. 
This program is the only thing that makes that possible at this scale.

John W. Colby
Colby Consulting

Reality is what refuses to go away
when you do not believe in it

On 10/29/2011 12:07 PM, Jim Lawrence wrote:
> It sounds very impressive as a piece of software that took months (years) to
> design and debug. I would assume you will be marketing this product soon.
> ;-)
>
> Question: is the application taking advantage of the new "parallelism"
> presented in VS 2010 or is it just using multi-threading?
>
> Jim
>


