[dba-VB] c# lock()

Sat Mar 26 07:41:44 CDT 2011

Hi John

My first thought was Windows Workflow Foundation (WF) but, as Shamil mentions, it may be overkill. On the other hand, it is meant for controlling exactly this kind of scenario: "if task 1 is done, start task 2 and task 3, wait for task 2 and task 3 to finish, check something, then start task 4, etc.".

However, that reminded me of a series of articles about the Task Parallel Library (TPL) in .NET:

http://www.codeproject.com/KB/cs/TPL1.aspx 

which I found very interesting, though I haven't had any use for it yet. Among other topics, it carefully discusses cancellation and error handling, which I guess is quite important for your purpose.
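
As a rough illustration of the "task 1, then tasks 2 and 3, check, then task 4" flow using the TPL, here is a minimal sketch. The class name `WorkflowSketch`, the method `Run`, and the delegate parameters are all hypothetical and are not taken from the articles above; only `Task.Factory.StartNew`, `Task.Wait`, `Task.WaitAll` and `CancellationToken` are real .NET 4 APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class WorkflowSketch
{
    // Runs step 1, then steps 2 and 3 in parallel, then a check, then step 4.
    // Returns true if step 4 ran, false if the check said to stop.
    // A failing step surfaces as an AggregateException from Wait/WaitAll;
    // cancellation surfaces as an OperationCanceledException (possibly
    // wrapped in an AggregateException).
    public static bool Run(Action step1, Action step2, Action step3,
                           Func<bool> check, Action step4,
                           CancellationToken token)
    {
        // Step 1 must finish before anything else starts.
        Task t1 = Task.Factory.StartNew(step1, token);
        t1.Wait(token);

        // Steps 2 and 3 run in parallel; wait for both.
        Task t2 = Task.Factory.StartNew(step2, token);
        Task t3 = Task.Factory.StartNew(step3, token);
        Task.WaitAll(new[] { t2, t3 }, token);

        // "Check something" before going on.
        if (!check())
        {
            return false;
        }

        token.ThrowIfCancellationRequested();
        Task.Factory.StartNew(step4, token).Wait(token);
        return true;
    }
}
```

A caller would typically create a `CancellationTokenSource`, pass its `Token` to `Run`, and call `Cancel()` on it from the shutdown code. Note that cancellation here is cooperative: a step that is already running only stops early if it checks the token itself.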

/gustav

>>> jwcolby at colbyconsulting.com 26-03-2011 03:57 >>>
Shamil,

I have processes that log results to flags.  For example, make a database (log that it was made), 
build a table (log that it was built), pull umpteen million records in sorted order (log that it was 
filled), build a chunk table (log that it was filled), bcp out (log that it was exported), build 
another chunk table (log that it was filled), bcp out (log that it was exported).  The objective is 
to be able to sustain interruptions and pick up where we left off.  These processes can take minutes 
(fill chunk table, bcp out) or a half hour (pull umpteen million records in sorted order).

So each thing I do represents a step in the process and each step is logged in a field in a record 
in SQL Server using a datetime.  There are so many of these flags that I am trying to standardize 
the process by building a class that can be instantiated, filled with data and log itself to SQL 
Server by one thread and be checked by another thread.
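
A minimal sketch of what such a flag class might look like, assuming a plain `lock` is enough to make it safe for one writing thread and one or more reading threads. The names `StepFlag`, `MarkDone` and `IsDone` are hypothetical, and the `persist` delegate is only a stand-in for whatever code actually writes the datetime to SQL Server.

```csharp
using System;

// Hypothetical flag class: one instance per logged step.
public sealed class StepFlag
{
    private readonly object _sync = new object();
    private readonly string _name;
    // Stand-in for the real SQL Server write (step name, completion time).
    private readonly Action<string, DateTime> _persist;
    private DateTime? _completedAt;

    // completedAt lets the flag be re-created from a stored value on restart.
    public StepFlag(string name, Action<string, DateTime> persist,
                    DateTime? completedAt = null)
    {
        _name = name;
        _persist = persist;
        _completedAt = completedAt;
    }

    public string Name { get { return _name; } }

    // Checked by the thread that is waiting for this step to finish.
    public bool IsDone
    {
        get { lock (_sync) { return _completedAt.HasValue; } }
    }

    public DateTime? CompletedAt
    {
        get { lock (_sync) { return _completedAt; } }
    }

    // Called by the thread that owns the step.
    public void MarkDone(DateTime when)
    {
        lock (_sync)
        {
            if (_completedAt.HasValue)
            {
                return; // already logged, do not write twice
            }
            // Persist first, so the in-memory flag is never ahead of the
            // stored one if the write throws.
            _persist(_name, when);
            _completedAt = when;
        }
    }
}
```

The trade-off in this sketch is that readers block while the write is in progress, because the lock is held across the `persist` call. If the write is slow, that may or may not be acceptable.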

These flag class instances will be checked by multiple threads, each thread trying to decide whether 
it should be doing the next step because another thread has finished its part.  IOW if a file has 
been written to disk, then the next thread will write it to a VM for processing.  If it moved to the 
VM the next thread will watch the VM's output directory for a file to pop out and move it back to a 
directory on the server.  If the file (a couple of files actually) successfully copied back to the 
server staging then another thread will import it back into a chunk table in an input database.  If 
the file successfully imported then another thread will...  In general one thread will "own" the 
flag and use it to log its status and one other thread will be checking the status of the flag to 
determine that it can go to work on that work chunk.

You get the picture.

I am trying to build an entirely asynchronous, highly threaded process which exports a huge table 
into multiple files, processes every file through a third party app and gets the results back into 
SQL Server.  All while logging each and every step so that no piece can possibly be dropped at any 
stage, even if the server goes down (or the VM goes down).  Eventually this process will run on my 
server 24/7.

It has been working for some time but I am getting threading issues, and I need to work on the high 
level control so that all of the processes can cleanly start up and shut down and every stage can 
pick back up when the program restarts should a shutdown occur.
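
Picking back up after a restart could be sketched as follows, assuming the logged datetimes for one work chunk have been read back into a dictionary keyed by step name. `ResumeSketch`, `FirstPendingStep` and `RunFrom` are hypothetical names, and the in-memory dictionary only stands in for the record in SQL Server.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical restart helper, for illustration only.
public static class ResumeSketch
{
    // Returns the index of the first step that has no completion time,
    // or steps.Count if every step has already been logged.
    public static int FirstPendingStep(IList<string> steps,
                                       IDictionary<string, DateTime?> completed)
    {
        for (int i = 0; i < steps.Count; i++)
        {
            DateTime? when;
            if (!completed.TryGetValue(steps[i], out when) || !when.HasValue)
            {
                return i;
            }
        }
        return steps.Count;
    }

    // Runs the remaining steps in order, logging each one as it finishes.
    // If runStep throws, the failed step stays unlogged and is retried
    // on the next start.
    public static void RunFrom(IList<string> steps,
                               IDictionary<string, DateTime?> completed,
                               Action<string> runStep)
    {
        for (int i = FirstPendingStep(steps, completed); i < steps.Count; i++)
        {
            runStep(steps[i]);
            completed[steps[i]] = DateTime.Now;
        }
    }
}
```

This only works cleanly if each step can safely be repeated, since a step that was interrupted halfway has no datetime logged and will be run again from its beginning.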

A single database can be up to a hundred million records (the biggest so far), and the external 
program only handles roughly 2 million records at a time.  Each "chunk" takes roughly 45 minutes to an hour 
depending on many different things, so that example will take 50 chunks and could take 40 to 50 hours 
to complete.  It takes about 20 processing steps to handle each file from end to end.  It needs to 
just work, and I need to be able to view status in a meaningful way.  And I need to process that and 
a dozen other files every single month, automatically, with no manual intervention required.

John W. Colby
www.ColbyConsulting.com