Gustav Brock
gustav at cactus.dk
Sat Mar 26 07:41:44 CDT 2011
Hi John

My first thought was Windows Workflow Foundation (WF) but, as Shamil mentions, it may be overkill. On the other hand, it is made exactly for controlling scenarios like "if task 1 is done, start task 2 and task 3, wait for task 2 and task 3 to finish, check something, then start task 4, etc.".

However, that reminded me of a series of articles about the Task Parallel Library of .Net:

http://www.codeproject.com/KB/cs/TPL1.aspx

which I found very interesting, though I haven't had any use for it yet. Among other topics, it carefully discusses canceling and error handling, which I guess is quite important for your purpose.

/gustav

>>> jwcolby at colbyconsulting.com 26-03-2011 03:57 >>>
Shamil,

I have processes that log results to flags. For example: make a database (log that it was made), build a table (log that it was built), pull umpteen million records in sorted order (log that it was filled), build a chunk table (log that it was filled), bcp out (log that it was exported), build another chunk table (log that it was filled), bcp out (log that it was exported).

The objective is to be able to sustain interruptions and pick up where we left off. These processes can take minutes (fill chunk table, bcp out) or a half hour (pull umpteen million records in sorted order). So each thing I do represents a step in the process, and each step is logged in a field in a record in SQL Server using a datetime.

There are so many of these flags that I am trying to standardize the process by building a class that can be instantiated, filled with data and log itself to SQL Server by one thread, and be checked by another thread. These flag class instances will be checked by multiple threads, each thread trying to decide whether it should be doing the next step because another thread has finished its part.

IOW, if a file has been written to disk, then the next thread will write it to a VM for processing.
If it moved to the VM, the next thread will watch the VM's output directory for a file to pop out and move it back to a directory on the server. If the file (a couple of files, actually) successfully copied back to the server staging directory, then another thread will import it back into a chunk table in an input database. If the file successfully imported, then another thread will...

In general, one thread will "own" the flag and use it to log its status, and one other thread will be checking the status of the flag to determine whether it can go to work on that work chunk.

You get the picture. I am trying to build an entirely asynchronous, highly threaded process which exports a huge table into multiple files, processes every file through a third-party app and gets the results back into SQL Server, all while logging each and every step so that no piece can possibly be dropped at any stage, even if the server goes down (or the VM goes down).

Eventually this process will run on my server 24/7. It has been working for some time, but I am getting threading issues, and I need to work on the high-level control so that all of the processes can cleanly start up and shut down, and every stage can pick back up when the program restarts should a shutdown occur.

A single database can be up to a hundred million records (the biggest so far), and the external program handles only roughly 2 million records at a time. Each "chunk" takes roughly 45 minutes to an hour depending on many different things, so that example will take 50 chunks and could take 40 to 50 hours to complete. It takes about 20 processing steps to handle each file from end to end.

It needs to just work, and I need to be able to view status in a meaningful way. And I need to process that and a dozen other files every single month, automatically, with no manual intervention required.

John W. Colby
www.ColbyConsulting.com
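
For what it's worth, the flag idea described above (one thread logs a datetime when its step is done, another thread checks it, and a restart resumes at the first step that was never logged) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: it is written in Python rather than .Net, SQLite stands in for SQL Server, and the class name StepLog, the table step_log and the four step names are all hypothetical.

```python
import sqlite3
import threading
from datetime import datetime, timezone

# Hypothetical step names; the real process has about 20 steps per file.
STEPS = ["exported", "copied_to_vm", "copied_back", "imported"]


class StepLog:
    """Records one completion timestamp per (chunk, step) pair."""

    def __init__(self, path=":memory:"):
        # Assumption: SQLite stands in for SQL Server in this sketch.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()  # one connection shared by all threads
        with self._lock:
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS step_log ("
                "chunk_id INTEGER, step TEXT, done_at TEXT, "
                "PRIMARY KEY (chunk_id, step))"
            )
            self._db.commit()

    def mark_done(self, chunk_id, step):
        """Called by the thread that owns the step, after its work succeeded."""
        stamp = datetime.now(timezone.utc).isoformat()
        with self._lock:
            # INSERT OR IGNORE keeps the first timestamp if a step is re-logged.
            self._db.execute(
                "INSERT OR IGNORE INTO step_log VALUES (?, ?, ?)",
                (chunk_id, step, stamp),
            )
            self._db.commit()

    def is_done(self, chunk_id, step):
        """Called by the thread that is waiting on the step."""
        with self._lock:
            row = self._db.execute(
                "SELECT 1 FROM step_log WHERE chunk_id = ? AND step = ?",
                (chunk_id, step),
            ).fetchone()
        return row is not None

    def next_step(self, chunk_id):
        """First step not yet logged, or None when the chunk is finished."""
        for step in STEPS:
            if not self.is_done(chunk_id, step):
                return step
        return None
```

If the log is opened on a file path instead of ":memory:", calling next_step for each chunk after a restart tells every worker where that chunk left off, which is the "pick up where we left off" behaviour asked for.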