jwcolby
jwcolby at colbyconsulting.com
Sun Mar 27 22:01:44 CDT 2011
Shamil,

> Yes, I see: you tend to build a multi-threaded application/utility system
> with centralized control - and I tend to build an anarchical free-market
> model based application/utility system :)

LOL, I guess. To be quite honest I didn't understand exactly what you were saying. Remember that my C# skills are still quite rudimentary.

The Accuzip processing system basically developed over time. I didn't sit down to develop an app from scratch to do this. I started in late 2004 just trying to get one single list into SQL Server and serve up counts to the client. I did the whole thing manually, building queries to import the data and clean it up using only SQL Server Management Studio. When I started I had never touched SQL Server.

Then I was asked to research an application for doing CASS http://en.wikipedia.org/wiki/Coding_Accuracy_Support_System and NCOA http://en.wikipedia.org/wiki/National_Change_Of_Address on the addresses in the list. There was only one list in the beginning, which I originally labeled "the database from hell". So I researched and discovered a vendor (Accuzip) that would allow us a fixed-price license against which I could process as many records as I wanted. All the other vendors (that I found) charge on a sliding cents-per-record scale, and we had a LOT of records - 65 million in that first list.

So I started building a system of stored procedures which performed the export out to Accuzip and then imported the Accuzipped records back in. It took roughly 25 stored procedures to perform this process. I had a paper list of instructions on how to run the stored procedures, what parameters to feed in, etc., plus I was manually copying the files out and back in. I would manually type EXEC commands into a query window, reading from my paper instructions. It was *sloooow* and *error prone*. Next came a stored procedure that ran the stored procedures. Etc., etc. It just morphed over time. Plus I got more lists.
The client was so impressed that I managed to do all this that I got actual orders (export the data and send it to the client) rather than just counts, and the business grew. I started using Access (the tool I knew) to automate some of it, but it was a very poor fit. Since every operation ran in a single thread, the entire application would lock up when SQL Server went off for 30 minutes to do some operation. And of course my hardware was underpowered.

In the summer of 2009 I decided I really needed to automate this. The business had grown, the client was suddenly 50% of my income, and it was a huge struggle to process these lists manually, even using stored procedures that ran lists of stored procedures. T-SQL as a programming environment is crude, to say the least. So I went to the local community college in Sept. 2009 and took the first of two available C# courses. By Christmas I was launching into my automation attempt.

Understand, my C# skills were rudimentary at best, but the language was robust and I could see the power. I really just started building a method of moving the paper list of stored procedures into C#, IOW C# would execute the stored procedures and monitor their completion. I was not focused on a top-down "build a system" approach but rather on getting my manual process automated ASAP. I was overworked and wasn't able to handle the load. It turns out that C# is very capable in this regard, and the program (such as it was) started working, and working well.

At the end of the second semester (May 2010) I hired Paul, a young student who was graduating from the community college, to come to work for me part time. I started teaching him what I know about general programming, and together we worked on this program. And it continued to morph. We started using NLog for logging, VisualSVN for source control, etc. At some point I decided to just move the stored procedures out of SQL Server and into dynamic code generated directly in C#.
Each step had to keep the program running, because I was actually using the program to fill orders and do counts. And remember, our C# skills were still rudimentary. Then we "discovered" threads, and soon thereafter events.

Last fall I decided to remodel the program into a manager / supervisor / worker model, where the manager tells the supervisors to do something, and the supervisors tell the workers to do something and basically just check results. In this model, each level uses threads to scan its tasks (watch flags) and keep its workers busy.

So the dbExport class has a thread which builds a database, pulls records, builds chunks, and exports them to disk. That is its job. When the file is on disk it is done, and control for that chunk is passed to another class (the Virtual Machine). The dbExport class uses flags to store each step of the process. The Virtual Machine class picks up the files and pulls them over to the VM's directory, monitors that Accuzip processes them, tracks errors (if any) from Accuzip, and places the resulting files back in a staging directory (as I call it) for another class to take over. The dbImport class is tasked with creating a temp database, pulling each chunk file back in from staging, and getting the data back into the original list database.

So the Manager class (a single instance) loads Supervisor classes, one per list database that needs processing. The Manager class also loads the VM class (only one of these at the moment, but there could be more, with more licenses from Accuzip). Each supervisor loads a dbExport and a dbImport class to process the records in its list. dbExport creates Process records, stores them in SQL Server, and creates a Process class instance for each Process record. Each Process record represents a chunk of data from a specific list database. The Process class doesn't really do anything; it is just a place for the flags to be loaded / manipulated, and it knows how to persist itself to SQL Server.
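The flag-watching worker model described above can be sketched roughly as follows. The real system is C#, but this sketch is in Java, which reads almost identically; all class, stage, and method names here are invented for illustration (the real Process class would also persist its flags to SQL Server):

```java
import java.util.ArrayList;
import java.util.List;

public class FlagDemo {
    // Hypothetical stage flags for one chunk of a list database.
    public enum Stage { CREATED, EXPORTED, ACCUZIPPED, IMPORTED }

    // Stands in for the Process class: it does no work itself, it is
    // just a place for the flags to live.
    public static class ProcessRecord {
        volatile Stage stage = Stage.CREATED;
    }

    // Each worker runs on its own thread, repeatedly scanning the
    // shared task list and advancing any chunk whose flag says it is
    // ready for this worker's stage -- the "watch flags" loop.
    static class Worker extends Thread {
        final List<ProcessRecord> tasks;
        final Stage from, to;
        Worker(List<ProcessRecord> tasks, Stage from, Stage to) {
            this.tasks = tasks; this.from = from; this.to = to;
        }
        public void run() {
            while (tasks.stream().anyMatch(p -> p.stage != Stage.IMPORTED)) {
                for (ProcessRecord p : tasks)
                    if (p.stage == from) p.stage = to;   // do this stage's work
                try { Thread.sleep(5); } catch (InterruptedException e) { return; }
            }
        }
    }

    // Push n chunks through export -> Accuzip -> import concurrently.
    public static List<Stage> runChunks(int n) throws InterruptedException {
        List<ProcessRecord> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) chunks.add(new ProcessRecord());
        Worker export = new Worker(chunks, Stage.CREATED, Stage.EXPORTED);
        Worker vm     = new Worker(chunks, Stage.EXPORTED, Stage.ACCUZIPPED);
        Worker imp    = new Worker(chunks, Stage.ACCUZIPPED, Stage.IMPORTED);
        export.start(); vm.start(); imp.start();
        export.join(); vm.join(); imp.join();
        List<Stage> out = new ArrayList<>();
        for (ProcessRecord p : chunks) out.add(p.stage);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runChunks(3));  // every chunk ends IMPORTED
    }
}
```

The point of the sketch is only the shape of the thing: each stage owns a thread, no stage calls another directly, and the only coordination is flags on shared chunk records.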
dbImport basically gets Process class instances which are finished with stage 2/3 (out of Accuzip) and starts importing them back into SQL Server.

My concept was a factory kind of application. There can be none or dozens of databases needing to be processed. SQL Server and my disk system can only efficiently handle one at a time (at any stage), and Accuzip is the biggest bottleneck, so the VM class sits in the middle, between an export and an import class. Every class has its own job, and every class has at least one thread for its use. dbExport has a thread to export data to a table and another thread to BCP the data out while the next chunk is created in the export database. dbImport has a thread to BCP data back into chunk tables, and another thread to take the chunk tables and merge them back into the live data. The VM has a thread to monitor the input directory and another thread to monitor the output directory, moving files into and out of the VM.

Every step has to be logged back to SQL Server as it completes, but the clsSupervisor keeps the progress flags in memory as long as a specific list database is being processed, and the supervisor holds a list of Process classes which hold all of the flags for the steps for a single chunk. It is a busy system.

I did not originally use threads, and everything ran on the main program thread. And I was back to my user interface locking up whenever long-running processes happened. Now with threading, each stage (export / VM / import) has its own list control which is used to display the status of the list database / chunks as they progress through the system. Everything is logged to flags, but also to NLog tables. Each class has threads to keep its own processes fed and running, and each class does its own piece of the whole to push a list table through Accuzip.

BTW, there is another entire C# program which processes orders for the client.
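That export -> VM -> import hand-off, where each class's thread finishes its piece and drops the chunk somewhere for the next class to pick up while the next chunk is already being prepared, is a classic producer/consumer pipeline. A minimal sketch, again in Java rather than the system's C#, with blocking queues standing in for the staging directories and all names invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelineDemo {
    static final String DONE = "<done>";  // sentinel marking end of work

    // Each stage gets its own thread; the queues play the role of the
    // staging directories that pass chunk files between the classes.
    public static List<String> run(List<String> chunkNames) throws InterruptedException {
        BlockingQueue<String> toVm = new LinkedBlockingQueue<>();
        BlockingQueue<String> toImport = new LinkedBlockingQueue<>();
        List<String> imported = new ArrayList<>();

        // "dbExport": writes out the next chunk while earlier chunks move on.
        Thread export = new Thread(() -> {
            try {
                for (String c : chunkNames) toVm.put(c + ".exported");
                toVm.put(DONE);
            } catch (InterruptedException ignored) {}
        });

        // "VM": Accuzips each chunk, then drops it in staging for import.
        Thread vm = new Thread(() -> {
            try {
                for (String c; !(c = toVm.take()).equals(DONE); )
                    toImport.put(c + ".accuzipped");
                toImport.put(DONE);
            } catch (InterruptedException ignored) {}
        });

        // "dbImport": pulls each finished chunk back in.
        Thread imp = new Thread(() -> {
            try {
                for (String c; !(c = toImport.take()).equals(DONE); )
                    imported.add(c + ".imported");
            } catch (InterruptedException ignored) {}
        });

        export.start(); vm.start(); imp.start();
        export.join(); vm.join(); imp.join();
        return imported;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("chunk1", "chunk2")));
    }
}
```

Because each stage blocks waiting for its input, the slowest stage (Accuzip, in the real system) naturally paces the whole pipeline, while the other stages stay busy on whatever chunks are ready.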
It is also threaded, and one of its steps is to create a supervisor record telling the Accuzip Monitor to Accuzip the addresses being processed for the order. Orders are the Accuzip processing which, as I mentioned, has to cut in at the head of the line in Accuzip monitoring. As you have seen, I could be processing a 65 million record list database when an order comes in. 65 million records is going to take at least 50 hours to process, and the order has to go out ASAP, so the Accuzip Monitor (the Manager class, actually) has to watch for orders and interrupt the data supervisor to allow the Order supervisor to cut in. Once Accuzip starts a process (chunk) I don't interrupt it, but when that process (chunk) finishes Accuzip, only Order chunks will process until the order is finished.

As I said, it is a busy system. And remember that our C# skills are still rudimentary. I started coding in C# in Sept. 2009 and started migrating this process to C# about December 2009. By May 2010, when Paul started work, I was actually running an Accuzip process and doing orders using C#. Paul is a youngster right out of school. So much of the stuff you talk about goes right over my head, Shamil.

John W. Colby
www.ColbyConsulting.com

On 3/27/2011 8:08 PM, Shamil Salakhetdinov wrote:
> Hi John --
>
> Yes, I see: you tend to build a multi-threaded application/utility system
> with centralized control - and I tend to build an anarchical free-market
> model based application/utility system :)
>
> Thank you.
>
> --
> Shamil