[dba-VB] Re: How do you run SQL Server processes

jwcolby jwcolby at colbyconsulting.com
Thu Nov 5 10:08:36 CST 2009


Robert,

Thanks for the reply.

The issue here is that I have to run the same code against a large set of databases / tables.  For 
this discussion, the "process" is exporting millions of records to CSV and later importing the 
resulting CSVs back in again.

For example, I have about 6 different databases, each with its own set of tables.  Additionally I 
have to do this exact same process on "Orders", where I am selecting anywhere from a few thousand to 
a few million addresses to fill an order.  The client wants those orders processed as well.

Understand that I am not claiming that the following is the most efficient, just "what I know how to 
do".

So my solution so far has been to build a set of stored procedures where I pass in the database and 
the source table (for the export process) and dynamically build up SQL statements in the SP.  The 
generic process is to:

1) SP1: Build a big table to hold the entire set of records, sorted on Zip5/Zip4, with an autonumber 
for selection of 2 million record chunks.  Move the records from TblSrc (or ViewSrc) to TblBig. 
Build an index on autonumber and the fields to export to make the subsequent processing faster.
	Params: DB, TblNameSrc.
2) SP2: Build a chunk table.  Move a chunk from TblBig to TblChunk
	Params: DBName, StartPKID, EndPKID
3) SP3: BCP TblChunk to CSV file
	Params: DBName, Destination Dir
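
To make the chunking step concrete, here is a rough sketch of what an SP like #2 might look like. 
All the object and column names (spBuildChunk, TblBig, TblChunk, PKID) are placeholders I made up 
for illustration, not the actual schema:

```sql
-- Hypothetical sketch of SP2: copy one PKID range from TblBig into TblChunk.
-- Database, table, and column names are illustrative only.
CREATE PROCEDURE dbo.spBuildChunk
    @DBName    sysname,
    @StartPKID bigint,
    @EndPKID   bigint
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max);

    -- Build the statement dynamically so one proc can serve any database.
    SET @sql = N'
        IF OBJECT_ID(N''' + QUOTENAME(@DBName) + N'.dbo.TblChunk'') IS NOT NULL
            DROP TABLE ' + QUOTENAME(@DBName) + N'.dbo.TblChunk;
        SELECT *
        INTO ' + QUOTENAME(@DBName) + N'.dbo.TblChunk
        FROM ' + QUOTENAME(@DBName) + N'.dbo.TblBig
        WHERE PKID BETWEEN @Start AND @End;';

    -- sp_executesql keeps the range parameterized even though the object
    -- names have to be concatenated in.
    EXEC sys.sp_executesql @sql,
        N'@Start bigint, @End bigint',
        @Start = @StartPKID, @End = @EndPKID;
END;
```

QUOTENAME on the database name is cheap insurance against odd names (and against injection) when 
you are gluing identifiers into dynamic SQL.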

This is not an exact representation of what is going on but gets the point across, which is that I 
have to do this on a "random" database, probably a dozen a month.

Initially I kept the stored procedures inside each database, but maintaining them became a 
nightmare.  Every time I figured out a better way, a different way, or fixed a bug... I was 
constantly modifying my code in each database.  NOT FUN!!!

Eventually I moved to the parameterized stored procedure with dynamically constructed SQL executed 
inside of try/catch blocks etc.  I store all of these SPs out in a "management" database and execute 
them from there, passing in the database name, table name and any other parameters required to make 
each SP work.
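
The general shape of those management-database procs might look something like the following. 
This is just an illustrative pattern (the proc and object names are made up), showing dynamic SQL 
wrapped in TRY/CATCH so a failure gets reported instead of silently killing the run:

```sql
-- Illustrative pattern only: one parameterized proc in the management db
-- that runs dynamic SQL against whatever database/table it is handed.
CREATE PROCEDURE dbo.spRunStep
    @DBName  sysname,
    @TblName sysname
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        DECLARE @sql nvarchar(max) =
            N'SELECT COUNT(*) FROM '
            + QUOTENAME(@DBName) + N'.dbo.' + QUOTENAME(@TblName) + N';';
        EXEC sys.sp_executesql @sql;
    END TRY
    BEGIN CATCH
        -- Surface what failed and where; a real proc would probably
        -- write this to a log table instead of just re-raising.
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(N'spRunStep failed on %s.%s: %s', 16, 1,
                  @DBName, @TblName, @msg);
    END CATCH;
END;
```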

As I have said, I don't know enough to determine whether this is "the best way"; I have enough 
problems just getting this much to work.

I have in fact started logging from inside of each SP.  I designed a standard log table with 
fields for the SP name being processed, record counts, error codes if any, and a memo field so I 
can write a little explanation of each step, and so forth.  I JUST started doing that.  I created 
an SP (again parameterized, stored in the management db) that performs the write to the log, then 
I call that SP whenever I want to log anything.
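
Something along these lines, in other words.  The table and column names here are guesses at one 
reasonable shape for what I described, not the actual design:

```sql
-- One possible shape for the standard log table and the logging proc.
-- Names are illustrative only.
CREATE TABLE dbo.tblProcessLog (
    LogID    bigint IDENTITY(1,1) PRIMARY KEY,
    LogTime  datetime2     NOT NULL DEFAULT SYSDATETIME(),
    SPName   sysname       NOT NULL,   -- which SP was running
    RecCount bigint        NULL,       -- records processed, if known
    ErrCode  int           NULL,       -- error number, if any
    Memo     nvarchar(max) NULL        -- free-form note about the step
);
GO
CREATE PROCEDURE dbo.spWriteLog
    @SPName   sysname,
    @RecCount bigint        = NULL,
    @ErrCode  int           = NULL,
    @Memo     nvarchar(max) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.tblProcessLog (SPName, RecCount, ErrCode, Memo)
    VALUES (@SPName, @RecCount, @ErrCode, @Memo);
END;
```

Then any step can log with a one-liner, e.g. 
EXEC dbo.spWriteLog @SPName = N'SP2', @RecCount = 2000000, @Memo = N'chunk exported';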

The thing everyone following this thread needs to understand is that I have no resource other than 
this list and Google to learn how to do this stuff.  I work alone, I do what I have to, as quickly 
as I reasonably can, because my job (for this client) is to process data, NOT write code.  I am a 
consultant providing a service, NOT a DBA.

That said, I have no one but myself to assist me in automating this stuff so that I do not do every 
step manually, and believe me, when I started I did EVERY STEP manually.  Everything that is now a 
SP I did manually, sometimes dozens of times over the period I have worked for this client.  It was 
not only UGLY, it was DAMNED UGLY.

I am a programmer by trade, a DBA/Programmer by necessity, a consultant/ business analyst/ DBA/ 
Programmer by job description.  I view ALL of this from "How can I most quickly, with the least 
effort, make my job easier".  I don't have the luxury of learning anything that I don't HAVE to 
know, right now, to do my job.  I also don't have the luxury of spending weeks working on one 
specific problem.  Do a task for client A, move on to the task for client B.  At the end of the 
month, bill for my time.

Thanks for the hint to use SSIS.  That is one of those "haven't had the time to learn it" things but 
I have always known I probably should.  When you are up to your ass in alligators, it is hard to 
STOP... and study SSIS.

John W. Colby
www.ColbyConsulting.com


Robert Stewart wrote:
> John,
> 
> Mark has it exactly correct.
> 
> Use SQL Server to the max of what it can do for you before you go off
> creating something in C# to do essentially the same things. Data is what
> it does; C# is the GUI.  Very similar to what we have all done with MS
> Access over the years.
> 
> Now, let's get down to what you are doing...
> 
> Running code out of a database table is not efficient.  The optimizer has to
> work on it before it can be run. So, the processing will be slower.
> 
> It would be better to have the code in one or more stored procedures with all the
> parameters, and run them against small data sets to get them optimized. Then they
> would be ready for the larger chunks of data.
> 
> Each SP can write to a log table and your C# GUI can display the progress by
> reading data out of the log tables.
> 
> By using the log tables, you can pass start times, records processed, the
> time at predetermined numbers of records, which stored proc is doing the
> work at the time, end times, and just about any other information you may want,
> including things that cause errors.
> 
> SSIS is extremely powerful and the real tool that should probably be used for
> what you want to do.  It can do all of the logging and error trapping, and handle
> passing errored records into a different table for analysis. The C# GUI could
> be used to read the output from the SSIS package to do the monitoring.  It is
> also relatively simple to copy SSIS packages and change the parameters around
> on them to allow them to be used for different extractions and builds.
> 
> Robert
