jwcolby
jwcolby at colbyconsulting.com
Thu Nov 5 07:52:53 CST 2009
Mark,

> However, I think that Robert is correct in suggesting to try, as much as
> possible, not to cross the technology stack with volumes of data when you do
> not have to.

I have never had a problem with letting each tool do what it does best. In my case (or for this specific client) the database is decidedly not "normal". I have many different databases, each with a few tables, and ZERO (yes ZERO) parent / child tables. I keep tables of names / addresses. That's it. So I am not doing transactions, I am not doing reports, I am not doing data entry, I am not doing any of the "normal" things that people do with databases.

What I have to do is get the entire table of name / address out to an external process and then back into the same database, and I have to do this at least monthly. We are talking 20, 50, 80 million names exported into 2 million record files, processed through this external program, and then imported back in again. This is a "huge" process, not in terms of sheer SQL complexity but rather in terms of the steps involved, the monitoring that has to occur, and the time it takes. And I have to do this on a half dozen large databases, monthly.

The external program processes about 2 million records an hour start to finish. Do the math and 60 million records turns into 30 hours of external processing. I use three virtual machines to get three times the records / hour, but you are still talking 10 hours, and that doesn't count the time to get the records out of and back into SQL Server again.

My thought was to start a discussion about that "periphery processing" so that I could get ideas from people who already do it. I am getting almost no actual "I do that using..." responses, however, and I am wondering why. Perhaps because I deal with small clients, where the cost of doing those kinds of things by hand is too high, so they hire me to automate it. Perhaps in a big organization they simply throw people at the problem? Believe me, when you get down to a "one person company" (me!) you can't do this stuff manually for very long; you MUST automate it or you will spend your entire day doing stuff that should be done by the computer.

My original question was aimed at discovering how other people handle this kind of processing. I have another client (in Access in this case) that routinely exports data using very specialized export functions (that I wrote) dictated by mainframe interface specs. The data has to be pulled, exported, formatted, sometimes encrypted, sometimes zipped, sometimes FTPd, sometimes emailed (the "sometimes" part depending on the recipient's preferences), and emails sent announcing that the data is on the way, etc. I do all of this in Access currently, but it is ugly in Access because VBA does not have the built-in tools to do ftp / zip / encrypt etc. C# has some of that built in, and you can usually find C# routines to do the rest. I have pasted a couple of rough sketches below of what I have in mind (the bulk load back into SQL Server, and the zip / ftp / email step).

BTW, as I have mentioned, I am taking a C# class at the local community college. I met a Russian gentleman, 60ish, who has been in programming / IT all of his life, recently unemployed. He and I are the only two in the class who have actually worked in the industry (other than the teacher). We struck up a conversation and he asked me if I would like to bid with him on a couple of small C# jobs on a jobs board "just to get some real world experience". I counter-offered that he join me in automating some of my real-world tasks. He came over yesterday, and we are going to be working together on the kinds of tasks I described above.
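Here is the first sketch: the import side, pulling the processed 2 million record files back into SQL Server with SqlBulkCopy. It is untested, and every file / connection / table / column name in it is made up for illustration; the real tables are just name / address fields plus a key. For real 2 million row files I would probably stream the rows through an IDataReader instead of building a DataTable in memory, but this shows the shape of it:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class ImportProcessedFiles
{
    // Hypothetical connection string and staging table -- adjust to the real database.
    const string ConnStr = "Data Source=.;Initial Catalog=NameAddressDB;Integrated Security=SSPI";
    const string StagingTable = "dbo.tblNameAddrStaging";

    static void Main()
    {
        // Each processed file is assumed to be tab delimited: key, name and address fields.
        foreach (string file in Directory.GetFiles(@"C:\Processed", "*.txt"))
        {
            DataTable batch = LoadFile(file);
            using (SqlConnection cn = new SqlConnection(ConnStr))
            {
                cn.Open();
                using (SqlBulkCopy bcp = new SqlBulkCopy(cn))
                {
                    bcp.DestinationTableName = StagingTable;
                    bcp.BatchSize = 50000;   // send rows to the server in 50,000 row batches
                    bcp.BulkCopyTimeout = 0; // no timeout; these loads run for a while
                    bcp.WriteToServer(batch);
                }
            }
            Console.WriteLine("Loaded {0} rows from {1}", batch.Rows.Count, file);
        }
    }

    // Read one tab delimited file into a DataTable whose column order matches the staging table
    // (SqlBulkCopy maps columns by ordinal unless you set up ColumnMappings).
    static DataTable LoadFile(string path)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("PK", typeof(long));
        dt.Columns.Add("FName", typeof(string));
        dt.Columns.Add("LName", typeof(string));
        dt.Columns.Add("Addr", typeof(string));
        dt.Columns.Add("City", typeof(string));
        dt.Columns.Add("St", typeof(string));
        dt.Columns.Add("Zip", typeof(string));

        using (StreamReader rdr = new StreamReader(path))
        {
            string line;
            while ((line = rdr.ReadLine()) != null)
            {
                string[] fields = line.Split('\t');
                DataRow row = dt.NewRow();
                row["PK"] = long.Parse(fields[0]);
                for (int i = 1; i < dt.Columns.Count && i < fields.Length; i++)
                    row[i] = fields[i];
                dt.Rows.Add(row);
            }
        }
        return dt;
    }
}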
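And here is the second sketch: the zip / ftp / email step, using only what ships in the framework (GZipStream, FtpWebRequest, SmtpClient). Again untested; the server names, credentials, paths and addresses are all placeholders, and this produces a gzip file rather than a true .zip archive (a real .zip would need a third-party library):

using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Net.Mail;

class SendExtract
{
    static void Main()
    {
        string extract = @"C:\Exports\Client123_Extract.txt";  // hypothetical export file
        string gz = extract + ".gz";

        // 1. Compress the extract with the built-in GZipStream.
        using (FileStream src = File.OpenRead(extract))
        using (FileStream dst = File.Create(gz))
        using (GZipStream zip = new GZipStream(dst, CompressionMode.Compress))
        {
            byte[] buf = new byte[64 * 1024];
            int read;
            while ((read = src.Read(buf, 0, buf.Length)) > 0)
                zip.Write(buf, 0, read);
        }

        // 2. FTP the compressed file to the recipient's server (placeholder host / credentials).
        FtpWebRequest ftp = (FtpWebRequest)WebRequest.Create(
            "ftp://ftp.example.com/inbound/" + Path.GetFileName(gz));
        ftp.Method = WebRequestMethods.Ftp.UploadFile;
        ftp.Credentials = new NetworkCredential("ftpuser", "ftppassword");
        using (Stream reqStream = ftp.GetRequestStream())
        using (FileStream src = File.OpenRead(gz))
        {
            byte[] buf = new byte[64 * 1024];
            int read;
            while ((read = src.Read(buf, 0, buf.Length)) > 0)
                reqStream.Write(buf, 0, read);
        }
        ((FtpWebResponse)ftp.GetResponse()).Close();  // complete the upload

        // 3. Email the recipient to say the file is on the way.
        SmtpClient smtp = new SmtpClient("mail.example.com");
        smtp.Send("me@example.com", "recipient@example.com",
            "Extract sent",
            "The monthly extract " + Path.GetFileName(gz) + " was uploaded to your FTP site.");

        Console.WriteLine("Done.");
    }
}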
As for my new partner, he has a LOT of C# experience but almost no database experience beyond simple select queries. We should each learn a lot from the other, it should be fun, and hopefully I can get some of this automation done!

John W. Colby
www.ColbyConsulting.com

Mark Breen wrote:
> Hello John,
>
> FWIW, I agree with you that you need a rich environment to do the work that
> you plan to do.
>
> However, I think that Robert is correct in suggesting to try, as much as
> possible, not to cross the technology stack with volumes of data when you do
> not have to.
>
> IOW, do all the heavy lifting in the database, and by all means return as
> many parameters or log results back to the front end you are building in C#,
> but try to avoid returning your data whenever possible.
>
> I guess I was suggesting the same with SSIS. Where you have to export
> data based on decisions, SSIS is an extremely fast option.
>
> Have you already used the EXECUTE command to call an sproc within an sproc?
> This is a useful means of creating an sproc that runs a set of sprocs
> and can insert / update a log table as it progresses. For data-only
> activity, this might be effective.
>
> For all the front end work however, I guess you will make C# sing, like only
> you can, so work away, and let TSQL do what it is good at; all the rest
> should not involve enormous volumes of data, so the technology bridge will
> be irrelevant.
>
> Good luck, you know you are lucky. Most database people never get to read
> about a db as big as what you have :)
>
> One last thing: in SQL Server, cursors are a dirty word, I never see them
> used. However, when you have low volumes of iteration (thousands or a few
> tens of thousands), cursors are a very powerful way of enumerating tasks.
>
> Eg, let's imagine you had a table of states, and you needed to perform ten
> separate sprocs for each state. You could use a cursor to enumerate the
> states table and pass in the state name to the ten different sprocs.
>
> You could log between each iteration, and you could have the whole thing
> running in 30 minutes. In this sort of context, I agree with Robert that
> TSQL can give you raw power quickly and easily.
>
> C# could be watching your log table and your 'Status Update' table and
> simply returning 1 or 2 records from the db to C# per minute, whereas within
> your sproc that is enumerating it might be handling millions of records each
> two minutes. Those millions would never cross the stack divide of DB Server
> to .Net. Again this sharing of processing power is probably what Robert was
> referring to.
>
> BTW I recently started using FileZilla and find it super.
>
> Good Luck,
>
> Mark
>
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
>
>