jwcolby
jwcolby at colbyconsulting.com
Tue Apr 29 12:48:28 CDT 2008
> 1. So... you have a recordset in SQL Server that you pull into a VB.NET app to process the records, whose main purpose is to parse field values and then update the same records to make those values into meaningful records that you can hand over to another process, right?

Correct. I have a 96 million record table with 155 fields. Of that I pull the PKID, NameToParse (not called that, obviously) and then 6 "parsed name pieces" fields, lname, fname etc. (which I added to the table to hold the pieces).

> 2. The VB.Net approach is due to the fact that you can't achieve the same fluid results using just SQL Server.

Not quite correct. The VB.Net approach is due to the fact that I am using a third-party library which does the parsing of the name. Name parsing is non-trivial, to say the least. That library is a .Net object (a DLL, actually) in which you set up parameters and then pass in a name to parse, and back come 5 to 8 "pieces". In fact, fluid results are the least of my worries in SQL Server; ANY results are more like it.

> 1. Do you actually display the data you grab for your datatable, like in a grid? Or is it just a form with a button that triggers the underlying processes?

In fact I do, but that is not necessary. The name parser came with a demo form, and I modified it to display every 1000th record in the table, just so I could "watch it". The form is not a necessity and I may or may not get rid of it in a later iteration (probably will). I have a form (which I created) with ServerName, DatabaseName, TableName (being parsed), PKFldName, ParseFldName, StartID and EndID fields, and a button to trigger the parse. I fill in the pieces and push the button. Since I have control of the tables, I "hard code" the "parsed parts" field names; they will (at least for now) always be those names. I have to add those fields to the source table anyway.

> 2.
> Do you grab x number of records, process them in VB.Net and then write the changes back to SQL Server, and then retrieve the next x number of records, and so on?

Correct. I grab X records into an ADO.Net DataTable, iterate that table row by row, and pass the unparsed name field off to the parser. I then write the resulting "pieces" that the parser returns (properties of the parser object) into the matching fields in the ADO.Net DataTable. The name parser handles roughly 800 names per second and is the real bottleneck. I have a StartPK / EndPK pair which I manipulate in a while loop; as I finish a "chunk" I write it to SQL Server, update the StartPK / EndPK, and grab another chunk to process.

I have to do this for 96 million names; however, I also receive weekly updates to this table which I will have to process in a similar manner before I append them to the table. The results of this will be used to send through Accuzip for address validation / NCOA processing. Address validation can be done just on an address, but NCOA is change of address, which is about a person at an address. Once that is done... there is more, but you get the picture. And in fact there is another, different table from the same data source on which I will have to perform this same process. It is highly likely that I will get other lists in the future with single name fields. The world is full of idiots. ;-)

John W. Colby
www.ColbyConsulting.com

Eric Barro wrote:
> Trying to understand the scenario...
>
> 1. So... you have a recordset in SQL Server that you pull into a VB.NET app to process the records, whose main purpose is to parse field values and then update the same records to make those values into meaningful records that you can hand over to another process, right?
>
> 2. The VB.Net approach is due to the fact that you can't achieve the same fluid results using just SQL Server.
>
> Questions:
>
> 1. Do you actually display the data you grab for your datatable like in a grid?
> Or is it just a form with a button that triggers the underlying processes?
>
> 2. Do you grab x number of records, process them in VB.Net and then write the changes back to SQL server and then retrieve the next x number of records and so on?
>
> From: jwcolby <jwcolby at colbyconsulting.com>
> Date: 2008/04/29 Tue AM 10:43:45 CDT
>
> > Yes, that is correct. He is ending up loading the records into memory.
> > Then passing individual updates back for each row of data that he changed.
>
> Yes, and most records have updates.
>
> > This method is not designed for doing large datasets like he is doing.
> >
> > It should be done on the SQL Server side.
>
> ROTFL. That is a definite NSS.
>
> Unfortunately I am not a DBA, and with the responses I get from the SQL group I am not even going there. I can do this purely in VB.Net (it is cranking as we speak), so I will.
>
> > Just what are you updating... the UPDATE statement you would use to do what you are doing.
>
> No idea. I thought I could find out from the DA after the command builder was finished building the update statement, but I am getting an error if I try to access the update property.
>
> John W. Colby
> www.ColbyConsulting.com
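[Editor's note: the chunked StartPK / EndPK loop described in the reply above can be sketched in a few lines. This is a minimal illustration only, not the author's actual VB.Net code: it uses Python with sqlite3 as a stand-in for SQL Server / ADO.Net, a trivial whitespace split as a stand-in for the third-party name parser, and made-up table and column names (`people`, `PKID`, `NameToParse`, `fname`, `lname`).]

```python
import sqlite3

# Stand-in for the third-party name parser. The real component is a .Net
# DLL that returns 5 to 8 "pieces"; this naive split returns only two.
def parse_name(raw):
    parts = raw.split()
    fname = parts[0] if parts else ""
    lname = parts[-1] if len(parts) > 1 else ""
    return fname, lname

def process_chunks(conn, start_id, end_id, chunk_size=1000):
    """Walk the PK range in chunks: fetch rows, parse each name,
    write the pieces back, then advance to the next chunk."""
    cur = conn.cursor()
    pk = start_id
    while pk <= end_id:
        hi = min(pk + chunk_size - 1, end_id)
        rows = cur.execute(
            "SELECT PKID, NameToParse FROM people "
            "WHERE PKID BETWEEN ? AND ?", (pk, hi)).fetchall()
        # Parse every name in the chunk, then write the whole chunk back.
        updates = [parse_name(raw) + (pkid,) for pkid, raw in rows]
        cur.executemany(
            "UPDATE people SET fname = ?, lname = ? WHERE PKID = ?", updates)
        conn.commit()
        pk = hi + 1

# Demo with an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (PKID INTEGER PRIMARY KEY, "
             "NameToParse TEXT, fname TEXT, lname TEXT)")
conn.executemany("INSERT INTO people (PKID, NameToParse) VALUES (?, ?)",
                 [(1, "John W. Colby"), (2, "Eric Barro")])
process_chunks(conn, start_id=1, end_id=2)
print(conn.execute("SELECT fname, lname FROM people ORDER BY PKID").fetchall())
# prints [('John', 'Colby'), ('Eric', 'Barro')]
```

The key design point from the thread survives the simplification: only the PK and the name fields travel to the client, work proceeds in bounded PK-range chunks, and each chunk is written back before the next is fetched, so memory use stays flat across 96 million rows.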