jwcolby
jwcolby at colbyconsulting.com
Tue Apr 29 12:48:28 CDT 2008
> 1. So... you have a recordset in SQL Server that you pull into a VB.NET app to process the records, whose main purpose is to parse field values and then update the same records to make those values into meaningful records that you can hand over to another process, right?

Correct. I have a 96 million record table with 155 fields. Of that I pull the PKID, NameToParse (not called that, obviously) and then 6 "parsed name pieces" fields, lname, fname etc. (which I added to the table to hold the pieces).

> 2. The VB.Net approach is due to the fact that you can't achieve the same fluid results using just SQL Server.

Not quite correct. The VB.Net approach is due to the fact that I am using a third-party library which does the parsing of the name. Name parsing is non-trivial, to say the least. That library is a .Net object (a DLL, actually) in which you set up parameters and then pass in a name to parse, and back come 5 to 8 "pieces". In fact, fluid results are the least of my worries in SQL Server; ANY results are more like it.

> 1. Do you actually display the data you grab for your datatable, like in a grid? Or is it just a form with a button that triggers the underlying processes?

In fact I do, but that is not necessary. The name parser came with a demo form, and I modified it to display every 1000th record in the table, just so I could "watch it". The form is not a necessity and I may or may not get rid of it in a later iteration (probably will). I have a form (which I created) with ServerName, DatabaseName, TableName (being parsed), PKFldName, ParseFldName, StartID and EndID fields, and a button to trigger the parse. I fill in the pieces and push the button. Since I have control of the tables, I "hard code" the "parsed parts" field names; they will (at least for now) always be those names. I have to add those fields to the source table anyway.

> 2.
> Do you grab x number of records, process them in VB.Net and then write the changes back to SQL Server, and then retrieve the next x number of records, and so on?

Correct. I grab X records into an ADO.Net DataTable, iterate that table row by row, and pass the unparsed name field off to the parser. I then write the resulting "pieces" that the parser returns (properties of the parser object) into the matching fields in the ADO.Net DataTable. The name parser handles roughly 800 names per second and is the real bottleneck. I have a StartPK / EndPK pair which I manipulate in a while loop; as I finish a "chunk" I write it to SQL Server, update the StartPK / EndPK, and grab another chunk to process.

I have to do this for 96 million names; however, I also receive weekly updates to this table which I will have to process in a similar manner before I append them to the table. The results of this will be used to send through Accuzip for address validation / NCOA processing. Address validation can be done just on an address, but NCOA is change of address, which is about a person at an address. Once that is done... there is more, but you get the picture. And in fact there is another, different table from the same data source on which I will have to perform this same process. It is highly likely that I will get other lists in the future with single name fields. The world is full of idiots. ;-)

John W. Colby
www.ColbyConsulting.com

Eric Barro wrote:
> Trying to understand the scenario...
>
> 1. So... you have a recordset in SQL Server that you pull into a VB.NET app to process the records, whose main purpose is to parse field values and then update the same records to make those values into meaningful records that you can hand over to another process, right?
>
> 2. The VB.Net approach is due to the fact that you can't achieve the same fluid results using just SQL Server.
>
> Questions:
>
> 1. Do you actually display the data you grab for your datatable like in a grid?
> Or is it just a form with a button that triggers the underlying processes?
>
> 2. Do you grab x number of records, process them in VB.Net and then write the changes back to SQL server and then retrieve the next x number of records and so on?
>
> From: jwcolby <jwcolby at colbyconsulting.com>
> Date: 2008/04/29 Tue AM 10:43:45 CDT
>
> > Yes, that is correct. He is ending up loading the records into memory.
> > Then passing individual updates back for each row of data that he changed.
>
> Yes, and most records have updates.
>
> > This method is not designed for doing large datasets like he is doing.
> >
> > It should be done on the SQL Server side.
>
> ROTFL. That is a definite NSS.
>
> Unfortunately I am not a DBA, and with the responses I get from the SQL group I am not even going there. I can do this purely in VB.Net (it is cranking as we speak), so I will.
>
> > Just what are you updating... the UPDATE statement you would use to do what you are doing.
>
> No idea. I thought I could find out from the DA after the command builder was finished building the update statement, but I am getting an error if I try to access the update property.
>
> John W. Colby
> www.ColbyConsulting.com
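[Editor's note: the chunked StartPK / EndPK loop described in the reply above can be sketched in a few lines. This is a minimal illustration only, not the author's actual VB.Net code: it uses Python with sqlite3 as a stand-in for SQL Server / ADO.Net, a trivial whitespace split as a stand-in for the third-party name parser, and made-up table and column names (`people`, `PKID`, `NameToParse`, `fname`, `lname`).]

```python
import sqlite3

# Stand-in for the third-party name parser. The real component is a .Net
# DLL that returns 5 to 8 "pieces"; this naive split returns only two.
def parse_name(raw):
    parts = raw.split()
    fname = parts[0] if parts else ""
    lname = parts[-1] if len(parts) > 1 else ""
    return fname, lname

def process_chunks(conn, start_id, end_id, chunk_size=1000):
    """Walk the PK range in chunks: fetch rows, parse each name,
    write the pieces back, then advance to the next chunk."""
    cur = conn.cursor()
    pk = start_id
    while pk <= end_id:
        hi = min(pk + chunk_size - 1, end_id)
        rows = cur.execute(
            "SELECT PKID, NameToParse FROM people "
            "WHERE PKID BETWEEN ? AND ?", (pk, hi)).fetchall()
        # Parse every name in the chunk, then write the whole chunk back.
        updates = [parse_name(raw) + (pkid,) for pkid, raw in rows]
        cur.executemany(
            "UPDATE people SET fname = ?, lname = ? WHERE PKID = ?", updates)
        conn.commit()
        pk = hi + 1

# Demo with an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (PKID INTEGER PRIMARY KEY, "
             "NameToParse TEXT, fname TEXT, lname TEXT)")
conn.executemany("INSERT INTO people (PKID, NameToParse) VALUES (?, ?)",
                 [(1, "John W. Colby"), (2, "Eric Barro")])
process_chunks(conn, start_id=1, end_id=2)
print(conn.execute("SELECT fname, lname FROM people ORDER BY PKID").fetchall())
# prints [('John', 'Colby'), ('Eric', 'Barro')]
```

The key design point from the thread survives the simplification: only the PK and the name fields travel to the client, work proceeds in bounded PK-range chunks, and each chunk is written back before the next is fetched, so memory use stays flat across 96 million rows.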