[dba-VB] What to do, what to do?

jwcolby jwcolby at colbyconsulting.com
Fri Nov 13 08:30:42 CST 2009


Mark,

 > Just curious, how many records are you talking about - a few hundred or millions?

Millions, however I am talking about updating sets of records, from 2 to < 30 rows.  Imagine a table 
where Mark Breen has 22 records in the table, John Colby has 17 records in the table, Shamil has 3 
records etc.  Each of Mark's records have data in a "random" handful of 525 fields.  The objective 
is to get all of Mark's data into one record.  Once Mark's records are processed, move on to John's 
records.  Then process William's records.

There are 6,514,623 records to be processed.  These 6.5 million records represent perhaps about 3 
million actual people.  I am creating tables right now to store the hash field and a count of 
records that hash field represents.

As an example I have 9 actual people with more than 10 records to be merged, 4,725 people with 5 
records, 270,0839 people with 3 records to merge, and 2,771,885 people with two records to merge.

The nice part is that the table I am discussing is not constantly changing, in fact this data is 
entirely static.  And there is no time frame for getting it done.  I can start the process running 
and if it takes all night or all week I don't care.  I suspect that with efficient programming I can 
do this at hundreds of records per second.  I intend to do it out in tables dedicated to this 
process in a database dedicated to this process.  All of the updates between fields in different 
records will be taking place entirely in memory.  I will probably pull in all of the records for a 
thousand people, do the merges, then update those records back to SQL Server and pull another chunk 
of a thousand records.

 > Second question, other than for purely academic reasons, why are you bring the data into C#

Mostly because I know programming, and TSQL is far enough from my expertise as to be unusable in the 
immediate future.  To be honest I am not sure it is even possible to do this easily in TSQL.  That 
said I am sure some TSQL GURU will pop up with the answer.  But I am not a TSQL Guru (nor a Guru of 
any type for that matter).  But the thought of programming this kind of algorithm in TSQL sends 
shivers down my spine.

I need to do the following:

R1F1	F1F2	R1F3...	R1F525
R2F1	R2F2	R2F3...	R2F525
R3F1	R3F2	R3F3...	R3F525

The lines above represent Rows R1, R2 and R3, Fields F1,F2,F3...F525

Start with R1 and R2
	Start at column 1
	If R1F1 is null AND R2F1 is NOT NULL THEN
		Search through sequential records, same field until data found
	endif
	Move to column 2, continue to all columns
Move to R1 and R3

In English:

There is a set of records for a given person.
If R1Col(N) has data, do not update.
If R1Col(N) has no data, get data from R2Col(N)
If R2Col(N) has no data, get data from R3Col(N)
Start at N = (column) 1.
Search until data found, update R1Col(N), then abort search and move to next column

C# has a wonderful representation of data inside of the data adapter.  Each table is a collection of 
rows AND a collection of columns.  So I can iterate through the collection of columns, and once I 
have a collection of fields in a single column, iterate through that looking at the data in row 1, 
row 2 row 3 until data is found.

This data representation just immediately does 1/2 the work for me by presenting me with my data 
already loaded into these collections.

 > Of course if it is to learn datasets then thats another story

It is to get work done that needs doing.  Along the way I expect to learn datasets and get that much 
more comfortable with C#.

Doesn't this sound like FUN programming?  I am a programmer at heart, I love doing this kind of 
stuff, and C# is where I am going, so why not do it there?

John W. Colby
www.ColbyConsulting.com


Mark Breen wrote:
> Hello John,
> 
> Just curious, how many records are you talking about - a few hundred or
> millions?
> 
> Second question, other than for purely academic reasons, why are you bring
> the data into C#, to concatenate and insert into another table can so easily
> be done in one sproc, even if you need a temporary holding place, you would
> still use a temp table or even a local table.
> 
> Of course if it is to learn datasets then thats another story,
> 
> Mark



More information about the dba-VB mailing list