jwcolby
jwcolby at colbyconsulting.com
Fri Nov 13 08:30:42 CST 2009
Mark,

> Just curious, how many records are you talking about - a few hundred or
> millions?

Millions, however each update touches only a small set of records, from 2 to fewer than 30 rows. Imagine a table where Mark Breen has 22 records, John Colby has 17 records, Shamil has 3 records, etc. Each of Mark's records has data in a "random" handful of the 525 fields. The objective is to get all of Mark's data into one record. Once Mark's records are processed, move on to John's records, then process William's records, and so on.

There are 6,514,623 records to be processed. These 6.5 million records represent perhaps about 3 million actual people. I am creating tables right now to store the hash field and a count of the records each hash field represents. As an example, I have 9 actual people with more than 10 records to be merged, 4,725 people with 5 records, 270,839 people with 3 records to merge, and 2,771,885 people with two records to merge.

The nice part is that the table I am discussing is not constantly changing; in fact this data is entirely static, and there is no time frame for getting it done. I can start the process running, and if it takes all night or all week I don't care. I suspect that with efficient programming I can do this at hundreds of records per second.

I intend to do it out in tables dedicated to this process, in a database dedicated to this process. All of the updates between fields in different records will take place entirely in memory. I will probably pull in all of the records for a thousand people, do the merges, then update those records back to SQL Server and pull another chunk of a thousand records.

> Second question, other than for purely academic reasons, why are you
> bringing the data into C#

Mostly because I know programming, and TSQL is far enough from my expertise as to be unusable in the immediate future. To be honest, I am not sure it is even possible to do this easily in TSQL.
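The chunk-at-a-time plan above - pull a batch of people, merge in memory, write back, repeat - can be sketched roughly as follows. This is Python purely to illustrate the flow (the real work would be C# against SQL Server), and the function and variable names are invented for illustration:

```python
from itertools import islice

def chunks_of_people(person_ids, chunk_size=1000):
    """Yield successive batches of person ids. Each batch would be
    pulled from SQL Server, merged entirely in memory, and written
    back before the next batch is fetched."""
    it = iter(person_ids)
    while batch := list(islice(it, chunk_size)):
        yield batch

# 2,500 hypothetical people processed 1,000 at a time
batches = list(chunks_of_people(range(2500)))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Nothing about the merge itself depends on the batch size; 1,000 people per round trip is just a guess at a reasonable balance between memory use and the number of round trips to the server.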
That said, I am sure some TSQL guru will pop up with the answer. But I am not a TSQL guru (nor a guru of any type, for that matter), and the thought of programming this kind of algorithm in TSQL sends shivers down my spine. I need to do the following:

R1F1 R1F2 R1F3 ... R1F525
R2F1 R2F2 R2F3 ... R2F525
R3F1 R3F2 R3F3 ... R3F525

The lines above represent rows R1, R2 and R3, fields F1, F2, F3 ... F525.

Start with R1 and R2
Start at column 1
If R1F1 is null AND R2F1 is NOT NULL THEN
    search through sequential records, same field, until data is found
endif
Move to column 2; continue through all columns
Move to R1 and R3

In English: there is a set of records for a given person.

If R1Col(N) has data, do not update it.
If R1Col(N) has no data, get the data from R2Col(N).
If R2Col(N) has no data, get the data from R3Col(N).

Start at column N = 1. Search until data is found, update R1Col(N), then abort the search and move on to the next column.

C# has a wonderful representation of data inside the data adapter. Each table is a collection of rows AND a collection of columns. So I can iterate through the collection of columns and, once I have a collection of fields in a single column, iterate through that looking at the data in row 1, row 2, row 3 until data is found. This data representation immediately does half the work for me by presenting my data already loaded into these collections.

> Of course if it is to learn datasets then that's another story

It is to get work done that needs doing. Along the way I expect to learn datasets and get that much more comfortable with C#. Doesn't this sound like FUN programming? I am a programmer at heart, I love doing this kind of stuff, and C# is where I am going, so why not do it there?

John W. Colby
www.ColbyConsulting.com

Mark Breen wrote:
> Hello John,
>
> Just curious, how many records are you talking about - a few hundred or
> millions?
>
> Second question, other than for purely academic reasons, why are you
> bringing the data into C#? To concatenate and insert into another table
> can so easily be done in one sproc; even if you need a temporary holding
> place, you would still use a temp table or even a local table.
>
> Of course if it is to learn datasets then that's another story,
>
> Mark
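The column-by-column coalesce John outlines above - keep R1's value where present, otherwise scan R2, R3, ... for the first non-null value - can be sketched as follows. This is Python rather than C# DataTable code, and the field names are invented, just to show the logic:

```python
def merge_person_rows(rows):
    """Collapse one person's rows into a single record: for each column,
    keep R1's value if it has data, otherwise take the first non-null
    value found scanning R2, R3, ... in order."""
    merged = dict(rows[0])          # start from R1
    for col in merged:
        if merged[col] is None:     # R1 has no data in this column
            for row in rows[1:]:    # search sequential records, same field
                if row[col] is not None:
                    merged[col] = row[col]
                    break           # data found: abort search, next column
    return merged

# Three of "Mark's" rows, data scattered across a handful of columns
r1 = {"first": "Mark", "last": None,    "city": None}
r2 = {"first": None,   "last": "Breen", "city": None}
r3 = {"first": None,   "last": None,    "city": "Dublin"}
print(merge_person_rows([r1, r2, r3]))
# {'first': 'Mark', 'last': 'Breen', 'city': 'Dublin'}
```

In the C# version, the dict-of-columns would be a DataTable, whose Rows and Columns collections give exactly the two iteration orders the algorithm needs.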