[dba-VB] Merge rows

Shamil Salakhetdinov shamil at smsconsulting.spb.ru
Tue Sep 22 01:59:47 CDT 2009


Hi John,

3.086 million records seems to be manageable to be "deduped" in one go
provided the source 50 million records  table is already prepared.

For "deduping" you can use 

1. One stored procedure which:
   a) will insert new individual attributes' record and
   b) will update not null values for existing individual attributes'
record.

2. One C#/VB.NET function to get records from the source 50 million records'
table and to "feed" the above stored procedure.

That's a rough plan.

Of course developing SP with 640 parameters and C#/VB.NET code working with
that SP is not an easy manual work - that SP and code could be generated.

Not sure yet here that SP can have 640 parameters therefore it may be needed
to have several SPs.

To get source rows a T-SQL select expression with ROW_NUMBER() function
should be used to have relatively small subsets of record batches to
process. This selection can be done using multi-threaded C#/VB.Net code . Do
you have any index over source 50 million rows table? - ROW_NUMBER()
function will need such an index (I'm talking MS SQL 2005 here, maybe in MS
SQL 2008 ROW_NUMBER() can be used without any indexes)...

When source data deduped into ~3.086 million records table then that "small"
table can get a few the most selective columns' indexes, and probably one
compound natural key clustered index built using the most selective (/the
most often used in queries) columns. If there are a few such columns at
least one of which is used in all the queries then your application response
time should be finally measured in seconds... I hope...

That's basically it.

Please correct me if you suppose that the above plan can't be fulfilled
because of some reasons I'm missing here.

Thank you.


--
Shamil  
 

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Tuesday, September 22, 2009 5:19 AM
To: Discussion concerning Visual Basic and related programming issues.
Subject: Re: [dba-VB] Merge rows

Not 50 million, though still a lot.  I just ran a query which shows 3.086
million individduals with 
multiple records.

John W. Colby
www.ColbyConsulting.com


Max Wanadoo wrote:
> 50 Million!
> 
> Max
> 
> 
> -----Original Message-----
> From: dba-vb-bounces at databaseadvisors.com
> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Stuart McLachlan
> Sent: 21 September 2009 23:55
> To: Discussion concerning Visual Basic and related programming issues.
> Subject: Re: [dba-VB] Merge rows
> 
> On the assumption that you don't really care which record you get if there
> are conflicts, I'd 
> build a new table from:
> 
> Select Distinct, KeyField, Max(fld2),Max(fld3)..........
> 

 

__________ Information from ESET NOD32 Antivirus, version of virus signature
database 4445 (20090921) __________

The message was checked by ESET NOD32 Antivirus.

http://www.esetnod32.ru
 




More information about the dba-VB mailing list