[dba-VB] Merge rows

Max Wanadoo max.wanadoo at gmail.com
Mon Sep 21 17:04:56 CDT 2009


Are you saying that the FIRST John Colby record you encounter is therefore
the MASTER, and the remaining John Colby records' data gets used to update the MASTER?

So, 
Rec 1. John Colby likes Fish (at the time he might have liked fish)
Rec 2. John Colby does not like fish (might have been asked some time later,
after a bad meal)

Which takes precedence?

Do you NOT overwrite existing data?
Or DO you?

Max


-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: 21 September 2009 22:55
To: dwaters at usinternet.com; Discussion concerning Visual Basic and related
programming issues.
Subject: Re: [dba-VB] Merge rows

Dan,

In fact there are 50 million records in the table (though not that many
"duplicates"), so there is no "making a decision".  Basically I have to take
row one and update it with rows 2, 3, 4..., then delete rows 2, 3, 4...

This is basically a table of "polls" that people have taken, with ~600
fields where there may or may 
not be information about doing, wanting or having that thing.  Everything
from information about the 
boat you own to the electronics you own, to the medicine you take, whether
you own a dog or cat, 
hunt, fish etc.  In order to get into the database you have to have
volunteered the information
somewhere along the line.  But you might have taken several different polls
at different times. 
Thus one record might have the electronics you own, another the medicine you
take, and you don't 
even own a boat so those fields contain nothing.

Like that.

I just want to take all the John Colby records and merge them into a single
John Colby record, 
deleting all the extras when I am done.
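The first-record-wins merge described above can be sketched in Python with SQLite. This is only a toy illustration: the table name, the three columns, and the use of SQLite instead of SQL Server are all assumptions, and the real table has ~600 fields. The idea is the same, though: for each duplicate name, COALESCE non-NULL values from the later rows into the earliest row, then delete the later rows.

```python
import sqlite3

# Hypothetical simplified poll table; names are made up for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE polls (id INTEGER PRIMARY KEY, name TEXT, boat TEXT, medicine TEXT)"
)
cur.executemany(
    "INSERT INTO polls (name, boat, medicine) VALUES (?, ?, ?)",
    [("John Colby", "sailboat", None),   # one poll captured the boat
     ("John Colby", None, "aspirin"),    # a later poll captured the medicine
     ("Jane Doe", None, None)],
)

# For each name with duplicates, treat the earliest row as the MASTER.
cur.execute("SELECT MIN(id), name FROM polls GROUP BY name HAVING COUNT(*) > 1")
for master_id, name in cur.fetchall():
    for col in ("boat", "medicine"):
        # COALESCE keeps the master's existing value (first record wins);
        # a NULL is filled from any non-NULL duplicate of the same person.
        cur.execute(
            f"UPDATE polls SET {col} = COALESCE({col}, "
            f"(SELECT {col} FROM polls WHERE name = ? AND {col} IS NOT NULL "
            f"AND id <> ? LIMIT 1)) WHERE id = ?",
            (name, master_id, master_id),
        )
    # Delete the extras once the master holds the merged data.
    cur.execute("DELETE FROM polls WHERE name = ? AND id <> ?", (name, master_id))
conn.commit()

print(cur.execute("SELECT name, boat, medicine FROM polls ORDER BY id").fetchall())
```

Note this answers Max's precedence question one particular way (existing data in the master is never overwritten); overwriting instead would just mean reversing the COALESCE arguments.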

Of course this is one reason for my other question about getting rid of any
fields where there is no 
information in any record.  If I am going to process 50 million records, it
would be good to not 
have to process any fields where there is no info to merge.

The other reason for the other question is that doing a table scan against
50 million records with 
640 fields is just plain going to take longer than a table scan against 50
million records with only 
400 fields.  It comes down to a matter of disk sectors to load.  And with
640 fields you cannot have 
indexes against all of them, so you are going to be doing table scans.

I just did a test where I did a Count() where... on a non-indexed field and
it took 1:18.  So just doing these counts on all 640 fields could take a
while.  I will be counting and logging the count in my data dictionary table.
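One way to avoid 640 separate scans is to count every column in a single pass: SQL's COUNT(col) aggregate counts only non-NULL values, so one SELECT with one COUNT per column identifies the all-NULL fields with a single table scan. A small sketch in Python with SQLite (a toy stand-in for the real SQL Server table; the table and column names here are invented):

```python
import sqlite3

# Hypothetical miniature of the polls table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE polls (name TEXT, boat TEXT, medicine TEXT, dog TEXT)")
cur.executemany(
    "INSERT INTO polls VALUES (?, ?, ?, ?)",
    [("John Colby", "sailboat", None, None),
     ("Jane Doe", None, "aspirin", None)],
)

# Enumerate the columns from the catalog, then build one SELECT that
# counts non-NULL values in every column in a single scan.
cols = [row[1] for row in cur.execute("PRAGMA table_info(polls)")]
counts_sql = ", ".join(f"COUNT({c})" for c in cols)
counts = cur.execute(f"SELECT {counts_sql} FROM polls").fetchone()

# Columns whose count is zero hold no information in any record.
empty = [c for c, n in zip(cols, counts) if n == 0]
print(empty)
```

The per-column counts are also exactly what would get logged into the data dictionary table mentioned above.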

And off we go...

John W. Colby
www.ColbyConsulting.com


Dan Waters wrote:
> This is a tough question - not technically difficult but what's tough is
> actually making the choice!  
> 
> I think you'll need to get some guidance from your customer as to which
> single record to keep (the latest one?), or should you put the contents of
> all the related cells into one cell and delete the other rows?  Or something
> else?
> 
> This probably doesn't help . . .
> Dan

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com



