[dba-VB] Merge rows

jwcolby jwcolby at colbyconsulting.com
Mon Sep 21 17:35:51 CDT 2009


Well, the issue is that the client wants to do things like:

Give me 5 million records where the income is in this range, has kids in this age range, has a dog 
and exercises.

Now...

You have two john colby records.  The "exercises" info is in the first record, and the "has a dog" 
is in the second record.  You just dropped John Colby from the selection because the AND failed.  It 
would have succeeded had all the data been in one record.

Essentially this table IS a Polls table.  Someone just consolidated a ton of poll info into 640 
fields a long time ago.

John W. Colby
www.ColbyConsulting.com


Charlotte Foust wrote:
> Merging them into one flat John Colby record is truly ugly, John.  Why
> not separate them into a polls table (unless, of course, that's where
> you have them already) and use the John Colby record PK to connect them?
> 
> Charlotte Foust 
> 
> -----Original Message-----
> From: dba-vb-bounces at databaseadvisors.com
> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
> Sent: Monday, September 21, 2009 2:55 PM
> To: dwaters at usinternet.com; Discussion concerning Visual Basic and
> related programming issues.
> Subject: Re: [dba-VB] Merge rows
> 
> Dan,
> 
> In fact there are 50 million records in the table (though not that many
> "duplicates") so there is no "making a decision".  Basically I have to
> take row one and update it with row 2, 3, 4...  Delete row 2,3,4...
> 
> This is basically a table of "polls" that people have taken, with ~600
> fields where there may or may not be information about doing, wanting or
> having that thing.  Everything from information about the boat you own
> to the electronics you own, to the medicine you take, whether you own a
> dog or cat, hunt, fish etc.  In order to get in the database you have to
> have volunteered the information somewhere along the line.  But you
> might have taken several different polls at different times. 
> Thus one record might have the electronics you own, another the medicine
> you take, and you don't even own a boat so those fields contain nothing.
> 
> Like that.
> 
> I just want to take all the John Colby records and merge them into a
> single John Colby record, deleting all the extras when I am done.
> 
> Of course this is one reason for my other question about getting rid of
> any fields where there is no information in any record.  If I am going
> to process 50 million records, it would be good to not have to process
> any fields where there is no info to merge.
> 
> The other reason for the other question is that doing a table scan
> against 50 million records with 640 fields is just plain going to take
> longer than a table scan against 50 million records with only 400
> fields.  It comes down to a matter of disk sectors to load.  And with
> 640 fields you cannot have indexes against all of them, so you are going
> to be doing table scans.
> 
> I just did a test where I did a Count() where... on a non-indexed field
> and it took 1:18.  So just doing these counts on all 640 fields could
> take awhile.  I will be counting and logging the count in my data
> dictionary table.
> 
> And off we go...
> 
> John W. Colby
> www.ColbyConsulting.com
> 
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
> 
> 



More information about the dba-VB mailing list