[dba-VB] Merge / Purge

Doug Steele dbdoug at gmail.com
Wed Oct 14 22:30:51 CDT 2009


On Wed, Oct 14, 2009 at 8:18 PM, jwcolby <jwcolby at colbyconsulting.com>wrote:

> Charlotte,
>
>  > Unless they have data in the same fields.  Then what happens?
>
> Well, so what?  Either overwrite or don't.  Remember it is impossible to
> tell which is correct.
> They can't BOTH be correct.  Flip a coin and do something, it literally
> doesn't matter what.
>
> John Colby Owns_A_Dog = 'Y'
> John Colby Owns_A_Dog = 'N'
>
> Which one is true?  We have no way to say one record is more recent so flip
> a coin and merge the data.
>
> In fact in most cases (90% of all the fields) the data is "Y" or nothing.
>  These records are the
> result of people taking polls.  So in 90% of the cases, the person will
> answer "Y" or they have no
> answer because they weren't asked that question in that poll.
>
> The objective here is to get a ton of individual poll responses merged into
> a single record.  I did
> a count earlier today and one person is in the database 36 times, 26 times,
> 17, 16, two 14s, four
> 13s etc.  3,080,470 "duplicate" person fact records (out of 50 million
> total records).
>
> So when I try to ask the question "income bracket in (A-T)" and ("Owns a
> dog" OR "owns a cat") AND
> "mail order buyer", I end up with cases where "Owns a dog" was answered in
> one record and "mail
> order buyer" was answered in another record.  So the AND fails whereas in
> fact the person answered
> affirmatively to both questions, just not in the same record.  This is
> nuts.  I need to merge things
> down so that there is a single record for a single person.  ALL ANSWERS
> need to be in one record.
> If there are conflicting records, flip a coin, pick one answer.
>
> Now before you go off on normalization and a single column fact table, just
> remember that I have 50
> million records X 550 columns.  A single column fact table would end up
> with 25 Tera records and I
> don't have a supercomputer.
>
> John W. Colby
> www.ColbyConsulting.com
>
>
> Charlotte Foust wrote:
> > Unless they have data in the same fields.  Then what happens?
> >
> > Charlotte Foust
> >
> > -----Original Message-----
> > From: dba-vb-bounces at databaseadvisors.com
> > [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
> > Sent: Wednesday, October 14, 2009 1:20 PM
> > To: stuart at lexacorp.com.pg; Discussion concerning Visual Basic and
> > related programming issues.
> > Subject: Re: [dba-VB] Merge / Purge
> >
> > Stuart,
> >
> > Yes there is a PKID.  The thing is, every record is equal so it simply
> > doesn't matter in which order the records are pulled.  Any of the
> > duplicates can be treated as the first record to be updated by whatever
> > is found in the subsequent records.
> >
> > It literally doesn't matter which record is encountered first.
> >
> > John W. Colby
> > www.ColbyConsulting.com
>
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
>
>



More information about the dba-VB mailing list