Doug Steele
dbdoug at gmail.com
Wed Oct 14 22:30:51 CDT 2009
On Wed, Oct 14, 2009 at 8:18 PM, jwcolby <jwcolby at colbyconsulting.com>wrote: > Charlotte, > > > Unless they have data in the same fields. Then what happens? > > Well, so what? Either overwrite or don't. Remember it is impossible to > tell which is correct. > They can't BOTH be correct. Flip a coin and do something, it literally > doesn't matter what. > > John Colby Owns_A_Dog = 'Y' > John Colby Owns_A_Dog = 'N' > > Which one is true? We have no way to say one record is more recent so flip > a coin and merge the data. > > In fact in most cases (90% of all the fields) the data is "Y" or nothing. > These records are the > result of people taking polls. So in 90% of the cases, the person will > answer "Y" or they have no > answer because they weren't asked that question in that poll. > > The objective here is to get a ton of individual poll responses merged into > a single record. I did > a count earlier today and one person is in the database 36 times, 26 times, > 17, 16, two 14s, four > 13s etc. 3,080,470 "duplicate" person fact records (out of 50 million > total records). > > So when I try to ask the question "income bracket in (A-T)" and ("Owns a > dog" OR "owns a cat") AND > "mail order buyer", I end up with cases where "Owns a dog" was answered in > one record and "mail > order buyer" was answered in another record. So the AND fails whereas in > fact the person answered > affirmatively to both questions, just not in the same record. This is > nuts. I need to merge things > down so that there is a single record for a single person. ALL ANSWERS > need to be in one record. > If there are conflicting records, flip a coin, pick one answer. > > Now before you go off on normalization and a single column fact table, just > remember that I have 50 > million records X 550 columns. A single column fact table would end up > with 25 Tera records and I > don't have a supercomputer. > > John W. Colby > www.ColbyConsulting.com > > > Charlotte Foust wrote: > > Unless they have data in the same fields. Then what happens? > > > > Charlotte Foust > > > > -----Original Message----- > > From: dba-vb-bounces at databaseadvisors.com > > [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby > > Sent: Wednesday, October 14, 2009 1:20 PM > > To: stuart at lexacorp.com.pg; Discussion concerning Visual Basic and > > related programming issues. > > Subject: Re: [dba-VB] Merge / Purge > > > > Stuart, > > > > Yes there is a PKID. The thing is, every record is equal so it simply > > doesn't matter in which order the records are pulled. Any of the > > duplicates can be treated as the first record to be updated by whatever > > is found in the subsequent records. > > > > It literally doesn't matter which record is encountered first. > > > > John W. Colby > > www.ColbyConsulting.com > > _______________________________________________ > dba-VB mailing list > dba-VB at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-vb > http://www.databaseadvisors.com > >