[AccessD] deduplication

Sun Jan 14 23:04:52 CST 2007

    Sample query as given below, would remove all duplicates (other than the first occurrence for each case).

    T_Data is the name of table, while F1, F2 & F3 are the names of 
fields, whose combined value determines whether a record is 
duplicate or not. ID is the primary key (number type).

A.D.Tejpal
---------------

=====================================
DELETE * FROM T_Data 
WHERE (SELECT Count(*) FROM T_Data As T1 
WHERE (T1.F1 & T1.F2 & T1.F3 = T_Data.F1 & T_Data.F2 & T_Data.F3) AND (T1.ID <= T_Data.ID)) > 1; 
=====================================

  ----- Original Message ----- 
  From: JWColby 
  To: 'Access Developers discussion and problem solving' ; dba-sqlserver at databaseadvisors.com 
  Sent: Monday, January 15, 2007 09:19
  Subject: [AccessD] deduplication

  Is there a SQL syntax or method for identifying duplicate records based on and automatically removing (or marking) the second and subsequent instances of that record.

  I am working with a database where there may be several records for the same person.  Each record will have slightly different data in the record.  I need to find all the records for a given person, remove or somehow flag the "duplicate" records, and eventually "merge" the duplicate records together.

  I know how to use groupby and count to find the dupes but I don't really know how to then merge fields F through M from records 2,3 and 4 back into fields F through M in record 1.  This is going to be an ongoing project so I need to learn the methodology to get it done.  I can do it with recordsets and code, but not with SQL, and given the large potential numbers of records SQL would be my first choice.

  John W. Colby
  Colby Consulting
  www.ColbyConsulting.com