[AccessD] deduplication

Mon Jan 15 04:36:52 CST 2007

Hi John

I think you need to tighten this a bit ...

Deleting dupes is quite easy, so is to flag (update) the dupes, but - of course - the outcome would be quite different.
If the dupes are deleted, nothing is to be merged. Further, what do you mean by "merge"? Concatenate the field contents? Add more fields to the remaining single occurrence of every record? Or build a new parent table with a child table with a record for each dupe?

As far as I know, there is no general SQL method or syntax for any of these situations except for deleting dupes if you have a unique key in addition to the fields of interest containing the duplicated information.
I believe you easily could figure out how to accomplish what you wish in SQL. That said, I'm not so sure SQL will be fastest; if you can move through the recordset in a single run reading each record and write to another recordset as needed, that will be very fast, even with ADO. Wrap it in a transaction and it may run even faster.

/gustav

>>> jwcolby at colbyconsulting.com 15-01-2007 04:49:28 >>>
Is there a SQL syntax or method for identifying duplicate records based on
and automatically removing (or marking) the second and subsequent instances
of that record.

I am working with a database where there may be several records for the same
person.  Each record will have slightly different data in the record.  I
need to find all the records for a given person, remove or somehow flag the
"duplicate" records, and eventually "merge" the duplicate records together.

I know how to use groupby and count to find the dupes but I don't really
know how to then merge fields F through M from records 2,3 and 4 back into
fields F through M in record 1.  This is going to be an ongoing project so I
need to learn the methodology to get it done.  I can do it with recordsets
and code, but not with SQL, and given the large potential numbers of records
SQL would be my first choice.

John W. Colby
Colby Consulting
www.ColbyConsulting.com