[AccessD] deduplication

Mon May 28 14:29:06 CDT 2007

Hi John

You may get better results by using NOT IN:

DELETE * FROM tblData
WHERE PKID NOT IN (SELECT TOP 1 FROM tblData As T1 WHERE T1.HashPerson =
tblData.HashPerson AND T1.PKID <= tblData.PKID ORDER BY T1.PKID);

Air code. Test with caution. 

/gustav

>>> jwcolby at colbyconsulting.com 28-05-2007 21:00 >>>
A.D.

It seems to me that the subquery SELECT should be able to stand alone, and
the subquery is giving a "Incorrect syntax near '>'" 

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com 
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.TEJPAL
Sent: Monday, May 28, 2007 1:29 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] deduplication

John,

    Considering the table & field names mentioned by you, the SQL given
below should get you the desired results. 

    You might like to try it out & confirm the outcome.

Best wishes,
A.D.Tejpal
---------------

=====================================
DELETE * FROM tblData
WHERE (SELECT Count(*) FROM tblData As T1 WHERE T1.HashPerson =
tblData.HashPerson AND T1.PKID <= tblData.PKID) > 1;
=====================================

  ----- Original Message -----
  From: jwcolby
  To: 'Access Developers discussion and problem solving' 
  Sent: Monday, May 28, 2007 20:36
  Subject: Re: [AccessD] deduplication

  A.D. (or anyone willing to chime in)

  I am attempting to implement your solution.  I created the following
select
  which should be the sub query in the outer WHERE:

  SELECT count(*)
  FROM tblData
  AS tblData1
  WHERE ((tblData1.HashPerson = tblData.HashPerson) AND (tblData1.PKID
  <=TblData.ID)) > 1; 

  However I am getting the very infamous Incorrect syntax near '>'.

  Any thoughts?

  John W. Colby
  Colby Consulting
  www.ColbyConsulting.com 
  -----Original Message-----
  From: accessd-bounces at databaseadvisors.com 
  [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.TEJPAL
  Sent: Monday, January 15, 2007 12:05 AM
  To: Access Developers discussion and problem solving
  Subject: Re: [AccessD] deduplication

      Sample query as given below, would remove all duplicates (other than
the first occurrence for each case).

      T_Data is the name of table, while F1, F2 & F3 are the names of
fields, whose combined value determines whether a record is duplicate or
not. ID is the primary key (number type).

  A.D.Tejpal
  ---------------

  =====================================
  DELETE * FROM T_Data
  WHERE (SELECT Count(*) FROM T_Data As T1 WHERE (T1.F1 & T1.F2 & T1.F3 =
  T_Data.F1 & T_Data.F2 & T_Data.F3) AND (T1.ID <= T_Data.ID)) > 1;
  =====================================

    ----- Original Message -----
    From: JWColby
    To: 'Access Developers discussion and problem solving' ;
  dba-sqlserver at databaseadvisors.com 
    Sent: Monday, January 15, 2007 09:19
    Subject: [AccessD] deduplication

    Is there a SQL syntax or method for identifying duplicate records based
on and automatically removing (or marking) the second and subsequent
instances of that record.

    I am working with a database where there may be several records for the
same person.  Each record will have slightly different data in the record. I
need to find all the records for a given person, remove or somehow flag the
"duplicate" records, and eventually "merge" the duplicate records together.

    I know how to use groupby and count to find the dupes but I don't really
know how to then merge fields F through M from records 2,3 and 4 back into
fields F through M in record 1.  This is going to be an ongoing project so I
need to learn the methodology to get it done.  I can do it with recordsets
and code, but not with SQL, and given the large potential numbers of records
SQL would be my first choice.

    John W. Colby
    Colby Consulting
    www.ColbyConsulting.com