[AccessD] Deleting Duplicates

Sat May 16 10:58:28 CDT 2009

Max:

The records are dups by definition of the user.  Not all the fields are the
same but by their criteria those are duplicate records.  The data comes in
the form of a spreadsheet from a mainframe system in a hospital.  My client
has no control over the data or the processes they use.  So we're in a very
similar situation with the data.

Rocky

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Max Wanadoo
Sent: Saturday, May 16, 2009 8:55 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Deleting Duplicates

Hi Rocky,
Don't shoot me for this, but I don't think your records are dupes because if
they were they would not have been allowed in the table as a unique index
should be place on the criteria for determining what makes a dupe.  If you
want to avoid dupes on a live system, just set the unique index up and
Access will throw a wobbly when it encounters one - which you then handle in
code.

In my case the dupes occurred because the records were imported into a
holding table and the criteria had not yet been determined as to what
constitutes a duplicate record.

Max

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Rocky Smolin
Sent: 16 May 2009 16:00
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Deleting Duplicates

Gustav:

True, the records are not dups in the table.  But they were duplicates to
from the user's side based on those few fields. Unfortunately, the magnitude
of the smaller of the price on the two records varied greatly enough that I
couldn't use that as a criterion for deletion. Would have made things so
much easier. :)

The client has an interesting business.  He audits the payables of hospitals
to look for overpayments, sales taxes paid that shouldn't have been, etc.
Then gets a percentage of the recovery.  Which can be huge or can be zip. 

The tables, which are payables details, purchase orders details, invoices
from vendors, etc. as you can imagine, can be huge.  There's all kinds of
business rules built into the screening functions to try to identify
overpayments. It was a very interesting app to put together.

We had discussed a long time ago moving to SQL but I said wait until the
calculation times are too long.  We never got there.  It impressed me with
just how powerful the JET engine is.

Rocky

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gustav Brock
Sent: Saturday, May 16, 2009 7:37 AM
To: accessd at databaseadvisors.com
Subject: Re: [AccessD] Deleting Duplicates

Hi Rocky

This task is a bit different as the records are not dupes. It looks like
when rounding of an invoice line has taken place, a record with the rounding
error has been created to get the total of the invoice to match.

Couldn't you just have used a basic delete query where you deleted all
records with a fldInvoiceDetailPrice < $0.01? Or to handle credits as well:
Abs(fldInvoiceDetailPrice) < $0.01?

/gustav

>>> rockysmolin at bchacc.com 16-05-2009 16:12 >>>
Coincidentally I had a client call yesterday with this very problem.  He
send an mdb with the following message:

"Here is the file to be scrubbed of extraneous data.

The query shows the data so that the problem can be seen. If you go to
Invoice # 2294869, Product # 000472895, Line Number 11 you will see two
records, one with a fldInvoiceDetailPrice of $2.64 and a nearly duplicate
record with a price of $.0016. I would like to create a cleaned file without
the numerically smaller fldInvoiceDetailPrice of the two records. There are
roughly 4,000+ of these duplicates. Thanks in advance for your expertise! "

I tried to do it with queries, but in the end found it was easier to combine
a query with a bit of code. First, I added an autonumber field to the table
so that each record would have a unique identifier. 

The query was a summation query grouping by Invoice Number, Detail Line, and
Product number.  Then it had a count of Detail Line with a criterion of >1
so only the dups would be selected, and MIN on the unit price, so only the
lowest of the two would be selected.  At that point I couldn't figure out
how to make a delete query but this took only a couple of minutes to write
and deleted just the lower cost records as the client asked:

Dim db As DAO.Database
Dim rsDelete As DAO.Recordset
Dim rsQuery As DAO.Recordset

Set db = CurrentDb
Set rsQuery = db.OpenRecordset("qryFindTheDups1")
Set rsDelete = db.OpenRecordset("Select * FROM tblPPHPHSLawsonDataWork")

Do While rsQuery.EOF = False
    rsDelete.FindFirst "fldID = " & rsQuery!MinoffldID
    If rsDelete.NoMatch = True Then MsgBox "BARF!!!"
    rsDelete.Delete
    rsDelete.MoveFirst
    rsQuery.MoveNext
Loop

rsQuery.Close
Set rsQuery = Nothing
rsDelete.Close
Set rsDelete = Nothing
db.Close
Set db = Nothing

MsgBox "Done"

I charged him .75 hour for this and saved him hours of combing through the
42,000 records looking for the dups.

Rocky

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Susan Harkins
Sent: Saturday, May 16, 2009 6:53 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Deleting Duplicates

Oh my! ;)

That makes sense though, I just didn't think it through. I had pancakes for
breakfast. :)

Susan H.

>
> It will delete the duplicates.  ALL of them!  And of course you 
> generally want to keep the FIRST record of a duplicate set.

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com