Rocky Smolin
rockysmolin at bchacc.com
Sat May 16 10:58:28 CDT 2009
Max: The records are dups by definition of the user. Not all the fields are the same but by their criteria those are duplicate records. The data comes in the form of a spreadsheet from a mainframe system in a hospital. My client has no control over the data or the processes they use. So we're in a very similar situation with the data. Rocky -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Max Wanadoo Sent: Saturday, May 16, 2009 8:55 AM To: 'Access Developers discussion and problem solving' Subject: Re: [AccessD] Deleting Duplicates Hi Rocky, Don't shoot me for this, but I don't think your records are dupes because if they were they would not have been allowed in the table as a unique index should be place on the criteria for determining what makes a dupe. If you want to avoid dupes on a live system, just set the unique index up and Access will throw a wobbly when it encounters one - which you then handle in code. In my case the dupes occurred because the records were imported into a holding table and the criteria had not yet been determined as to what constitutes a duplicate record. Max -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Rocky Smolin Sent: 16 May 2009 16:00 To: 'Access Developers discussion and problem solving' Subject: Re: [AccessD] Deleting Duplicates Gustav: True, the records are not dups in the table. But they were duplicates to from the user's side based on those few fields. Unfortunately, the magnitude of the smaller of the price on the two records varied greatly enough that I couldn't use that as a criterion for deletion. Would have made things so much easier. :) The client has an interesting business. He audits the payables of hospitals to look for overpayments, sales taxes paid that shouldn't have been, etc. Then gets a percentage of the recovery. Which can be huge or can be zip. The tables, which are payables details, purchase orders details, invoices from vendors, etc. as you can imagine, can be huge. There's all kinds of business rules built into the screening functions to try to identify overpayments. It was a very interesting app to put together. We had discussed a long time ago moving to SQL but I said wait until the calculation times are too long. We never got there. It impressed me with just how powerful the JET engine is. Rocky -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gustav Brock Sent: Saturday, May 16, 2009 7:37 AM To: accessd at databaseadvisors.com Subject: Re: [AccessD] Deleting Duplicates Hi Rocky This task is a bit different as the records are not dupes. It looks like when rounding of an invoice line has taken place, a record with the rounding error has been created to get the total of the invoice to match. Couldn't you just have used a basic delete query where you deleted all records with a fldInvoiceDetailPrice < $0.01? Or to handle credits as well: Abs(fldInvoiceDetailPrice) < $0.01? /gustav >>> rockysmolin at bchacc.com 16-05-2009 16:12 >>> Coincidentally I had a client call yesterday with this very problem. He send an mdb with the following message: "Here is the file to be scrubbed of extraneous data. The query shows the data so that the problem can be seen. If you go to Invoice # 2294869, Product # 000472895, Line Number 11 you will see two records, one with a fldInvoiceDetailPrice of $2.64 and a nearly duplicate record with a price of $.0016. I would like to create a cleaned file without the numerically smaller fldInvoiceDetailPrice of the two records. There are roughly 4,000+ of these duplicates. Thanks in advance for your expertise! " I tried to do it with queries, but in the end found it was easier to combine a query with a bit of code. First, I added an autonumber field to the table so that each record would have a unique identifier. The query was a summation query grouping by Invoice Number, Detail Line, and Product number. Then it had a count of Detail Line with a criterion of >1 so only the dups would be selected, and MIN on the unit price, so only the lowest of the two would be selected. At that point I couldn't figure out how to make a delete query but this took only a couple of minutes to write and deleted just the lower cost records as the client asked: Dim db As DAO.Database Dim rsDelete As DAO.Recordset Dim rsQuery As DAO.Recordset Set db = CurrentDb Set rsQuery = db.OpenRecordset("qryFindTheDups1") Set rsDelete = db.OpenRecordset("Select * FROM tblPPHPHSLawsonDataWork") Do While rsQuery.EOF = False rsDelete.FindFirst "fldID = " & rsQuery!MinoffldID If rsDelete.NoMatch = True Then MsgBox "BARF!!!" rsDelete.Delete rsDelete.MoveFirst rsQuery.MoveNext Loop rsQuery.Close Set rsQuery = Nothing rsDelete.Close Set rsDelete = Nothing db.Close Set db = Nothing MsgBox "Done" I charged him .75 hour for this and saved him hours of combing through the 42,000 records looking for the dups. Rocky -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Susan Harkins Sent: Saturday, May 16, 2009 6:53 AM To: Access Developers discussion and problem solving Subject: Re: [AccessD] Deleting Duplicates Oh my! ;) That makes sense though, I just didn't think it through. I had pancakes for breakfast. :) Susan H. > > It will delete the duplicates. ALL of them! And of course you > generally want to keep the FIRST record of a duplicate set. -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com