Jonathan Gold
johnnyclock at gmail.com
Mon May 28 15:05:57 CDT 2007
This can't be it, but I notice that in your statement you have a capital 'T' (<=TblData.ID) where I expect a lower-case one: SELECT count(*) FROM tblData AS tblData1 WHERE ((tblData1.HashPerson = tblData.HashPerson) AND (tblData1.PKID <=TblData.ID)) > 1; On 5/28/07, jwcolby <jwcolby at colbyconsulting.com> wrote: > > I created HashID fields which use the SHA1 hash function built in to SQL > Server 2007. That gives me one single field to index and pull HashIDs > from > which uniquely identifies duplicates using count() grouping > queries. These > hash fields work quite well, though they are unreadable. The nice part is > that they reduce multiple field strings - Addr, zip5, zip4 and even Lname > & > fName which are potentially 60 or 70 (or more) bytes long down to a string > only about 20 bytes long. I then index those. > > I am building a query that returns HashIDs where count(HashID) > 0. Using > that I pull all the PKIDs for those HashIDs. Using that I pull Max(PKID). > You get the picture. Even starting with the HashID fields it is a > multi-step process, kinda the 'brute force' approach, and I was hoping to > come up with something simpler. > > So far A.D.s suggestion has refused to compile, regardless of how I muck > around with it. > > I got a delete method working, but it does require running the delete > query > until no records delete. There are cases where I have 4,5, or even more > duplicates. > > Once I figure out what I am doing, I hope to create a stored procedure > where > I can input the table name and just have the process run against that > table. > > John W. Colby > Colby Consulting > www.ColbyConsulting.com > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gustav Brock > Sent: Monday, May 28, 2007 3:29 PM > To: accessd at databaseadvisors.com > Subject: Re: [AccessD] deduplication > > Hi John > > You may get better results by using NOT IN: > > DELETE * FROM tblData > WHERE PKID NOT IN (SELECT TOP 1 FROM tblData As T1 WHERE T1.HashPerson = > tblData.HashPerson AND T1.PKID <= tblData.PKID ORDER BY T1.PKID); > > Air code. Test with caution. > > /gustav > > > >>> jwcolby at colbyconsulting.com 28-05-2007 21:00 >>> > A.D. > > It seems to me that the subquery SELECT should be able to stand alone, and > the subquery is giving a "Incorrect syntax near '>'" > > John W. Colby > Colby Consulting > www.ColbyConsulting.com > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.TEJPAL > Sent: Monday, May 28, 2007 1:29 PM > To: Access Developers discussion and problem solving > Subject: Re: [AccessD] deduplication > > John, > > Considering the table & field names mentioned by you, the SQL given > below should get you the desired results. > > You might like to try it out & confirm the outcome. > > Best wishes, > A.D.Tejpal > --------------- > > ===================================== > DELETE * FROM tblData > WHERE (SELECT Count(*) FROM tblData As T1 WHERE T1.HashPerson = > tblData.HashPerson AND T1.PKID <= tblData.PKID) > 1; > ===================================== > > ----- Original Message ----- > From: jwcolby > To: 'Access Developers discussion and problem solving' > Sent: Monday, May 28, 2007 20:36 > Subject: Re: [AccessD] deduplication > > > A.D. (or anyone willing to chime in) > > I am attempting to implement your solution. I created the following > select > which should be the sub query in the outer WHERE: > > SELECT count(*) > FROM tblData > AS tblData1 > WHERE ((tblData1.HashPerson = tblData.HashPerson) AND (tblData1.PKID > <=TblData.ID)) > 1; > > However I am getting the very infamous Incorrect syntax near '>'. > > Any thoughts? > > John W. Colby > Colby Consulting > www.ColbyConsulting.com > -----Original Message----- > From: accessd-bounces at databaseadvisors.com > [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.TEJPAL > Sent: Monday, January 15, 2007 12:05 AM > To: Access Developers discussion and problem solving > Subject: Re: [AccessD] deduplication > > Sample query as given below, would remove all duplicates (other than > the first occurrence for each case). > > T_Data is the name of table, while F1, F2 & F3 are the names of > fields, whose combined value determines whether a record is duplicate or > not. ID is the primary key (number type). > > A.D.Tejpal > --------------- > > ===================================== > DELETE * FROM T_Data > WHERE (SELECT Count(*) FROM T_Data As T1 WHERE (T1.F1 & T1.F2 & T1.F3 = > T_Data.F1 & T_Data.F2 & T_Data.F3) AND (T1.ID <= T_Data.ID)) > 1; > ===================================== > > ----- Original Message ----- > From: JWColby > To: 'Access Developers discussion and problem solving' ; > dba-sqlserver at databaseadvisors.com > Sent: Monday, January 15, 2007 09:19 > Subject: [AccessD] deduplication > > Is there a SQL syntax or method for identifying duplicate records > based > on and automatically removing (or marking) the second and subsequent > instances of that record. > > I am working with a database where there may be several records for > the > same person. Each record will have slightly different data in the record. > I > need to find all the records for a given person, remove or somehow flag > the > "duplicate" records, and eventually "merge" the duplicate records > together. > > I know how to use groupby and count to find the dupes but I don't > really > know how to then merge fields F through M from records 2,3 and 4 back into > fields F through M in record 1. This is going to be an ongoing project so > I > need to learn the methodology to get it done. I can do it with recordsets > and code, but not with SQL, and given the large potential numbers of > records > SQL would be my first choice. > > John W. Colby > Colby Consulting > www.ColbyConsulting.com > > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com > > -- > AccessD mailing list > AccessD at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/accessd > Website: http://www.databaseadvisors.com > -- Jonathan Berkeley, CA http://home.pacbell.net/jonnygee/