[dba-SQLServer] Duplicate records

Mark A Matte markamatte at hotmail.com
Wed Mar 19 09:21:22 CDT 2008


Thanks for the suggestions.  
 
I know cursors are slow...but I'm not going through every record...just the dups.  Other than potential speed issues...anyone see anything that deserves caution here? I kinda like 'find them...then delete' approach
 
Thanks,
 
Mark A. Matte
 
 **********************BEGIN************************--By Mark A. Matte 3/18/2008declare @CaseID decimal(15,0), at NoteNum int, at NoteCount int declare MyCursor cursor for Select case_id,note_seq_nbr,count(*) as CT from tblCaseNote_temp group by case_id,note_seq_nbr having count(*)>1 order by case_id,note_seq_nbr open mycursor fetch next from mycursor into @CaseID, at NoteNum, at NoteCount while (@@fetch_status =0)
 Begin delete top(@noteCount-1) from tblCaseNote_Temp where case_id=@CaseID and note_seq_nbr=@noteNum fetch next from mycursor into @CaseID, at NoteNum, at NoteCount End Close mycursor deallocate mycursor ***********************END*************************> From: pauln at sqlserverbible.com> To: dba-sqlserver at databaseadvisors.com> Date: Tue, 18 Mar 2008 22:17:59 -0600> Subject: Re: [dba-SQLServer] Duplicate records> > > I'd do the BulkINsert into a staging table then eliminate the dups as part> of the cleanup.> > Here are two popular ways to eliminate dups...> > > Use Tempdb> > drop table dups> go> CREATE TABLE dbo.dups (> col1 INT,> col2 CHAR(5),> col3 CHAR(5)> )> go> INSERT dbo.dups (col1, col2, col3) -- 1> VALUES (1, 'abc', 'Paul')> INSERT dbo.dups (col1, col2, col3) -- 2> VALUES (1, 'abc', 'Paul') --dup> INSERT dbo.dups (col1, col2, col3) -- 3> VALUES (2, 'abc', 'Paul')> INSERT dbo.dups (col1, col2, col3) -- 4> VALUES (2, 'xyz', 'Sue')> INSERT dbo.dups (col1, col2, col3) -- 5> VALUES (2, 'xyz', 'Sue') -- dup> INSERT dbo.dups (col1, col2, col3) -- 6> VALUES (2, 'xyz', 'Sue') -- dup> INSERT dbo.dups (col1, col2, col3) -- 7> VALUES (3, 'xyz', 'Sue')> go> > -- eliminate dups - select distinct method> > SELECT DISTINCT *> INTO nodups> FROM dups > > SELECT * FROM nodups> > -- eliminate dups group by adding a pk> > ALTER TABLE dbo.dups> ADD pk INT NOT NULL IDENTITY PRIMARY KEY> > -- identify dups> SELECT * > FROM dbo.dups> WHERE PK NOT IN (> SELECT min(pk) > FROM dups> GROUP BY col1, col2, col3> )> > -- remove dups> DELETE dbo.dups> WHERE PK NOT IN (> SELECT min(pk) > FROM dups> GROUP BY col1, col2, col3> )> > SELECT * FROM dbo.dups> > > > > -----Original Message-----> From: dba-sqlserver-bounces at databaseadvisors.com> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Mark A> Matte> Sent: Tuesday, March 18, 2008 4:33 PM> To: Discussion concerning MS SQL Server> Subject: Re: [dba-SQLServer] Duplicate records> > > I get the data in a 2+gig text file...how do import without dups...I tried> that first...and it just stopped the import?> > > > Date: Tue, 18 Mar 2008 14:42:07 -0700> > From: fhtapia at gmail.com> > To: dba-sqlserver at databaseadvisors.com> > Subject: Re: [dba-SQLServer] Duplicate records> >> > Cursors are slow, you could choose to place an index to prevent duplicates> > so duplicate information just never makes it into the table?> >> >> >> > On Tue, Mar 18, 2008 at 1:46 PM, Mark A Matte > > wrote:> >> >>> >> Hello All,> >>> >> I have a large table that occasionally duplicates make it into. The 2> >> fields that make the record unique are Case_ID and Note_Seq_Nbr. The> >> duplicates are identical...so I have no primary key. I used the code> below.> >> It doesn't run very fast...but it does the job of removing just the extra> >> records...> >>> >> Any thoughts,comments, or suggestions on making it faster/better...or a> >> totally different approach?> >>> >> Thanks,> >>> >> Mark A. Matte
_________________________________________________________________
Need to know the score, the latest news, or you need your Hotmail®-get your "fix".
http://www.msnmobilefix.com/Default.aspx


More information about the dba-SQLServer mailing list