[dba-SQLServer] Duplicate records

Paul Nielsen pauln at sqlserverbible.com
Tue Mar 18 23:17:59 CDT 2008


I'd do the BulkINsert into a staging table then eliminate the dups as part
of the cleanup.

Here are two popular ways to eliminate dups...


Use Tempdb

drop table dups
go
CREATE TABLE dbo.dups (
  col1 INT,
  col2 CHAR(5),
  col3 CHAR(5)
  )
go
INSERT dbo.dups (col1, col2, col3) -- 1
  VALUES (1, 'abc', 'Paul')
INSERT dbo.dups (col1, col2, col3) -- 2
  VALUES (1, 'abc', 'Paul') --dup
INSERT dbo.dups (col1, col2, col3) -- 3
  VALUES (2, 'abc', 'Paul')
INSERT dbo.dups (col1, col2, col3) -- 4
  VALUES (2, 'xyz', 'Sue')
INSERT dbo.dups (col1, col2, col3) -- 5
  VALUES (2, 'xyz', 'Sue')  -- dup
INSERT dbo.dups (col1, col2, col3) -- 6
  VALUES (2, 'xyz', 'Sue')  -- dup
INSERT dbo.dups (col1, col2, col3) -- 7
  VALUES (3, 'xyz', 'Sue')
go

-- eliminate dups - select distinct method

SELECT DISTINCT *
  INTO nodups
  FROM dups 

SELECT * FROM nodups

-- eliminate dups group by adding a pk

ALTER TABLE dbo.dups
  ADD pk INT NOT NULL IDENTITY PRIMARY KEY

-- identify dups
SELECT * 
  FROM dbo.dups
  WHERE PK NOT IN (
       SELECT min(pk) 
         FROM dups
         GROUP BY col1, col2, col3
         )

-- remove dups
DELETE dbo.dups
  WHERE PK NOT IN (
       SELECT min(pk) 
         FROM dups
         GROUP BY col1, col2, col3
         )

SELECT * FROM dbo.dups




-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Mark A
Matte
Sent: Tuesday, March 18, 2008 4:33 PM
To: Discussion concerning MS SQL Server
Subject: Re: [dba-SQLServer] Duplicate records


I get the data in a 2+gig text file...how do import without dups...I tried
that first...and it just stopped the import?


> Date: Tue, 18 Mar 2008 14:42:07 -0700
> From: fhtapia at gmail.com
> To: dba-sqlserver at databaseadvisors.com
> Subject: Re: [dba-SQLServer] Duplicate records
>
> Cursors are slow, you could choose to place an index to prevent duplicates
> so duplicate information just never makes it into the table?
>
>
>
> On Tue, Mar 18, 2008 at 1:46 PM, Mark A Matte 
> wrote:
>
>>
>> Hello All,
>>
>> I have a large table that occasionally duplicates make it into. The 2
>> fields that make the record unique are Case_ID and Note_Seq_Nbr. The
>> duplicates are identical...so I have no primary key. I used the code
below.
>> It doesn't run very fast...but it does the job of removing just the extra
>> records...
>>
>> Any thoughts,comments, or suggestions on making it faster/better...or a
>> totally different approach?
>>
>> Thanks,
>>
>> Mark A. Matte
>>
>>
>>
>> **********************BEGIN************************
>> --By Mark A. Matte 3/18/2008
>> declare @CaseID decimal(15,0), at NoteNum int, at NoteCount int
>> declare MyCursor cursor
>> for Select case_id,note_seq_nbr,count(*) as CT
>> from tblCaseNote_temp
>> group by case_id,note_seq_nbr
>> having count(*)>1
>> order by case_id,note_seq_nbr
>> open mycursor
>> fetch next from mycursor
>> into @CaseID, at NoteNum, at NoteCount
>> while (@@fetch_status =0)
>>
>> Begin
>> delete top(@noteCount-1) from tblCaseNote_Temp where
>> case_id=@CaseID and note_seq_nbr=@noteNum
>> fetch next from mycursor
>> into @CaseID, at NoteNum, at NoteCount
>> End
>> Close mycursor
>> deallocate mycursor
>> ***********************END*************************
>> _________________________________________________________________
>> Need to know the score, the latest news, or you need your Hotmail(R)-get
>> your "fix".
>> http://www.msnmobilefix.com/Default.aspx
>> _______________________________________________
>> dba-SQLServer mailing list
>> dba-SQLServer at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>> http://www.databaseadvisors.com
>>
>>
>
>
> --
> -Francisco
> http://sqlthis.blogspot.com | Tsql and More...
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>

_________________________________________________________________
Shed those extra pounds with MSN and The Biggest Loser!
http://biggestloser.msn.com/
_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com


__________ NOD32 2958 (20080318) Information __________

This message was checked by NOD32 antivirus system.
http://www.eset.com





More information about the dba-SQLServer mailing list