[dba-SQLServer] Help! Violation of primary key constraint

Wed Sep 22 10:40:18 CDT 2004

In the nVLDB database I am working on, there are two sets of addresses, one
that the original person supplied, another that comes from the post office
when that person moves.  I need to build a single table where I pull the zip
(and later the rest of the fields) from the post office supplied field if
there is any data in there, then where no PO supplied data, pull the data
from the original address fields.

I have a PK field in my big table, int, identity.  In my new address table,
I created a matching PK field which I set to be the PK, not identity but
with a unique index.  I then add a zip field and a byte Src field to hold a
number that says which field the source data came from in the big table -
1=PO updated address data, 2=original data.

I built a view to pull the PK and the PO supplied zip from the big table and
a 1 for the SRC field, then built a query to append that to the new table I
am building.  That worked and I now have about 10 million records in this
new table with data supplied by the Post Office saying the person moved and
this is their new address.  

I built another query to pull the PK and the zip from the original address
field.  I THOUGHT I could just append that data into the new table and where
there was a collision in the PK (the data was already in the new table from
the Post Office) the record would drop on the floor, all others would go in
the new table.  

This doesn't work for some reason.  As soon as I try to append a record with
a PK collision the whole thing aborts with a "violation of PK constraint"
error message.

FIRST QUESTION: Why does that happen and is there a way to cause it to
continue processing so that the records where the PKs collide just drop out?

Given the failure of that method, I thought I'd build a "not in" query like
I would do in Access, where I would pull all the records where the PK in the
Big table is not found in the new table.  SQL Server didn't like my syntax
on that one.

So... I built a query where I joined the big table on the new table on PK
with the <> operator in the join, figuring I would get a set of records from
the big table where none of the PKs were in the new table.  An hour later
that query is still running....

It seems like such a simple thing.  I'm sure you hear this a lot, but in
Access I would have had it done hours ago.  Well... I'd have accomplished
the task hours ago.  With 165 million records Access itself would NEVER have
finished the task (which is why I'm here).

So, how do I accomplish what I am after here.  

John W. Colby
www.ColbyConsulting.com