[dba-SQLServer] Append only new records

jwcolby jwcolby at colbyconsulting.com
Tue Mar 4 14:12:09 CST 2008


LOL.  Nothing is dirt simple when you are dealing with a hundred million
records.

I have many different tables, each table has name / address information.  I
already create a hash of the address fields (HashAddr), the address plus
last name (HashFamily) and Address / last name / first name (HashPerson).
Any given table will contain millions of records where there are multiple
people in a family at the same address.  I am looking to carve out just the
address (and it's hash) and dump it into an address table, however it needs
to be there only once.  That will form a "validated address table" where by
definition if the address is in there it is valid and deliverable.  The hash
of the address fields then becomes the PK for that table since it is unique.
Having done that I can then do joins on the address hash to see everyone who
has ever lived at any address.  I can also get relationships between the
lists.

Of course the address table will have more than a hundred million rows.  The
obvious is to do a "not in" query to select hashes not in the address table,
then append those records into the address table.  However with a hundred
million records and growing, that kind of operation could take a few
seconds.  8~)


John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Paul
Nielsen
Sent: Tuesday, March 04, 2008 1:36 PM
To: 'Discussion concerning MS SQL Server'
Subject: Re: [dba-SQLServer] Append only new records

You could always pump them all into a temp table, delete dups, and then move
them into the real table. It'd be dirt simple and repeatable. 

-Paul 

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Monday, March 03, 2008 3:17 PM
To: 'Discussion concerning MS SQL Server'
Subject: [dba-SQLServer] Append only new records

I am building a table of known validated addresses in the US.  For my
purposes all I have is Address, city, state, zip and HashAddr.  HashAddr is
a has which is guaranteed to be unique given the input which is the address
fields given.  

In Access you could append records in at random and those records which did
not collide with a unique index goes in, and those that do collide do not go
in.  Obviously an index on the HashAddress will be the unique index.  Is it
possible to do the same kind of thing in SQL Server?  When I have tried this
before (in SQL Server), the whole process aborted if a collision occurred,
with no records appended if there was any collision.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com


__________ NOD32 2918 (20080303) Information __________

This message was checked by NOD32 antivirus system.
http://www.eset.com


_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com




More information about the dba-SQLServer mailing list