Paul Nielsen
pauln at sqlserverbible.com
Tue Mar 4 15:05:34 CST 2008
Yep, SQL Server typically has larger databases. The average sized db for SQL Server is probably 500Gb +. At the last PASS conference when asked how many of you have databases over 1 Tb most hands went up. I normally test any app with millions of rows. This is my last contract job, http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k -tps.aspx in your case I see two points of pain, First, the " many different tables". How many tables have addresses? Is it dynamic or a fixed set of tables. Either way it's scriptable. Secondly, hashes for large sets tend to not be unique. I'd use an bigint identity. The process is not difficult. 1) Insert all the addresses from every table into a single table with an identity. 2) Delete dups. 3) If you dislike the gaps, alter the table to remove the identity and then add a new identity. 4) Add AddressID column to each of the many tables 5) update manytables set mt.addressid = a.addressid from manytables join address on a.address = manytables.address Is really is dirt simple of a onetime conversion process. -Paul -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of jwcolby Sent: Tuesday, March 04, 2008 1:12 PM To: 'Discussion concerning MS SQL Server' Subject: Re: [dba-SQLServer] Append only new records LOL. Nothing is dirt simple when you are dealing with a hundred million records. I have many different tables, each table has name / address information. I already create a hash of the address fields (HashAddr), the address plus last name (HashFamily) and Address / last name / first name (HashPerson). Any given table will contain millions of records where there are multiple people in a family at the same address. I am looking to carve out just the address (and it's hash) and dump it into an address table, however it needs to be there only once. That will form a "validated address table" where by definition if the address is in there it is valid and deliverable. The hash of the address fields then becomes the PK for that table since it is unique. Having done that I can then do joins on the address hash to see everyone who has ever lived at any address. I can also get relationships between the lists. Of course the address table will have more than a hundred million rows. The obvious is to do a "not in" query to select hashes not in the address table, then append those records into the address table. However with a hundred million records and growing, that kind of operation could take a few seconds. 8~) John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Paul Nielsen Sent: Tuesday, March 04, 2008 1:36 PM To: 'Discussion concerning MS SQL Server' Subject: Re: [dba-SQLServer] Append only new records You could always pump them all into a temp table, delete dups, and then move them into the real table. It'd be dirt simple and repeatable. -Paul -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of jwcolby Sent: Monday, March 03, 2008 3:17 PM To: 'Discussion concerning MS SQL Server' Subject: [dba-SQLServer] Append only new records I am building a table of known validated addresses in the US. For my purposes all I have is Address, city, state, zip and HashAddr. HashAddr is a has which is guaranteed to be unique given the input which is the address fields given. In Access you could append records in at random and those records which did not collide with a unique index goes in, and those that do collide do not go in. Obviously an index on the HashAddress will be the unique index. Is it possible to do the same kind of thing in SQL Server? When I have tried this before (in SQL Server), the whole process aborted if a collision occurred, with no records appended if there was any collision. John W. Colby Colby Consulting www.ColbyConsulting.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com __________ NOD32 2918 (20080303) Information __________ This message was checked by NOD32 antivirus system. http://www.eset.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com __________ NOD32 2921 (20080304) Information __________ This message was checked by NOD32 antivirus system. http://www.eset.com