Jim Lawrence
accessd at shaw.ca
Mon Aug 14 12:46:18 CDT 2006
Hi John: With the latest fuzzy match Soundex algorithm (enhanced version...US census uses it.) it will translate a field of data into a simplified number. See http://en.wikipedia.org/wiki/Soundex and http://www.creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm for details and access to code samples. (I have one around here somewhere and will send a copy when found...) Example: Twin Pines Drive, Twin Pines Dr and Twin Pines Dr. would resolve to index T515 which I believe is what you are looking for. The Soundex field would have to be created and populated after the 60 million odd rows have been imported... post process. And the address number will have to be managed; maybe as a separate field. Jim -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby Sent: Monday, August 14, 2006 9:58 AM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Update query leaves null Tell me more. Just remember that this is a table with (ATM) 60 million records and will grow, so whatever it does can't be going out to lunch for the next week calculating something. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Jim Lawrence Sent: Monday, August 14, 2006 12:21 PM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Update query leaves null Hi John: Have you thought of using a soundex combination type index field solution? Padded from the right with zeros can group the data surprisingly accurate. Jim -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby Sent: Monday, August 14, 2006 6:37 AM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Update query leaves null Unfortunately this is one of those conventions that the client understands and wants. It all has to do with the fact that data is pulled in from dozens of surveys. The person taking the survey may (for example) write his address as 1723 Twin Pines Drive on one survey, 1723 Twin Pines Dr on the next, 1723 Twin Pines Dr. on the next. All three are different and a unique index does not help. OTOH, if you take the first 5 characters of all of those you end up with 1723^ where ^ is a space character. ALL of the various means of spelling the address would end up with the same first N characters (unless misspelled). All addresses will be run through a CASS process that ensures that the address is deliverable. Approximately 11% of all my records drop out there for address error issues. The client WANTS a single field that they can compare to the same field in any other table of addresses - the first 5 characters of each of the address fields, then that with the first 5 characters of the last name (to get down to a "family") and then the first 5 characters of the first name added to that to get down to the individual. There are just times when a client (and their industry) does something in a specific way and it is useless to try to get them to change. It isn't the CLIENT that needs to change, it is the INDUSTRY and I don't have the time to fight THAT battle! John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Haslett, Andrew Sent: Monday, August 14, 2006 5:32 AM To: dba-sqlserver at databaseadvisors.com Subject: Re: [dba-SQLServer] Update query leaves null Why not just build a unique index / constraint on the existing 4 fields? This will prevent duplicates from being created at all... -----Original Message----- From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby Sent: Saturday, 12 August 2006 6:08 AM To: dba-sqlserver at databaseadvisors.com Subject: [dba-SQLServer] Update query leaves null I need to build a new field on a table, and insert the first N characters of Y other fields into that new field. This will be known as a match code and it is an accepted and known method in the client's industry of weeding out duplicates. For example I need to create a new string with the first 5 characters of the first name, last name, address1, address2, zip and the plus4. I will write this new string back into a new field. What I want to do is to build a unique index on the field. Then as the update occurs, I need a silent failure if the match code already exists. Thus if the same exact match code is already in the table, just leave a null of empty string in that match field for the duplicate record. Doing this would allow me to then go back and delete the records where the match code is null and "de-dupe" the table. If I do an update query in SQL, will it do this silent failure or will it stop and announce the failure? Do I need to do something specific to get a "silent failure"? John W. Colby Colby Consulting www.ColbyConsulting.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com IMPORTANT - PLEASE READ *** This email and any files transmitted with it are confidential and may contain information protected by law from disclosure. If you have received this message in error, please notify the sender immediately and delete this email from your system. No warranty is given that this email or files, if attached to this email, are free from computer viruses or other defects. They are provided on the basis the user assumes all responsibility for loss, damage or consequence resulting directly or indirectly from their use, whether caused by the negligence of the sender or not. _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com