[dba-SQLServer] Find the second occurrence of a character in a string

JWColby jwcolby at colbyconsulting.com
Fri Dec 22 13:24:09 CST 2006


LOL, no, I am not attempting to do this myself; I hire Accuzip to do it.  I am
in the process of running 60 million records through them to see how they fare.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Mark A
Matte
Sent: Friday, December 22, 2006 1:56 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Find the second occurrence of a
character in a string

John,

I've mentioned before that a friend of mine does the same thing with his
software and then matches it against Equifax data.  Accuracy is everything in
his business.  I talked with him about his approach...he said it boils down to
defining your rules...and then a crap load of IF statements to find which
rule/rules to apply.  He has been doing this (and refining the code) for more
than 10 years.

His advice to someone building a similar tool to the one he developed would
be:

"Start with as many messed up records as you can...and keep adding IF
statements until you can run it once against the data and all records come
out correct.  About 60 million should be a good start.  Then get another 60
million messed up records...run it and see what you missed the first time.  
...Or hire me to clean your data.."

Just thought I'd share.  You seem to have quite the battle ahead of you.  
Best of luck...and Happy Holidays.

Mark A. Matte


>From: artful at rogers.com
>Reply-To: dba-sqlserver at databaseadvisors.com
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character 
>in a string
>Date: Fri, 22 Dec 2006 09:15:05 -0800 (PST)
>
>Quite right. This is not a simple SQL statement.
>
>----- Original Message ----
>From: JWColby <jwcolby at colbyconsulting.com>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Friday, December 22, 2006 9:32:17 AM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character 
>in a string
>
>I think the "best way" to handle this, if you are truly going to try to 
>solve the problem, is to:
>
>Develop a list of those "prefixes" to last names - Van, La, De etc.
>Take the first word as the first name
>Get a count of remaining words.
>If count > 1 then
>     ProcessRest
>Else
>     Rest is last name
>Endif
>
>ProcessRest
>     Look up the second word in the prefix list.
>     If InList then
>         Treat everything left as the last name
>     else
>         Treat next word as middle name
>         remove middle name from string
>         Process rest as last name
>     endif
>End ProcessRest
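
For anyone who wants to try the outline above, here is a minimal sketch in
Python. It is only one reading of the pseudocode, not a tested name parser.
The function name split_name and the contents of SURNAME_PREFIXES are
hypothetical, and it assumes that a single remaining word is always taken as
the last name.

```python
# Hypothetical prefix list for illustration; a real one would be much longer.
SURNAME_PREFIXES = {"van", "la", "de", "ben", "al", "da"}


def split_name(full_name):
    """Return (first, middle, last) using the prefix-list heuristic."""
    words = full_name.split()
    if not words:
        return ("", "", "")

    # Take the first word as the first name.
    first, rest = words[0], words[1:]

    # Assumption: with zero or one remaining word, whatever is left
    # is the last name.
    if len(rest) <= 1:
        return (first, "", " ".join(rest))

    # The second word is a known surname prefix, so everything left
    # is treated as the last name.
    if rest[0].lower() in SURNAME_PREFIXES:
        return (first, "", " ".join(rest))

    # Otherwise the next word is the middle name and the remainder
    # is the last name.
    return (first, rest[0], " ".join(rest[1:]))


if __name__ == "__main__":
    for name in ("John W Colby", "Diego de la Vega", "John Colby"):
        print(name, "->", split_name(name))
```

The prefix check is case-insensitive, so "la Flame" and "La Flame" are handled
the same way. Two-word first names and suffixes such as "Jr" are not handled,
which is where the extra rules and IF statements would start to pile up.
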
>
>Let's just say this is not a simple SQL statement.
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of 
>artful at rogers.com
>Sent: Thursday, December 21, 2006 3:00 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character 
>in a string
>
>I appreciate your point, but I'm still not certain of the best way to 
>go with my question, which concerns the way to handle some unusual 
>surnames.
>
>van den Berq
>la Flame
>de la Vega
>Ben Gurion
>
>and any number of names that begin with "al". Or "da" as in Leonardo. 
>My very limited Italian suggests that Leonardo was born in a town 
>called Vinci.
>
>So how does one sort such a list? On the capitalized word? On the first 
>letter of the two or three words considered the surname?
>
>Advice from Europeans, Asians, Africans, or even North Americans 
>familiar with this problem, would be appreciated. I have no immediate 
>problem that requires this solution. This is purely theoretical at the 
>moment, but who knows, someday I may need the answer.
>
>TIA,
>Arthur
>
>----- Original Message ----
>From: Robert L. Stewart <rl_stewart at highstream.net>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Thursday, December 21, 2006 1:41:04 PM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character 
>in a string
>
>You put it in the right columns to begin with and don't try to parse it 
>out of a single one.  :-)
>
>
>
>
>_______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
>
