JWColby
jwcolby at colbyconsulting.com
Fri Dec 22 13:24:09 CST 2006
LOL, no, I am not attempting to do this; I hire Accuzip to do it. I am in the process of running 60 million records through them to see how they fare.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Mark A Matte
Sent: Friday, December 22, 2006 1:56 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string

John,

I've mentioned before that a friend of mine does the same thing with his software and then matches it against Equifax data. Accuracy is everything in his business. I talked with him about his approach...he said it boils down to defining your rules...and then a crap load of IF statements to find which rule/rules to apply. He has been doing this (and refining the code) for more than 10 years. His advice to someone building a similar tool to the one he developed would be:

"Start with as many messed up records as you can...and keep adding IF statements until you can run it once against the data and all records come out correct. About 60 million should be a good start. Then get another 60 million messed up records...run it and see what you missed the first time. ...Or hire me to clean your data.."

Just thought I'd share. You seem to have quite the battle ahead of you.

Best of luck...and Happy Holidays.

Mark A. Matte

>From: artful at rogers.com
>Reply-To: dba-sqlserver at databaseadvisors.com
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>Date: Fri, 22 Dec 2006 09:15:05 -0800 (PST)
>
>Quite right. This is not a simple SQL statement.
>
>----- Original Message ----
>From: JWColby <jwcolby at colbyconsulting.com>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Friday, December 22, 2006 9:32:17 AM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>I think the "best way" to handle this, if you are truly going to try to
>handle this problem, is to:
>
>Develop a list of those "prefixes" to last names - Van, La, De etc.
>Take the first word as the first name
>Get a count of remaining words.
>If count > 1 then
>    ProcessRest
>Else
>    Rest is last name
>Endif
>
>ProcessRest
>    Look up the second word in the prefix list.
>    If InList then
>        Treat everything left as the last name
>    else
>        Treat next word as middle name
>        remove middle name from string
>        Process rest as last name
>    endif
>End ProcessRest
>
>Let's just say this is not a simple SQL statement
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
>artful at rogers.com
>Sent: Thursday, December 21, 2006 3:00 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>I appreciate your point, but I'm still not certain of the best way to
>go with my question, which concerns the way to handle some unusual surnames.
>
>van den Berq
>la Flame
>de la Vega
>Ben Gurion
>
>and any number of names that begin with "al". Or "da" as in Leonardo.
>My very limited Italian suggests that Leonardo was born in a town
>called Vinci.
>
>So how does one sort such a list? On the capitalized word? On the first
>letter of the two or three words considered the surname?
>
>Advice from Europeans, Asians, Africans, or even North Americans
>familiar with this problem, would be appreciated. I have no immediate
>problem that requires this solution. This is purely theoretical at the
>moment, but who knows, someday I may need the answer.
>
>TIA,
>Arthur
>
>----- Original Message ----
>From: Robert L. Stewart <rl_stewart at highstream.net>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Thursday, December 21, 2006 1:41:04 PM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>You put it in the right columns to begin with and don't try to parse it
>out of a single one. :-)

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com
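
For anyone who wants to experiment with the prefix-list steps quoted in the thread above, here is a rough Python sketch of that rule set. It is only an illustration, not the code anyone in the thread actually runs: the function name `split_name` and the contents of `SURNAME_PREFIXES` are made up for this example, and a real prefix list would be far longer. A lone remaining word is taken as the last name.

```python
# Hypothetical prefix list for illustration only; a real one would be
# much longer and would depend on the languages present in the data.
SURNAME_PREFIXES = {"van", "von", "de", "la", "le", "da", "di", "al", "ben"}


def split_name(full_name):
    """Split a single-column name into (first, middle, last).

    Follows the rule set from the thread: the first word is the first
    name; if the next word is a known surname prefix, everything left is
    the last name; otherwise the next word is the middle name and the
    remainder is the last name.
    """
    words = full_name.split()
    if not words:
        return ("", "", "")

    first, rest = words[0], words[1:]

    # Zero or one word left: whatever remains is the last name.
    if len(rest) <= 1:
        return (first, "", " ".join(rest))

    # Second word is a known prefix: treat everything left as the last name.
    if rest[0].lower() in SURNAME_PREFIXES:
        return (first, "", " ".join(rest))

    # Otherwise the second word is the middle name, the remainder the last name.
    return (first, rest[0], " ".join(rest[1:]))
```

With the sample prefix set, `split_name("Juan de la Vega")` gives `("Juan", "", "de la Vega")` and `split_name("John W. Colby")` gives `("John", "W.", "Colby")`; the first names paired with the surnames from the thread are invented for the example. As the thread says, this is only a starting point: every messy record that comes out wrong means another rule or another prefix.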