Jim Lawrence
jlawrenc1 at shaw.ca
Fri Dec 22 16:12:12 CST 2006
Just a note... it seems to always take 50% of a contract's time to clean up the data. It is the most difficult assignment to get the client to fund, and yet it is the most crucial task to perform.

Just a comment

Jim

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Mark A Matte
Sent: Friday, December 22, 2006 10:56 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string

John,

I've mentioned before that a friend of mine does the same thing with his software and then matches it against Equifax data. Accuracy is everything in his business. I talked with him about his approach...he said it boils down to defining your rules...and then a crap load of IF statements to find which rule/rules to apply. He has been doing this (and refining the code) for more than 10 years. His advice to someone building a similar tool to the one he developed would be:

"Start with as many messed up records as you can...and keep adding IF statements until you can run it once against the data and all records come out correct. About 60 million should be a good start. Then get another 60 million messed up records...run it and see what you missed the first time. ...Or hire me to clean your data.."

Just thought I'd share. You seem to have quite the battle ahead of you.

Best of luck...and Happy Holidays.

Mark A. Matte

>From: artful at rogers.com
>Reply-To: dba-sqlserver at databaseadvisors.com
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>Date: Fri, 22 Dec 2006 09:15:05 -0800 (PST)
>
>Quite right. This is not a simple SQL statement.
>
>----- Original Message ----
>From: JWColby <jwcolby at colbyconsulting.com>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Friday, December 22, 2006 9:32:17 AM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>I think the "best way" to handle this, if you are going to truly try to
>handle this problem, is to:
>
>Develop a list of those "prefixes" to last names - Van, La, De etc.
>Take the first word as the first name
>Get a count of remaining words.
>If count > 1 then
>    ProcessRest
>Else
>    Rest is last name
>Endif
>
>ProcessRest
>    Look up the second word in the prefix list.
>    If InList then
>        Treat everything left as the last name
>    else
>        Treat next word as middle name
>        remove middle name from string
>        Process rest as last name
>    endif
>End ProcessRest
>
>Let's just say this is not a simple SQL statement
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
>artful at rogers.com
>Sent: Thursday, December 21, 2006 3:00 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>I appreciate your point, but I'm still not certain of the best way to go
>with my question, which concerns the way to handle some unusual surnames.
>
>van den Berq
>la Flame
>de la Vega
>Ben Gurion
>
>and any number of names that begin with "al". Or "da" as in Leonardo. My
>very limited Italian suggests that Leonardo was born in a town called
>Vinci.
>
>So how does one sort such a list? On the capitalized word? On the first
>letter of the two or three words considered the surname?
>
>Advice from Europeans, Asians, Africans, or even North Americans familiar
>with this problem, would be appreciated. I have no immediate problem that
>requires this solution.
>This is purely theoretical at the moment, but who
>knows, someday I may need the answer.
>
>TIA,
>Arthur
>
>----- Original Message ----
>From: Robert L. Stewart <rl_stewart at highstream.net>
>To: dba-sqlserver at databaseadvisors.com
>Sent: Thursday, December 21, 2006 1:41:04 PM
>Subject: Re: [dba-SQLServer] Find the second occurrence of a character in a string
>
>You put it in the right columns to begin with and don't try to parse it out
>of a single one. :-)
>
>_______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
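
For what it's worth, the prefix-list procedure John Colby outlines in the quoted thread could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not anyone's production code: the function name split_name and the contents of SURNAME_PREFIXES are hypothetical, and the sketch treats a single remaining word as the last name. A real prefix list would need to be much longer and tested against real data.

```python
# Hypothetical prefix list for illustration only; a real one would be far longer.
SURNAME_PREFIXES = {"van", "von", "de", "la", "le", "da", "di", "al", "ben"}


def split_name(full_name, prefixes=SURNAME_PREFIXES):
    """Split a full name into (first, middle, last) using a surname-prefix list."""
    words = full_name.split()
    if not words:
        return ("", "", "")

    # Take the first word as the first name.
    first, rest = words[0], words[1:]

    # Zero or one remaining word: whatever is left is the last name.
    if len(rest) <= 1:
        return (first, "", " ".join(rest))

    # If the second word is a known prefix, everything left is the last name.
    if rest[0].lower() in prefixes:
        return (first, "", " ".join(rest))

    # Otherwise treat the next word as the middle name and the rest as the last name.
    return (first, rest[0], " ".join(rest[1:]))


if __name__ == "__main__":
    # First names below are made up; the surnames come from the thread.
    for name in ["Jan van den Berq", "Maria de la Vega", "John W. Colby"]:
        print(name, "->", split_name(name))
```

Note that a middle name which happens to appear in the prefix list (a middle name of "Ben", say) would be folded into the last name, which is exactly the kind of case where more rules and more messed-up test records come in.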