Arthur Fuller
artful at rogers.com
Fri Oct 8 11:31:36 CDT 2004
John W. Colby wrote: >So.... Now I need to know how to find and return the last word in a field. >I have never written a stored procedure but if you can give me a starting >point I think maybe I can pick it up from there. I assume that I can then >create an update query where a last name field is set to the value returned >from a SP where the SP is passed in the data from the name field. The SP >then does all the processing required to find that last name. > >As always, this needs to be as fast as possible since this will be running >against 64 million names. I will just be putting 3 or 4 machines to work >running this functionality against different sets of records to speed up the >whole process. > >John W. Colby >www.ColbyConsulting.com > >Contribute your unused CPU cycles to a good cause: >http://folding.stanford.edu/ > >-----Original Message----- >From: dba-sqlserver-bounces at databaseadvisors.com >[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of John W. >Colby >Sent: Friday, October 08, 2004 11:19 AM >To: dba-sqlserver at databaseadvisors.com >Subject: RE: [dba-SQLServer] Parse name field > > >Come to think of it though, for my purposes it might not be as difficult as >it appears. This is for the nVLDB mailing list and I do not need to sort on >or lookup names. I need the last name just for the purpose of building up a >check field where you use the first N characters of the last name, the zip >etc. Thus it doesn't matter to me whether I have the entire last name or >just the last word of the last name. > >It's always a good thing to talk about a problem because doing so triggers >the brain to analyze the problem. > >John W. Colby >www.ColbyConsulting.com > >Contribute your unused CPU cycles to a good cause: >http://folding.stanford.edu/ > >-----Original Message----- >From: dba-sqlserver-bounces at databaseadvisors.com >[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Elam, >Debbie >Sent: Friday, October 08, 2004 11:01 AM >To: 'dba-sqlserver at databaseadvisors.com' >Subject: RE: [dba-SQLServer] Parse name field > > >The biggest crimp I can see is people with spaces in their last name. I >have seen the last name de la Rosa. That is only one example. > >Debbie > >-----Original Message----- >From: Arthur Fuller [mailto:artful at rogers.com] >Sent: Friday, October 08, 2004 9:46 AM >To: dba-sqlserver at databaseadvisors.com >Subject: Re: [dba-SQLServer] Parse name field > > >John W. Colby wrote: > > > >>Does anyone have a SP (or whatever) that can parse a single field with >> >> >first > > >>/ middle / last name into separate fields? >> >>John W. Colby >>www.ColbyConsulting.com >> >> >>_______________________________________________ >>dba-SQLServer mailing list >>dba-SQLServer at databaseadvisors.com >>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >>http://www.databaseadvisors.com >> >> >> >> >> >> >This wouldn't take long to write, but first let's get the requirements >straight.... > >Assume the delimiter is a space? >Assume that if there is only one space, then there is no middle name? Assume >that if there are more than two spaces, the person has two middle >names? > >Yay or nay? > >A. >_______________________________________________ >dba-SQLServer mailing list >dba-SQLServer at databaseadvisors.com >http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >http://www.databaseadvisors.com >- JENKENS & GILCHRIST E-MAIL NOTICE - This transmission may be: (1) subject >to the Attorney-Client Privilege, (2) an attorney work product, or (3) >strictly confidential. If you are not the intended recipient of this >message, you may not disclose, print, copy or disseminate this information. >If you have received this in error, please reply and notify the sender >(only) and delete the message. Unauthorized interception of this e-mail is a >violation of federal criminal law. >This communication does not reflect an intention by the sender or the >sender's client or principal to conduct a transaction or make any agreement >by electronic means. Nothing contained in this message or in any attachment >shall satisfy the requirements for a writing, and nothing contained herein >shall constitute a contract or electronic signature under the Electronic >Signatures in Global and National Commerce Act, any version of the Uniform >Electronic Transactions Act or any other statute governing electronic >transactions. _______________________________________________ >dba-SQLServer mailing list >dba-SQLServer at databaseadvisors.com >http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >http://www.databaseadvisors.com > > > > >_______________________________________________ >dba-SQLServer mailing list >dba-SQLServer at databaseadvisors.com >http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >http://www.databaseadvisors.com > > > > >_______________________________________________ >dba-SQLServer mailing list >dba-SQLServer at databaseadvisors.com >http://databaseadvisors.com/mailman/listinfo/dba-sqlserver >http://www.databaseadvisors.com > > > > As was pointed out earlier, you need to anticipate such things as a Surname like "de la Vega", as well as two middle names. You might devise a rule that says the first space delimits the first name, but then what? 1. Antonio Carlos Jobim -- straightforward 2. Maria Theresa de la Vega -- first space delimits first name, then what? Lower case rule? 3. John Dos Passos -- violates rule 2, which implies that the second capitalized word would be the middle name All this aside, you now confront the sorting rules. Many (most?) schemes suggest that Dos Passos be listed under P not D. This rule at least gives a good starting point: assume that the last word is the most significant part of the surname. Then, assuming that you could somehow recognize that Dos is part of the surname, you could record the surname as Passos, Dos. And Maria's surname as Vega, de la. What we can assume with relative certainty is that the first space delimits the given name. After that it's guesswork. If this is truly important, then I might suggest a rather lengthy process. SELECT CharIndex( ' ', ClientName, 1) -- gives you the position of the first space SELECT SubString( ClientName, 1, CharIndex( ' ', ClientName, 1) ) -- gives you the first name SELECT SubString( ClientName, CharIndex( ' ', ClientName, 1), Len( ClientName ) ) -- gives you the remainder i.e. select substring('Arthur Fuller',7, Len('Arthur Fuller')) -- returns 'Fuller' Create a new table or add columns to the Clients table for GivenName, MiddleName, Surname and Remainder. Update these columns using the expressions above. SELECT DISTINCT Remainder FROM Table ORDER BY Remainder Inspect these and create a Surnames table, having Surname and Remainder columns. Upon encountering 'John Dos Passos', you know the first name is 'John', and the surname is 'Passos'. The trick is to figure out what to do with 'Dos'. (Upgrade to Windows? :) Time for the international community to step in. Zorro, for example, was actually Don Diego de la Vega. I assume from so many mafia movies that 'Don' is equivalent to 'Mr.' or perhaps 'Maestro' or perhaps 'Your Highness', but in any case it is not part of the given name. Now we come to 'Don Quixote' -- bummer! Good luck, baby! There! Now I feel almost as useful as the authors of MS-Help!