[dba-SQLServer] Parse name field

Arthur Fuller artful at rogers.com
Fri Oct 8 11:31:36 CDT 2004


John W. Colby wrote:

>So.... Now I need to know how to find and return the last word in a field.
>I have never written a stored procedure but if you can give me a starting
>point I think maybe I can pick it up from there.  I assume that I can then
>create an update query where a last name field is set to the value returned
>from a SP where the SP is passed in the data from the name field.  The SP
>then does all the processing required to find that last name.
>
>As always, this needs to be as fast as possible since this will be running
>against 64 million names.  I will just be putting 3 or 4 machines to work
>running this functionality against different sets of records to speed up the
>whole process.
>
>John W. Colby
>www.ColbyConsulting.com 
>
>Contribute your unused CPU cycles to a good cause:
>http://folding.stanford.edu/
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of John W.
>Colby
>Sent: Friday, October 08, 2004 11:19 AM
>To: dba-sqlserver at databaseadvisors.com
>Subject: RE: [dba-SQLServer] Parse name field
>
>
>Come to think of it though, for my purposes it might not be as difficult as
>it appears.  This is for the nVLDB mailing list and I do not need to sort on
>or lookup names.  I need the last name just for the purpose of building up a
>check field where you use the first N characters of the last name, the zip
>etc.  Thus it doesn't matter to me whether I have the entire last name or
>just the last word of the last name.
>
>It's always a good thing to talk about a problem because doing so triggers
>the brain to analyze the problem.
>
>John W. Colby
>www.ColbyConsulting.com 
>
>Contribute your unused CPU cycles to a good cause:
>http://folding.stanford.edu/
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Elam,
>Debbie
>Sent: Friday, October 08, 2004 11:01 AM
>To: 'dba-sqlserver at databaseadvisors.com'
>Subject: RE: [dba-SQLServer] Parse name field
>
>
>The biggest crimp I can see is people with spaces in their last name.  I
>have seen the last name de la Rosa.  That is only one example.
>
>Debbie
>
>-----Original Message-----
>From: Arthur Fuller [mailto:artful at rogers.com] 
>Sent: Friday, October 08, 2004 9:46 AM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Parse name field
>
>
>John W. Colby wrote:
>
>  
>
>>Does anyone have a SP (or whatever) that can parse a single field with
>>    
>>
>first
>  
>
>>/ middle / last name into separate fields?
>>
>>John W. Colby
>>www.ColbyConsulting.com
>>
>>
>>_______________________________________________
>>dba-SQLServer mailing list
>>dba-SQLServer at databaseadvisors.com
>>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>>http://www.databaseadvisors.com
>>
>>
>> 
>>
>>    
>>
>This wouldn't take long to write, but first let's get the requirements 
>straight....
>
>Assume the delimiter is a space?
>Assume that if there is only one space, then there is no middle name? Assume
>that if there are more than two spaces, the person has two middle 
>names?
>
>Yay or nay?
>
>A.
>_______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
>- JENKENS & GILCHRIST E-MAIL NOTICE - This transmission may be: (1) subject
>to the Attorney-Client Privilege, (2) an attorney work product, or (3)
>strictly confidential. If you are not the intended recipient of this
>message, you may not disclose, print, copy or disseminate this information.
>If you have received this in error, please reply and notify the sender
>(only) and delete the message. Unauthorized interception of this e-mail is a
>violation of federal criminal law. 
>This communication does not reflect an intention by the sender or the
>sender's client or principal to conduct a transaction or make any agreement
>by electronic means.  Nothing contained in this message or in any attachment
>shall satisfy the requirements for a writing, and nothing contained herein
>shall constitute a contract or electronic signature under the Electronic
>Signatures in Global and National Commerce Act, any version of the Uniform
>Electronic Transactions Act or any other statute governing electronic
>transactions. _______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
>
>
>
>
>_______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
>
>
>
>
>_______________________________________________
>dba-SQLServer mailing list
>dba-SQLServer at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
>http://www.databaseadvisors.com
>
>
>  
>
As was pointed out earlier, you need to anticipate such things as a 
Surname like "de la Vega", as well as two middle names. You might devise 
a rule that says the first space delimits the first name, but then what?

1. Antonio Carlos Jobim -- straightforward
2. Maria Theresa de la Vega -- first space delimits first name, then 
what? Lower case rule?
3. John Dos Passos -- violates rule 2, which implies that the second 
capitalized word would be the middle name

All this aside, you now confront the sorting rules. Many (most?) schemes 
suggest that Dos Passos be listed under P not D. This rule at least 
gives a good starting point: assume that the last word is the most 
significant part of the surname. Then, assuming that you could somehow 
recognize that Dos is part of the surname, you could record the surname 
as Passos, Dos. And Maria's surname as Vega, de la.

What we can assume with relative certainty is that the first space 
delimits the given name. After that it's guesswork. If this is truly 
important, then I might suggest a rather lengthy process.

SELECT CharIndex( ' ', ClientName, 1) -- gives you the position of the 
first space

SELECT SubString( ClientName, 1, CharIndex( ' ', ClientName, 1) ) -- 
gives you the first name

SELECT SubString( ClientName, CharIndex( ' ', ClientName, 1), Len( 
ClientName ) ) -- gives you the remainder
i.e. select substring('Arthur Fuller',7, Len('Arthur Fuller')) -- 
returns 'Fuller'

Create a new table or add columns to the Clients table for GivenName, 
MiddleName, Surname and Remainder. Update these columns using the 
expressions above.

SELECT DISTINCT  Remainder FROM Table ORDER BY Remainder

Inspect these and create a Surnames table, having Surname and Remainder 
columns. Upon encountering 'John Dos Passos', you know the first name is 
'John', and the surname is 'Passos'. The trick is to figure out what to 
do with 'Dos'. (Upgrade to Windows? :)

Time for the international community to step in. Zorro, for example, was 
actually Don Diego de la Vega. I assume from so many mafia movies that 
'Don' is equivalent to 'Mr.' or perhaps 'Maestro' or perhaps 'Your 
Highness', but in any case it is not part of the given name. Now we come 
to 'Don Quixote' -- bummer!

Good luck, baby!

There! Now I feel almost as useful as the authors of MS-Help!






More information about the dba-SQLServer mailing list