[AccessD] "fuzzy logic" search

Charlotte Foust cfoust at infostatsystems.com
Tue Oct 21 20:01:19 CDT 2003


It's old code that I never put into a class and haven't updated for
recent improvements in VBA.  I played around with several variations and
haven't looked at it in years.  Here ya go!

Charlotte Foust

'*********************************************
Option Compare Database   'Use database order for string comparisons
Option Explicit

'Soundex routines created by Charlotte Foust 1/96
' last modified 3/15/99
'
'Starting with the first letter, examine each letter in turn, and
'if we can use it, append it to our string, to a maximum of 4 codes.
'We skip a consonant if it has the same category number as
'the previous consonant, unless there is a separator (vowel, etc.)
'between the two.
'
'Instead of the usual mixed string, this function returns a string
'comprised entirely of numbers.  In addition, leading vowels are
'all given the same value of 0 to allow for spelling variations
'like "English" and "Inglish".  Non-alphabetic characters are
'ignored.  Null string (or string with no non-alphabetic chars)
'returns 0.
'
'The resulting soundex code may be either 4 or 5 characters long
'depending on the alphabetical index value of the initial letter,
'which normally would be anything from 1 to 26. However, all
'leading vowels receive the code value of 0.

Dim varChar() As Variant
Dim strSentence() As String

Function cf_ApplySoundex(ByVal psSearchFor As String) As String
    Dim strSearch As String, intElement As Integer
    Dim strSdxCode As String, intCode As Integer
    Dim strRemaining As String
    Dim strSdxPrefix As String, strSdxSuffix As String

    On Error GoTo cf_ApplySoundex_err

    '\ Initialize variables
    strSearch = LTrim(psSearchFor)
    strSdxCode = ""
    strSdxPrefix = ""
    strSdxSuffix = ""
    strRemaining = Mid(strSearch, 2, Len(strSearch) - 1)
    
    ReDim varChar(0 To Len(strSearch)) As Variant
    '\ Store each character of the string in
    '\ an element of the array.
    For intElement = 1 To Len(strSearch)
        varChar(intElement) = Mid(strSearch, intElement, 1)
    Next intElement
    '\ If the first character is a vowel, use zero
    '\ otherwise, use the alphabetical index of the character
     Select Case LCase(varChar(1))
        Case "a", "e", "i", "o", "u", "y"
            intCode = 0
        Case Else
            intCode = Asc(LCase(varChar(1))) - Asc("a") + 1
     End Select
     strSdxPrefix = CStr(intCode)
    Do While Len(strSdxSuffix) < 3
        '\ Get the rest of the code
        For intElement = 2 To Len(strSearch)
            '\ Get the code for each subsequent character
            intCode = cf_GetCharValue(varChar(intElement))
            '\ Handling for repeated consonants without a separator
            If varChar((intElement) - 1) = varChar(intElement) Then
                intCode = -1
            End If
            If intCode <> -1 And intCode <> 0 Then
                '\ If not a  repeated consonant and not a vowel
                '\ add it to the Soundex code
                strSdxSuffix = strSdxSuffix & intCode
            End If
        Next intElement
        '\ Test to see if you have at least 3 digits.
        '\ If not, pad the string with right zeros.
        Do While Len(strSdxSuffix) < 3
            strSdxSuffix = strSdxSuffix & "0"
        Loop
    Loop

    strSdxCode = strSdxPrefix & strSdxSuffix
    cf_ApplySoundex = strSdxCode
cf_ApplySoundex_exit:
    Exit Function
cf_ApplySoundex_err:
    MsgBox "ApplySoundex error " & Err.Number & ":  " & Err.Description
    Resume cf_ApplySoundex_err
End Function

Function cf_GetCharValue(ByVal pvChar As Variant) As Integer
' Applies the Russell algorithm to determine the "value" of
' the character passed.
    Dim strChar As String, intChar As Integer
    strChar = CStr(pvChar)
    Select Case LCase$(strChar)
        Case "a", "e", "h", "i", "o", "u", "w", "y"
            intChar = 0
        Case "b", "f", "p", "v"
            intChar = 1
        Case "c", "g", "j", "k", "q", "s", "x", "z"
            intChar = 2
        Case "d", "t"
            intChar = 3
        Case "l"
            intChar = 4
        Case "m", "n"
            intChar = 5
        Case "r"
            intChar = 6
        Case Else
            intChar = 0
    End Select
    cf_GetCharValue = intChar
End Function


Function cf_SoundexMatch()
'Asks the User for a Word, Converts it to a soundex Value and then
'performs a search on the current control.  Subsequent entries of the
'same Word will return subsequent records, if any.
    Dim strSearchFor As String
    strSearchFor = InputBox("Enter the matching string to Find")
    '\ Find the first matching record in the current table
    '\ Not case sensitive, search down for data as stored in database
    '\ match entire field, start at field following current record.
    DoCmd.FindRecord cf_ApplySoundex(strSearchFor), acAnywhere, False,
acDown, , acCurrent, False
End Function

'*******************************************

-----Original Message-----
From: John Colby [mailto:jcolby at colbyconsulting.com] 
Sent: Tuesday, October 21, 2003 4:32 PM
To: Access Developers discussion and problem solving
Subject: RE: [AccessD] "fuzzy logic" search


Hmmm soundex would be good though.  Do you have this function?  Is it a
class, a single function?  How does it work?

John W. Colby
www.colbyconsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com]On Behalf Of Charlotte
Foust
Sent: Tuesday, October 21, 2003 8:20 PM
To: Access Developers discussion and problem solving
Subject: RE: [AccessD] "fuzzy logic" search


John,

I've received this kind of request before and I looked into it once. The
fuzzy logic math was way, way beyond my grasp, so I settled for offering
alternatives like soundex on the names.  I created a customized soundex
that used a number for leading vowels rather than the standard "vowel
followed by number" pattern, so that if they misspelled a name with a
leading vowel, there were still possible matches.  But SSN?  I mean,
think of it.  If a SSN has 9 digits, any one of which can repeat, how do
you determine a mismatch by up to 2 characters?  WHICH 2 characters?

Charlotte Foust


More information about the AccessD mailing list