[AccessD] Fuzzy Matching (Like Soundex) or other ideas?

John Colby jwcolby at gmail.com
Sun Apr 23 15:45:29 CDT 2023


Per ChatGPT:

It is not working for me.  Just returns zero every time.  However I had to
replace the

WorksheetFunction.Min

With an implementation I have.

Here's an implementation of the Damerau-Levenshtein distance algorithm in
VBA:

scss

Function DamerauLevenshteinDistance(s As String, t As String) As Integer
    Dim m As Integer, n As Integer, i As Integer, j As Integer
    Dim cost As Integer, substitutionCost As Integer
    Dim d() As Integer, swap() As Integer

    m = Len(s)
    n = Len(t)

    ReDim d(0 To m, 0 To n)
    ReDim swap(0 To m)

    For i = 0 To m
        d(i, 0) = i
    Next i

    For j = 0 To n
        d(0, j) = j
    Next j

    For i = 1 To m
        For j = 1 To n
            cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
            substitutionCost = d(i - 1, j - 1) + cost

            d(i, j) = WorksheetFunction.Min(substitutionCost, d(i - 1,
j) + 1, d(i, j - 1) + 1)

            If i > 1 And j > 1 And Mid(s, i, 1) = Mid(t, j - 1, 1) And
Mid(s, i - 1, 1) = Mid(t, j, 1) Then
                d(i, j) = WorksheetFunction.Min(d(i, j), d(i - 2, j - 2) + cost)
            End If
        Next j
    Next i

    DamerauLevenshteinDistance = d(m, n)
End Function

To use this function, you can call it with two string arguments, like this:

mathematica

Dim distance As Integerdistance = DamerauLevenshteinDistance("kitten",
"sitting")Debug.Print distance ' Outputs: 3

This implementation follows the pseudocode of the Damerau-Levenshtein
algorithm, which computes the minimum number of single-character
insertions, deletions, substitutions, and transpositions required to
transform one string into another. The algorithm uses a dynamic programming
approach to build a table of distances between all substrings of the two
input strings, and then returns the distance between the full strings.

On Sat, Apr 22, 2023 at 1:41 AM Stuart McLachlan <stuart at lexacorp.com.pg>
wrote:

> On 22 Apr 2023 at 15:25, Stuart McLachlan wrote:
>
> > Oops, it's a long time since I did it.
> >
> > If the strings are different lengths, it's the levenshtein distance :)
> >
> >
> Or you could go a step further with Damerau-Levenshtein distance
> https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
>
> (The Damerau-Levenshtein distance differs from the classical Levenshtein
> distance
> by including transpositions among its allowable operations in addition to
> the three
> classical single-character edit operations (insertions, deletions and
> substitutions) )
>
>
> But for plain Lenenschtein, Here's some very old PB code, it should be
> trivial to adapt to VBA
>
> FUNCTION fn_LevenshteinDistance( BYVAL strText1 AS STRING, BYVAL strText2
> AS
> STRING, OPT lngCaseMatters AS LONG ) AS LONG
>   LOCAL lngText1Idx   AS LONG
>   LOCAL lngText1Len   AS LONG
>   LOCAL lngText2Idx   AS LONG
>   LOCAL lngText2Len   AS LONG
>   LOCAL pbytText1Char AS BYTE PTR
>   LOCAL pbytText2Char AS BYTE PTR
>   lngText1Len = LEN( strText1 )
>   lngText2Len = LEN( strText2 )
>   IF ( lngText1Len = 0 ) OR ( lngText2Len = 0 ) THEN FUNCTION = MAX%(
> lngText1Len,
> lngText2Len ) : EXIT FUNCTION
>   IF ISFALSE( ISMISSING( lngCaseMatters )) AND ISFALSE( lngCaseMatters )
> THEN
>     strText1 = UCASE$( strText1 )
>     strText2 = UCASE$( strText2 )
>   END IF
>   DIM lngMatrix( lngText1Len, lngText2Len ) AS LONG
>   FOR lngText1Idx = 0 TO lngText1Len
>     lngMatrix( lngText1Idx, 0 ) = lngText1Idx
>   NEXT lngText1Idx
>   FOR lngText2Idx = 0 TO lngText2Len
>     lngMatrix( 0, lngText2Idx ) = lngText2Idx
>   NEXT lngText2Idx
>   pbytText1Char = STRPTR( strText1 )
>   pbytText2Char = STRPTR( strText2 )
>   FOR lngText1Idx = 1 TO lngText1Len
>     FOR lngText2Idx = 1 TO lngText2Len
>       lngMatrix( lngText1Idx, lngText2Idx ) = _
>         MIN%( lngMatrix( lngText1Idx - 1, lngText2Idx     ) + 1, _
>           lngMatrix( lngText1Idx,     lngText2Idx - 1 ) + 1, _
>           lngMatrix( lngText1Idx - 1, lngText2Idx - 1 ) + _
>           ABS( @pbytText1Char[ lngText1Idx - 1 ] <> @pbytText2Char[
> lngText2Idx - 1 ] ))
>     NEXT lngText2Idx
>   NEXT lngText1Idx
>   FUNCTION = lngMatrix( lngText1Len, lngText2Len )
> END FUNCTION
>
>
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>


-- 
John W. Colby
Colby Consulting


More information about the AccessD mailing list