[AccessD] Fuzzy Matching (Like Soundex) or other ideas?

Stuart McLachlan stuart at lexacorp.com.pg
Sun Apr 23 16:22:24 CDT 2023


Yep ChatGPT generally returns buggy code  that requires cleaning up.  :)

Problem. VBA evaluates every condition in an  If ... Then 
even if an early condition fails unlike some other languages which "short circuit" evaluation as 
soon as a condition fails (PB anyone?).

So it still try's to evaluates mid(s,i-1,1) even though i = 1. Which obviously throws an error 
(i.e. Mid$(s,0,1) )

The followng  code works for me and returns the value of 3

Option Compare Database
Option Explicit

Function DamerauLevenshteinDistance(s As String, t As String) As Long
    Dim m As Long, n As Long, i As Long, j As Long
    Dim cost As Long, substitutionCost As Long
    Dim d() As Long, swap() As Long

    m = Len(s)
    n = Len(t)

    ReDim d(0 To m, 0 To n)
    ReDim swap(0 To m)

    For i = 0 To m
        d(i, 0) = i
    Next i

    For j = 0 To n
        d(0, j) = j
    Next j

    For i = 1 To m
        For j = 1 To n
            cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
            substitutionCost = d(i - 1, j - 1) + cost
            d(i, j) = minlng(substitutionCost, minlng(d(i - 1, j) + 1, d(i, j - 1) + 1))
            If i > 1 And j > 1 Then
                If Mid(s, i, 1) = Mid(t, j - 1, 1) And Mid(s, i - 1, 1) = Mid(t, j, 1) Then
                    d(i, j) = minlng(d(i, j), d(i - 2, j - 2) + cost)
                End If
            End If
        Next j
    Next i
    DamerauLevenshteinDistance = d(m, n)
End Function

Function minlng(l1 As Long, l2 As Long) As Long
If l1 <= l2 Then
    minlng = l1
Else
   minlng = l2
End If
End Function

Function TestDL() As Long
Dim distance As Long
distance = DamerauLevenshteinDistance("kitten", "sitting")
Debug.Print distance ' Outputs: 3
End Function





On 23 Apr 2023 at 16:45, John Colby wrote:

> Per ChatGPT:
> 
> It is not working for me.  Just returns zero every time.  However I
> had to replace the
> 
> WorksheetFunction.Min
> 
> With an implementation I have.
> 
> Here's an implementation of the Damerau-Levenshtein distance algorithm
> in VBA:
> 
> scss
> 
> Function DamerauLevenshteinDistance(s As String, t As String) As
> Integer
>     Dim m As Integer, n As Integer, i As Integer, j As Integer
>     Dim cost As Integer, substitutionCost As Integer
>     Dim d() As Integer, swap() As Integer
> 
>     m = Len(s)
>     n = Len(t)
> 
>     ReDim d(0 To m, 0 To n)
>     ReDim swap(0 To m)
> 
>     For i = 0 To m
>         d(i, 0) = i
>     Next i
> 
>     For j = 0 To n
>         d(0, j) = j
>     Next j
> 
>     For i = 1 To m
>         For j = 1 To n
>             cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
>             substitutionCost = d(i - 1, j - 1) + cost
> 
>             d(i, j) = WorksheetFunction.Min(substitutionCost, d(i - 1,
> j) + 1, d(i, j - 1) + 1)
> 
>             If i > 1 And j > 1 And Mid(s, i, 1) = Mid(t, j - 1, 1) And
> Mid(s, i - 1, 1) = Mid(t, j, 1) Then
>                 d(i, j) = WorksheetFunction.Min(d(i, j), d(i - 2, j -
>                 2) + cost)
>             End If
>         Next j
>     Next i
> 
>     DamerauLevenshteinDistance = d(m, n)
> End Function
> 
> To use this function, you can call it with two string arguments, like
> this:
> 
> mathematica
> 
> Dim distance As Integerdistance = DamerauLevenshteinDistance("kitten",
> "sitting")Debug.Print distance ' Outputs: 3
> 
> This implementation follows the pseudocode of the Damerau-Levenshtein
> algorithm, which computes the minimum number of single-character
> insertions, deletions, substitutions, and transpositions required to
> transform one string into another. The algorithm uses a dynamic
> programming approach to build a table of distances between all
> substrings of the two input strings, and then returns the distance
> between the full strings.
> 
> On Sat, Apr 22, 2023 at 1:41AM Stuart McLachlan
> <stuart at lexacorp.com.pg> wrote:
> 
> > On 22 Apr 2023 at 15:25, Stuart McLachlan wrote:
> >
> > > Oops, it's a long time since I did it.
> > >
> > > If the strings are different lengths, it's the levenshtein
> > > distance :)
> > >
> > >
> > Or you could go a step further with Damerau-Levenshtein distance
> > https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
> >
> > (The Damerau-Levenshtein distance differs from the classical
> > Levenshtein distance by including transpositions among its allowable
> > operations in addition to the three classical single-character edit
> > operations (insertions, deletions and substitutions) )
> >
> >
> > But for plain Lenenschtein, Here's some very old PB code, it should
> > be trivial to adapt to VBA
> >
> > FUNCTION fn_LevenshteinDistance( BYVAL strText1 AS STRING, BYVAL
> > strText2 AS STRING, OPT lngCaseMatters AS LONG ) AS LONG
> >   LOCAL lngText1Idx   AS LONG
> >   LOCAL lngText1Len   AS LONG
> >   LOCAL lngText2Idx   AS LONG
> >   LOCAL lngText2Len   AS LONG
> >   LOCAL pbytText1Char AS BYTE PTR
> >   LOCAL pbytText2Char AS BYTE PTR
> >   lngText1Len = LEN( strText1 )
> >   lngText2Len = LEN( strText2 )
> >   IF ( lngText1Len = 0 ) OR ( lngText2Len = 0 ) THEN FUNCTION =
> >   MAX%(
> > lngText1Len,
> > lngText2Len ) : EXIT FUNCTION
> >   IF ISFALSE( ISMISSING( lngCaseMatters )) AND ISFALSE(
> >   lngCaseMatters )
> > THEN
> >     strText1 = UCASE$( strText1 )
> >     strText2 = UCASE$( strText2 )
> >   END IF
> >   DIM lngMatrix( lngText1Len, lngText2Len ) AS LONG
> >   FOR lngText1Idx = 0 TO lngText1Len
> >     lngMatrix( lngText1Idx, 0 ) = lngText1Idx
> >   NEXT lngText1Idx
> >   FOR lngText2Idx = 0 TO lngText2Len
> >     lngMatrix( 0, lngText2Idx ) = lngText2Idx
> >   NEXT lngText2Idx
> >   pbytText1Char = STRPTR( strText1 )
> >   pbytText2Char = STRPTR( strText2 )
> >   FOR lngText1Idx = 1 TO lngText1Len
> >     FOR lngText2Idx = 1 TO lngText2Len
> >       lngMatrix( lngText1Idx, lngText2Idx ) = _
> >         MIN%( lngMatrix( lngText1Idx - 1, lngText2Idx     ) + 1, _
> >           lngMatrix( lngText1Idx,     lngText2Idx - 1 ) + 1, _
> >           lngMatrix( lngText1Idx - 1, lngText2Idx - 1 ) + _
> >           ABS( @pbytText1Char[ lngText1Idx - 1 ] <> @pbytText2Char[
> > lngText2Idx - 1 ] ))
> >     NEXT lngText2Idx
> >   NEXT lngText1Idx
> >   FUNCTION = lngMatrix( lngText1Len, lngText2Len )
> > END FUNCTION
> >
> >
> >
> > --
> > AccessD mailing list
> > AccessD at databaseadvisors.com
> > https://databaseadvisors.com/mailman/listinfo/accessd
> > Website: http://www.databaseadvisors.com
> >
> 
> 
> -- 
> John W. Colby
> Colby Consulting
> -- 
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com




More information about the AccessD mailing list