[AccessD] Fuzzy Matching (Like Soundex) or other ideas?
Stuart McLachlan
stuart at lexacorp.com.pg
Sun Apr 23 16:22:24 CDT 2023
Yep ChatGPT generally returns buggy code that requires cleaning up. :)
Problem. VBA evaluates every condition in an If ... Then
even if an early condition fails unlike some other languages which "short circuit" evaluation as
soon as a condition fails (PB anyone?).
So it still try's to evaluates mid(s,i-1,1) even though i = 1. Which obviously throws an error
(i.e. Mid$(s,0,1) )
The followng code works for me and returns the value of 3
Option Compare Database
Option Explicit
Function DamerauLevenshteinDistance(s As String, t As String) As Long
Dim m As Long, n As Long, i As Long, j As Long
Dim cost As Long, substitutionCost As Long
Dim d() As Long, swap() As Long
m = Len(s)
n = Len(t)
ReDim d(0 To m, 0 To n)
ReDim swap(0 To m)
For i = 0 To m
d(i, 0) = i
Next i
For j = 0 To n
d(0, j) = j
Next j
For i = 1 To m
For j = 1 To n
cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
substitutionCost = d(i - 1, j - 1) + cost
d(i, j) = minlng(substitutionCost, minlng(d(i - 1, j) + 1, d(i, j - 1) + 1))
If i > 1 And j > 1 Then
If Mid(s, i, 1) = Mid(t, j - 1, 1) And Mid(s, i - 1, 1) = Mid(t, j, 1) Then
d(i, j) = minlng(d(i, j), d(i - 2, j - 2) + cost)
End If
End If
Next j
Next i
DamerauLevenshteinDistance = d(m, n)
End Function
Function minlng(l1 As Long, l2 As Long) As Long
If l1 <= l2 Then
minlng = l1
Else
minlng = l2
End If
End Function
Function TestDL() As Long
Dim distance As Long
distance = DamerauLevenshteinDistance("kitten", "sitting")
Debug.Print distance ' Outputs: 3
End Function
On 23 Apr 2023 at 16:45, John Colby wrote:
> Per ChatGPT:
>
> It is not working for me. Just returns zero every time. However I
> had to replace the
>
> WorksheetFunction.Min
>
> With an implementation I have.
>
> Here's an implementation of the Damerau-Levenshtein distance algorithm
> in VBA:
>
> scss
>
> Function DamerauLevenshteinDistance(s As String, t As String) As
> Integer
> Dim m As Integer, n As Integer, i As Integer, j As Integer
> Dim cost As Integer, substitutionCost As Integer
> Dim d() As Integer, swap() As Integer
>
> m = Len(s)
> n = Len(t)
>
> ReDim d(0 To m, 0 To n)
> ReDim swap(0 To m)
>
> For i = 0 To m
> d(i, 0) = i
> Next i
>
> For j = 0 To n
> d(0, j) = j
> Next j
>
> For i = 1 To m
> For j = 1 To n
> cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
> substitutionCost = d(i - 1, j - 1) + cost
>
> d(i, j) = WorksheetFunction.Min(substitutionCost, d(i - 1,
> j) + 1, d(i, j - 1) + 1)
>
> If i > 1 And j > 1 And Mid(s, i, 1) = Mid(t, j - 1, 1) And
> Mid(s, i - 1, 1) = Mid(t, j, 1) Then
> d(i, j) = WorksheetFunction.Min(d(i, j), d(i - 2, j -
> 2) + cost)
> End If
> Next j
> Next i
>
> DamerauLevenshteinDistance = d(m, n)
> End Function
>
> To use this function, you can call it with two string arguments, like
> this:
>
> mathematica
>
> Dim distance As Integerdistance = DamerauLevenshteinDistance("kitten",
> "sitting")Debug.Print distance ' Outputs: 3
>
> This implementation follows the pseudocode of the Damerau-Levenshtein
> algorithm, which computes the minimum number of single-character
> insertions, deletions, substitutions, and transpositions required to
> transform one string into another. The algorithm uses a dynamic
> programming approach to build a table of distances between all
> substrings of the two input strings, and then returns the distance
> between the full strings.
>
> On Sat, Apr 22, 2023 at 1:41AM Stuart McLachlan
> <stuart at lexacorp.com.pg> wrote:
>
> > On 22 Apr 2023 at 15:25, Stuart McLachlan wrote:
> >
> > > Oops, it's a long time since I did it.
> > >
> > > If the strings are different lengths, it's the levenshtein
> > > distance :)
> > >
> > >
> > Or you could go a step further with Damerau-Levenshtein distance
> > https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
> >
> > (The Damerau-Levenshtein distance differs from the classical
> > Levenshtein distance by including transpositions among its allowable
> > operations in addition to the three classical single-character edit
> > operations (insertions, deletions and substitutions) )
> >
> >
> > But for plain Lenenschtein, Here's some very old PB code, it should
> > be trivial to adapt to VBA
> >
> > FUNCTION fn_LevenshteinDistance( BYVAL strText1 AS STRING, BYVAL
> > strText2 AS STRING, OPT lngCaseMatters AS LONG ) AS LONG
> > LOCAL lngText1Idx AS LONG
> > LOCAL lngText1Len AS LONG
> > LOCAL lngText2Idx AS LONG
> > LOCAL lngText2Len AS LONG
> > LOCAL pbytText1Char AS BYTE PTR
> > LOCAL pbytText2Char AS BYTE PTR
> > lngText1Len = LEN( strText1 )
> > lngText2Len = LEN( strText2 )
> > IF ( lngText1Len = 0 ) OR ( lngText2Len = 0 ) THEN FUNCTION =
> > MAX%(
> > lngText1Len,
> > lngText2Len ) : EXIT FUNCTION
> > IF ISFALSE( ISMISSING( lngCaseMatters )) AND ISFALSE(
> > lngCaseMatters )
> > THEN
> > strText1 = UCASE$( strText1 )
> > strText2 = UCASE$( strText2 )
> > END IF
> > DIM lngMatrix( lngText1Len, lngText2Len ) AS LONG
> > FOR lngText1Idx = 0 TO lngText1Len
> > lngMatrix( lngText1Idx, 0 ) = lngText1Idx
> > NEXT lngText1Idx
> > FOR lngText2Idx = 0 TO lngText2Len
> > lngMatrix( 0, lngText2Idx ) = lngText2Idx
> > NEXT lngText2Idx
> > pbytText1Char = STRPTR( strText1 )
> > pbytText2Char = STRPTR( strText2 )
> > FOR lngText1Idx = 1 TO lngText1Len
> > FOR lngText2Idx = 1 TO lngText2Len
> > lngMatrix( lngText1Idx, lngText2Idx ) = _
> > MIN%( lngMatrix( lngText1Idx - 1, lngText2Idx ) + 1, _
> > lngMatrix( lngText1Idx, lngText2Idx - 1 ) + 1, _
> > lngMatrix( lngText1Idx - 1, lngText2Idx - 1 ) + _
> > ABS( @pbytText1Char[ lngText1Idx - 1 ] <> @pbytText2Char[
> > lngText2Idx - 1 ] ))
> > NEXT lngText2Idx
> > NEXT lngText1Idx
> > FUNCTION = lngMatrix( lngText1Len, lngText2Len )
> > END FUNCTION
> >
> >
> >
> > --
> > AccessD mailing list
> > AccessD at databaseadvisors.com
> > https://databaseadvisors.com/mailman/listinfo/accessd
> > Website: http://www.databaseadvisors.com
> >
>
>
> --
> John W. Colby
> Colby Consulting
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
More information about the AccessD
mailing list