[AccessD] Fuzzy Matching (Like Soundex) or other ideas?
John Colby
jwcolby at gmail.com
Sun Apr 23 15:45:29 CDT 2023
Per ChatGPT:
It is not working for me. Just returns zero every time. However I had to
replace the
WorksheetFunction.Min
With an implementation I have.
Here's an implementation of the Damerau-Levenshtein distance algorithm in
VBA:
scss
Function DamerauLevenshteinDistance(s As String, t As String) As Integer
Dim m As Integer, n As Integer, i As Integer, j As Integer
Dim cost As Integer, substitutionCost As Integer
Dim d() As Integer, swap() As Integer
m = Len(s)
n = Len(t)
ReDim d(0 To m, 0 To n)
ReDim swap(0 To m)
For i = 0 To m
d(i, 0) = i
Next i
For j = 0 To n
d(0, j) = j
Next j
For i = 1 To m
For j = 1 To n
cost = IIf(Mid(s, i, 1) = Mid(t, j, 1), 0, 1)
substitutionCost = d(i - 1, j - 1) + cost
d(i, j) = WorksheetFunction.Min(substitutionCost, d(i - 1,
j) + 1, d(i, j - 1) + 1)
If i > 1 And j > 1 And Mid(s, i, 1) = Mid(t, j - 1, 1) And
Mid(s, i - 1, 1) = Mid(t, j, 1) Then
d(i, j) = WorksheetFunction.Min(d(i, j), d(i - 2, j - 2) + cost)
End If
Next j
Next i
DamerauLevenshteinDistance = d(m, n)
End Function
To use this function, you can call it with two string arguments, like this:
mathematica
Dim distance As Integerdistance = DamerauLevenshteinDistance("kitten",
"sitting")Debug.Print distance ' Outputs: 3
This implementation follows the pseudocode of the Damerau-Levenshtein
algorithm, which computes the minimum number of single-character
insertions, deletions, substitutions, and transpositions required to
transform one string into another. The algorithm uses a dynamic programming
approach to build a table of distances between all substrings of the two
input strings, and then returns the distance between the full strings.
On Sat, Apr 22, 2023 at 1:41 AM Stuart McLachlan <stuart at lexacorp.com.pg>
wrote:
> On 22 Apr 2023 at 15:25, Stuart McLachlan wrote:
>
> > Oops, it's a long time since I did it.
> >
> > If the strings are different lengths, it's the levenshtein distance :)
> >
> >
> Or you could go a step further with Damerau-Levenshtein distance
> https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
>
> (The Damerau-Levenshtein distance differs from the classical Levenshtein
> distance
> by including transpositions among its allowable operations in addition to
> the three
> classical single-character edit operations (insertions, deletions and
> substitutions) )
>
>
> But for plain Lenenschtein, Here's some very old PB code, it should be
> trivial to adapt to VBA
>
> FUNCTION fn_LevenshteinDistance( BYVAL strText1 AS STRING, BYVAL strText2
> AS
> STRING, OPT lngCaseMatters AS LONG ) AS LONG
> LOCAL lngText1Idx AS LONG
> LOCAL lngText1Len AS LONG
> LOCAL lngText2Idx AS LONG
> LOCAL lngText2Len AS LONG
> LOCAL pbytText1Char AS BYTE PTR
> LOCAL pbytText2Char AS BYTE PTR
> lngText1Len = LEN( strText1 )
> lngText2Len = LEN( strText2 )
> IF ( lngText1Len = 0 ) OR ( lngText2Len = 0 ) THEN FUNCTION = MAX%(
> lngText1Len,
> lngText2Len ) : EXIT FUNCTION
> IF ISFALSE( ISMISSING( lngCaseMatters )) AND ISFALSE( lngCaseMatters )
> THEN
> strText1 = UCASE$( strText1 )
> strText2 = UCASE$( strText2 )
> END IF
> DIM lngMatrix( lngText1Len, lngText2Len ) AS LONG
> FOR lngText1Idx = 0 TO lngText1Len
> lngMatrix( lngText1Idx, 0 ) = lngText1Idx
> NEXT lngText1Idx
> FOR lngText2Idx = 0 TO lngText2Len
> lngMatrix( 0, lngText2Idx ) = lngText2Idx
> NEXT lngText2Idx
> pbytText1Char = STRPTR( strText1 )
> pbytText2Char = STRPTR( strText2 )
> FOR lngText1Idx = 1 TO lngText1Len
> FOR lngText2Idx = 1 TO lngText2Len
> lngMatrix( lngText1Idx, lngText2Idx ) = _
> MIN%( lngMatrix( lngText1Idx - 1, lngText2Idx ) + 1, _
> lngMatrix( lngText1Idx, lngText2Idx - 1 ) + 1, _
> lngMatrix( lngText1Idx - 1, lngText2Idx - 1 ) + _
> ABS( @pbytText1Char[ lngText1Idx - 1 ] <> @pbytText2Char[
> lngText2Idx - 1 ] ))
> NEXT lngText2Idx
> NEXT lngText1Idx
> FUNCTION = lngMatrix( lngText1Len, lngText2Len )
> END FUNCTION
>
>
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>
--
John W. Colby
Colby Consulting
More information about the AccessD
mailing list