[AccessD] Fuzzy Matching (Like Soundex) or other ideas?
Ryan W
wrwehler at gmail.com
Fri Apr 21 20:25:58 CDT 2023
I'll have to check on Monday, but I want to say there are other possible
variations between the handwritten chain and the forms data provided,
such as Blank01 or Blank1, or DUP01 versus 01DUP.
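For illustration only, variations like these could be collapsed to a
common key before comparing. The sketch below is in Python rather than
VBA, and the function name canonical_key is hypothetical. It assumes
that punctuation, letter case, leading zeros, and the order of the
letter and digit groups carry no meaning in a sample ID, which would
need to be confirmed against real data:

```python
import re

def canonical_key(sample_id: str) -> str:
    """Build an order-insensitive key from the letter and digit runs of an ID."""
    # Split into runs of letters and runs of digits; everything else is ignored.
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", sample_id)
    normalized = []
    for tok in tokens:
        if tok.isdigit():
            normalized.append(str(int(tok)))  # "01" -> "1" (drops leading zeros)
        else:
            normalized.append(tok.upper())    # case-insensitive letters
    # Sorting makes DUP01 and 01DUP produce the same key (an assumption:
    # it also treats A1B2 and B1A2 as equal, which may be too loose).
    return "|".join(sorted(normalized))

for pair in [("MW-14", "MW14"), ("Blank01", "Blank1"), ("DUP01", "01DUP"), ("MW-14", "MW-104")]:
    print(pair, canonical_key(pair[0]) == canonical_key(pair[1]))
```

With these assumptions the first three pairs share a key, while MW-14
and MW-104 do not, because the digit runs 14 and 104 stay distinct.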
I think that's why I had considered having some sort of matching based on a
high percentage of confidence that the two fields are synonymous.
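A minimal sketch of percentage-style matching, in Python for
illustration, using the standard library's difflib.SequenceMatcher. The
helper names (strip_id, similarity, best_match) and the 0.95 threshold
are assumptions, not anything from an existing routine:

```python
import re
from difflib import SequenceMatcher

def strip_id(sample_id: str) -> str:
    """Keep only letters and digits, uppercased."""
    return re.sub(r"[^A-Za-z0-9]", "", sample_id).upper()

def similarity(a: str, b: str, strip: bool = True) -> float:
    """Return a 0..1 similarity ratio, optionally after stripping punctuation."""
    if strip:
        a, b = strip_id(a), strip_id(b)
    return SequenceMatcher(None, a, b).ratio()

def best_match(candidate, known_ids, threshold=0.95):
    """Return the most similar known ID, or None if nothing clears the threshold."""
    if not known_ids:
        return None
    score, winner = max((similarity(candidate, k), k) for k in known_ids)
    return winner if score >= threshold else None

# On the raw strings, the wanted pair and the unwanted pair score the same.
print(similarity("MW-14", "MW14", strip=False))
print(similarity("MW14", "MW104", strip=False))
print(best_match("MW14", ["MW-14", "MW-104"]))
```

On the raw strings, MW-14 vs MW14 and MW14 vs MW104 get the same ratio,
so a percentage alone cannot separate them. After stripping punctuation
the first pair becomes an exact match, and on IDs this short the
threshold has to sit close to 1.0 to keep MW-104 out.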
On Fri, Apr 21, 2023 at 8:10 PM Rocky Smolin <rockysmolin2 at gmail.com> wrote:
> Would stripping all the special characters out and matching just letters
> and numbers get you there? You'd probably need another field in the table
> where MW-14 or MW 14 appear, to hold the stripped down version of the data
> so the search would be fast.
>
> r
>
> On Fri, Apr 21, 2023 at 5:27 PM Ryan W <wrwehler at gmail.com> wrote:
>
> > Does anyone have any fuzzy matching routines with logic based on a
> > matching percentage or something else?
> >
> > Example:
> >
> > Client sends us a hand written chain of custody, they list something we
> > look at as:
> > "MW-14" so we enter it as MW-14
> >
> > On a set of forms they provide later for us to use for reporting
> > purposes, they called it MW14.
> >
> > So now we have data that doesn't precisely match. The usual fix is for
> > us to ask the client which one is right, and fix the incorrect one.
> > Sometimes it's the handwritten form (and then our database, because we
> > relied on the handwritten form to start the work), sometimes it's the
> > data entry forms they provided (you'd figure they'd get this right...).
> >
> > I'm trying to make it so when they send us the data to digest toward the
> > end of the job, I can pull that data in and it'll match MW-14 with MW14,
> > or vice versa.
> >
> > While Soundex works for MW-14 vs MW14, it also thinks that, for example,
> > MW-104 matches as well (or something similar).
> >
> > I'm not even sure if a percentage match would be enough since MW-14, MW14
> > and MW-104 are all a really tight grouping of "like" characters. MW-104
> > would be unrelated to MW-14 or MW14... so erroneously matching it would
> > cause more grief than us just hand patching the IDs before we button up
> > the job.
> > --
> > AccessD mailing list
> > AccessD at databaseadvisors.com
> > https://databaseadvisors.com/mailman/listinfo/accessd
> > Website: http://www.databaseadvisors.com
> >