[AccessD] Fuzzy Matching (Like Soundex) or other ideas?

Rocky Smolin rockysmolin2 at gmail.com
Fri Apr 21 20:43:38 CDT 2023


Tough one.  Usually in database applications an approximate match won't do
- you end up retrieving the wrong record. What's the spec that they've
given you?
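
For what it's worth, here is a rough sketch of the strip-and-compare idea
from earlier in the thread, written in Python only to keep it short (the same
logic could be ported to VBA). The function names match_key and similarity
are made up for illustration, and the rules are assumptions, not a spec:
ignore punctuation, spaces and case, drop leading zeros from numbers, and
ignore the order of the letter and digit groups. It also shows why a plain
percentage score is risky here: MW-14 and MW-104 score roughly 0.9 with
difflib even though they are different IDs.

```python
import re
from difflib import SequenceMatcher


def match_key(sample_id: str) -> str:
    """Build a comparison key from a sample ID (hypothetical helper).

    Assumed rules: keep only runs of letters and runs of digits,
    upper-case the letters, drop leading zeros from the digits,
    and sort the pieces so their order does not matter.
    """
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", sample_id)
    normalized = []
    for tok in tokens:
        if tok.isdigit():
            normalized.append(str(int(tok)))  # "01" -> "1"
        else:
            normalized.append(tok.upper())
    return "|".join(sorted(normalized))


def similarity(a: str, b: str) -> float:
    """Plain character-level similarity in the range 0..1 (for comparison only)."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    pairs = [
        ("MW-14", "MW14"),      # same key
        ("MW-14", "MW 14"),     # same key
        ("Blank01", "Blank1"),  # same key
        ("DUP01", "01DUP"),     # same key
        ("MW-14", "MW-104"),    # different keys, but a high similarity score
    ]
    for a, b in pairs:
        same = match_key(a) == match_key(b)
        print(f"{a!r} vs {b!r}: key match={same}, similarity={similarity(a, b):.2f}")
```

The key is an exact match, so it can be stored in an extra indexed column and
joined on. Anything that still fails to match would go to a person to review
rather than being matched on a score.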

r

On Fri, Apr 21, 2023 at 6:26 PM Ryan W <wrwehler at gmail.com> wrote:

> I'll have to check on Monday, but I want to say there are other possible
> variations between the handwritten chain and the forms data provided,
> such as Blank01 versus Blank1, or DUP01 versus 01DUP.
>
>
> I think that's why I had considered having some sort of matching based on a
> high percentage of confidence that the two fields are synonymous.
>
>
>
> On Fri, Apr 21, 2023 at 8:10 PM Rocky Smolin <rockysmolin2 at gmail.com>
> wrote:
>
> > Would stripping all the special characters out and matching just letters
> > and numbers get you there? You'd probably need another field in the table
> > where MW-14 or MW 14 appear, to hold the stripped-down version of the data
> > so the search would be fast.
> >
> > r
> >
> > On Fri, Apr 21, 2023 at 5:27 PM Ryan W <wrwehler at gmail.com> wrote:
> >
> > > Does anyone have any fuzzy matching routines with logic based on a
> > > matching percentage or something else?
> > >
> > > Example:
> > >
> > > Client sends us a handwritten chain of custody; they list something we
> > > read as "MW-14", so we enter it as MW-14.
> > >
> > > On a set of forms they provide later for us to use for reporting purposes,
> > > they called it MW14.
> > >
> > > So now we have data that doesn't precisely match.  The usual fix is for us
> > > to ask the client which one is right, and fix the incorrect one. Sometimes
> > > the wrong one is the handwritten form (and then our database, because we
> > > relied on the handwritten form to start the work), sometimes it's the data
> > > entry forms they provided (you'd figure they'd get this right...).
> > >
> > > I'm trying to make it so when they send us the data to digest toward the
> > > end of the job, I can pull that data in and it'll match MW-14 with MW14, or
> > > vice versa.
> > >
> > > While SoundEx works for MW-14 vs MW14, it also thinks, as an example,
> > > that MW-104 matches as well (or something similar).
> > >
> > > I'm not even sure a percentage match would be enough, since MW-14, MW14
> > > and MW-104 are all a really tight grouping of "like" characters. MW-104
> > > would be unrelated to MW-14 or MW14, so erroneously matching it would
> > > cause more grief than us just hand patching the IDs before we button up the
> > > job.
> > > --
> > > AccessD mailing list
> > > AccessD at databaseadvisors.com
> > > https://databaseadvisors.com/mailman/listinfo/accessd
> > > Website: http://www.databaseadvisors.com
> > >