[AccessD] Fuzzy Matching (Like Soundex) or other ideas?
Stuart McLachlan
stuart at lexacorp.com.pg
Sat Apr 22 00:21:41 CDT 2023
Hamming distance? (Although that is usually for equal length strings)
I've got some code archived somewhere that I will try to track down.
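
In the meantime, here is a minimal Python sketch of Hamming distance. It is not the archived code mentioned above, and the function name is made up for illustration. It also shows the equal-length limitation: "MW-14" and "MW14" cannot be compared this way at all.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is only defined for strings of the same length.
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(hamming_distance("MW-14", "MW-15"))  # one differing position
    try:
        hamming_distance("MW-14", "MW14")
    except ValueError as exc:
        print("cannot compare:", exc)
```

For IDs of differing length, an edit distance such as Levenshtein would be the usual substitute.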
On 21 Apr 2023 at 19:26, Ryan W wrote:
> Does anyone have any fuzzy matching routines that would have fuzzy
> matching logic based on a matching percentage or something else?
>
> Example:
>
> Client sends us a handwritten chain of custody. They list something
> we read as "MW-14", so we enter it as MW-14.
>
> On a set of forms they provide later for us to use for reporting
> purposes, they called it MW14.
>
> So now we have data that doesn't precisely match. The usual fix is
> for us to ask the client which one is right, and fix the incorrect
> one. Sometimes it's the handwritten form (and then our database,
> because we relied on the handwritten form to start the work),
> sometimes it's the data entry forms they provided (you'd figure
> they'd get this right...).
>
> I'm trying to make it so when they send us the data to digest toward
> the end of the job, I can pull that data in and it'll match MW-14 with
> MW14, or vice versa.
>
> While Soundex works for MW-14 vs MW14, it also thinks that, for
> example, MW-104 (or something similar) matches as well.
>
> I'm not even sure if a percentage match would be enough since MW-14,
> MW14 and MW-104 are all a really tight grouping of "like" characters.
> MW-104 would be unrelated to MW-14 or MW14... so erroneously matching
> it would cause more grief than us just hand patching the IDs before we
> button up the job.
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> https://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>
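
To illustrate the concern in the quoted message: a plain edit-distance score cannot separate these IDs, since MW-14 is one edit away from both MW14 and MW-104. One possible alternative, sketched below in Python, is to normalize the IDs and then compare them exactly. This assumes punctuation and letter case carry no meaning in the sample IDs, which would need checking against real data. `levenshtein` and `normalize_id` are hypothetical names used only for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalize_id(sample_id: str) -> str:
    """Assumption: only letters and digits matter, and case is irrelevant."""
    return re.sub(r"[^A-Za-z0-9]", "", sample_id).upper()


if __name__ == "__main__":
    # Both pairs are a single edit apart, so a score threshold treats them alike.
    print(levenshtein("MW-14", "MW14"), levenshtein("MW-14", "MW-104"))
    # After normalization only the intended pair compares equal.
    print(normalize_id("MW-14") == normalize_id("MW14"))
    print(normalize_id("MW-14") == normalize_id("MW-104"))
```

The design choice here is exact matching on a cleaned key instead of a similarity score, so IDs that differ by a digit never match.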