[AccessD] Fuzzy Matching (Like Soundex) or other ideas?

Stuart McLachlan stuart at lexacorp.com.pg
Sat Apr 22 00:21:41 CDT 2023


Hamming distance? (Although that is only defined for equal-length strings.)
I've got some code archived somewhere that I will try to track down.
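
In the meantime, here is a rough sketch of the idea. It is not the archived code mentioned above, and it is in Python rather than VBA purely for illustration. The function names (normalize, hamming, ids_match) are made up for this example, and stripping punctuation before comparing is an assumption about what counts as "the same" ID, not an established rule.

```python
import re


def normalize(sample_id: str) -> str:
    """Uppercase the ID and drop everything except letters and digits.

    Assumption: hyphens, spaces and other punctuation carry no meaning.
    """
    return re.sub(r"[^A-Za-z0-9]", "", sample_id).upper()


def hamming(a: str, b: str):
    """Count the positions at which two strings differ.

    Hamming distance is only defined for equal-length strings,
    so return None when the lengths differ.
    """
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def ids_match(a: str, b: str) -> bool:
    """Treat two IDs as the same only if their normalized forms are identical."""
    return hamming(normalize(a), normalize(b)) == 0


for left, right in [("MW-14", "MW14"), ("MW-14", "MW-104"), ("MW-14", "MW-15")]:
    print(left, right, hamming(normalize(left), normalize(right)), ids_match(left, right))
```

With this approach MW-14 and MW14 normalize to the same string (distance 0), MW-104 has a different length so no distance is computed and it is not matched, and MW-15 is one character off. Whether a distance of 1 should ever count as a match is a judgment call; for IDs like these it probably should not.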


On 21 Apr 2023 at 19:26, Ryan W wrote:

> Does anyone have any fuzzy matching routines that would have fuzzy
> matching logic based on a matching percentage or something else?
> 
> Example:
> 
> A client sends us a handwritten chain of custody; they list something
> we read as "MW-14", so we enter it as MW-14.
> 
> On a set of forms they provide later for us to use for reporting
> purposes, they called it MW14.
> 
> So now we have data that doesn't precisely match.  The usual fix is
> for us to ask the client which one is right, and fix the incorrect
> one.  Sometimes it's the handwritten form (and then our database,
> because we relied on the handwritten form to start the work),
> sometimes it's the data entry forms they provided (You figure they'd
> get this right....).
> 
> I'm trying to make it so when they send us the data to digest toward
> the end of the job, I can pull that data in and it'll match MW-14 with
> MW14, or vice versa.
> 
> While Soundex works for MW-14 vs MW14, it also thinks (as an example)
> that MW-104, or something similar, matches as well.
> 
> I'm not even sure if a percentage match would be enough since MW-14,
> MW14 and MW-104 are all a really tight grouping of "like" characters. 
> MW-104 would be unrelated to MW-14 or MW14... so erroneously matching
> it would cause more grief than us just hand patching the IDs before we
> button up the job.
> 



