[AccessD] A2003: Checking for Similar Names and or Addresses

Fri May 15 09:26:22 CDT 2015

Jim Debtman I would add to a "collection" which if a duplicate would
produce an error.
But seriously that is one reason I like collections, they error out on
duplicates.
If you write a routine that compresses (trims and removes spaces) then adds
to a collection, then any duplicates (err.number<>0) means that the
duplicate should be written to a duplicates table.
You can also start instantiate and add to a counter collection for all
erroring items, with the compressed fullname as the key, and with each new
error (with error handling still set to On Error Resume Next), read any
current value for the countercollection based on the key, add 1 to it,
remove the item (if it's not there, this won't hurt), and add it back
incremented by 1 ... so that when you are done with this routine you have 1
collection which is a list of all duplicates, and a second collection which
is a series containing the number of duplicates per value. Then run a count
from 1 to col.count, and append to a table of duplicates that has 2
columns, the 1st comes from colNames and the 2nd comes from
colCounter(colNames(i))

Then run a query any time you want where the compressed name contatenation
from the data list equals a value in this table and you have all records
that duplicated.

Even with several thousand names to create and maintain this list of
duplicates would take seconds, not minutes.

On Thu, May 14, 2015 at 7:35 AM, Jim Dettman <jimdettman at verizon.net> wrote:

>
> Depends on what the goal is.   If it's only to check for exact matches,
> then
> a collapse of the strings involved (removing all white space) and
> concatenated against an index suffices.
>
> If it's more of a "fuzzy" type of analysis that you want to use, then there
> are various approaches, like soundex (i.e.  One name is "Jim Dettman" vs
> "Jim Debtman").
>
> Weighting would be throwing in comparing the different components, name vs
> address.
>
> On a straight compare though, I've always fully collapsed the strings, then
> do a find on an index.
>
> Jim.
>
> -----Original Message-----
> From: AccessD [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of
> Darren
> Sent: Wednesday, May 13, 2015 11:47 PM
> To: 'Access Developers discussion and problem solving'
> Subject: [AccessD] A2003: Checking for Similar Names and or Addresses
>
> Hi Team
> I appreciate this may trigger a right vs. wrong discussion. All comments
> appreciated but this is for a Pro-Bono Project so there are no funds and my
> time is not infinite.
> So a perfect solution is not required. Just a working one :-)
> So...
> I have inherited a dB with approx 12K names and addresses for a community
> project I am working on.
> Data entry has been on home built Access dBs and Excel spreadsheets and
> data
> veracity is poor.
> I can easily identify nearly 1000 records with same first and last names so
> there appear to be a lot of duplicate entries.
> We will handle that and will sanitise the old data. However does anyone
> have
> a quick and easy "test" where I can quickly check completed names (First
> and
> Last) once entered, to see if we have exact or similar matches already in
> the dB?
> I really would like the same thing with addresses?
> I have Googled and I have seen some very complex weighting routines etc -
> Too tricky for me and beyond the scope of this project (and my skill set).
> I just need to present the users with a list of exact matches (I can do
> that
> now) and some potentials, in a list that they can then decide if it's ok to
> proceed or not.
> Many thanks in advance.
> Darren.
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>