Gmail
max.wanadoo at gmail.com
Fri Sep 14 13:55:04 CDT 2007
No worries, Jim.
The code that Gustav posted worked a dream
Thanks
Max
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Jim Dettman
Sent: Friday, September 14, 2007 3:42 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group
Max,
Think it was gustav that posted a link in the past to a MD5 and SHA hash.
I downloaded at the time and I still have them, but they are VB6 files.
If you have VB6, I'll send them to you.
Jim.
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gmail
Sent: Friday, September 14, 2007 10:20 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group
Is there a similar function that somebody could post which would do an M5
hash from VBA please?
Thanks
Max
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, September 14, 2007 4:58 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group
Borge,
The hash was done in SQL Server. There is a Hash function inside of SQL
Server that I used. It has various "brands" of hash and I chose one (sha1),
then just ran all of the address fields into the hash function and out pops
a binary (~160 bit?) hash number.
UPDATE [PSM_DPSI11000].[dbo].[tblData]
SET [HashAddr] = hashbytes('sha1', Addr + Zip5 + Zip4)
,[HashFamily] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName)
,[HashPerson] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName + FName)
Google HashBytes for more info on that function.
I hash each of these strings and store the three values for every person
record. I can then compare the hash values to determine a fact collision -
same address, same family, same person. Obviously this does not handle
mis-spellings, that is not the intent here. Having hashes I can do compares
or even joins on the hash columns. I index the columns as well. It is just
simpler (and faster, and easier) than doing a multi-column join to identify
matches.
The hash value is an 20 byte value so I think / hope / pray it becomes more
efficient than a multi-column index as well. In the address case it is a
marginal call. Zip5 + zip4 is nine bytes plus the average of Address but on
average a 20 byte hash wins I think. I was previously using X bytes from
the address plus zip5 plus zip 4 but that left me open to problems with
instances like:
1) 1723 Twin Pines Apt A.
2) 1723 Twin Pines Apt B.
3) 1722 Twin Pines
If I took the leading 6 characters (for example) then the examples 1 and 2
match and if I took the last 6 characters then examples 2 and 3 match. A
hash simply seemed superior. Once you throw in the first and last name as
well then the hash unquestionably wins. What really helps it all work is
the fact that I do address validation such that I know that each address is
EXACTLY valid (is actually deliverable) and just throw away any that are not
deliverable so I do not have to deal with mis-spellings in the address part
at least.
John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of pcs at azizaz.com
Sent: Thursday, September 13, 2007 10:30 PM
To: Access Developers discussion and problemsolving
Subject: [AccessD] First in a group
John Colby wrote:
...
I do have an address hash that generates a hash string for just the address
part.
So it seems like something like a group by address hash and then a top1 is
the right direction, but of course it has to be the TOP 1 in each GROUP.
...
I am assuming that the result of an address hash is a unique numeric value
for the particual address text string being 'hashed'? ....
I am interested in knowing more about the address hash ....
a function ?
A pointer or link to an example will do, thanks
Regards
borge
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com