[AccessD] First in a group

Gmail max.wanadoo at gmail.com
Fri Sep 14 13:55:04 CDT 2007


No worries, Jim.
The code that Gustav posted worked a dream
Thanks
Max 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Jim Dettman
Sent: Friday, September 14, 2007 3:42 PM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group

Max,

  Think it was gustav that posted a link in the past to a MD5 and SHA hash.
I downloaded at the time and I still have them, but they are VB6 files.

  If you have VB6, I'll send them to you.

Jim. 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gmail
Sent: Friday, September 14, 2007 10:20 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group

Is there a similar function that somebody could post which would do an M5
hash from VBA please?
Thanks
Max
 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, September 14, 2007 4:58 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group

Borge,

The hash was done in SQL Server.  There is a Hash function inside of SQL
Server that I used.  It has various "brands" of hash and I chose one (sha1),
then just ran all of the address fields into the hash function and out pops
a binary (~160 bit?) hash number.

UPDATE [PSM_DPSI11000].[dbo].[tblData]
   SET [HashAddr] = hashbytes('sha1', Addr + Zip5 + Zip4)
      ,[HashFamily] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName)
      ,[HashPerson] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName + FName)

Google HashBytes for more info on that function.

I hash each of these strings and store the three values for every person
record.  I can then compare the hash values to determine a fact collision -
same address, same family, same person.  Obviously this does not handle
mis-spellings, that is not the intent here.  Having hashes I can do compares
or even joins on the hash columns.  I index the columns as well.  It is just
simpler (and faster, and easier) than doing a multi-column join to identify
matches.  

The hash value is an 20 byte value so I think / hope / pray it becomes more
efficient than a multi-column index as well.  In the address case it is a
marginal call.  Zip5 + zip4 is nine bytes plus the average of Address but on
average a 20 byte hash wins I think.  I was previously using X bytes from
the address plus zip5 plus zip 4 but that left me open to problems with
instances like:

1) 1723 Twin Pines Apt A.
2) 1723 Twin Pines Apt B.
3) 1722 Twin Pines


If I took the leading 6 characters (for example) then the examples 1 and 2
match and if I took the last 6 characters then examples 2 and 3 match.  A
hash simply seemed superior.  Once you throw in the first and last name as
well then the hash unquestionably wins.  What really helps it all work is
the fact that I do address validation such that I know that each address is
EXACTLY valid (is actually deliverable) and just throw away any that are not
deliverable so I do not have to deal with mis-spellings in the address part
at least.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of pcs at azizaz.com
Sent: Thursday, September 13, 2007 10:30 PM
To: Access Developers discussion and problemsolving
Subject: [AccessD] First in a group

John Colby wrote:

...
I do have an address hash that generates a hash string for just the address
part.

So it seems like something like a group by address hash and then a top1 is
the right direction, but of course it has to be the TOP 1 in each GROUP.
...

I am assuming that the result of an address hash is a unique numeric value
for the particual address text string being 'hashed'? ....

I am interested in knowing more about the address hash .... 
a function ?

A pointer or link to an example will do, thanks

Regards
borge
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com




More information about the AccessD mailing list