[AccessD] MD5 Hash (was: First in a group)

Gustav Brock Gustav at cactus.dk
Fri Sep 14 10:04:11 CDT 2007


Hi Jim and Max

I happened to rewrite it to VBA though I don't recall how carefully I tested it.

Here's the main module. I'll post the class separately:

Option Compare Database
Option Explicit

Public Function MD5Hash( _
  ByVal strText As String) _
  As String

' Create and return MD5 signature from strText.
' Signature has a length of 32 characters.
'
' 2005-11-21. Cactus Data ApS, CPH.

  Dim cMD5          As New clsMD5
  Dim strSignature  As String
  
  ' Calculate MD5 hash.
  strSignature = cMD5.MD5(strText)
  
  ' Return MD5 signature.
  MD5Hash = strSignature
  
  Set cMD5 = Nothing
  
End Function

Public Function IsMD5( _
  ByVal strText As String, _
  ByVal strMD5 As String) _
  As Boolean
  
' Checks if strMD5 is the MD5 signature of strText.
' Returns True if they match.
' Note: strText is case sensitive while strMD5 is not.
'
' 2005-11-21. Cactus Data ApS, CPH.

  Dim booMatch  As Boolean
  
  booMatch = (StrComp(strMD5, MD5Hash(strText), vbTextCompare) = 0)
  IsMD5 = booMatch
  
End Function

/gustav

>>> jimdettman at verizon.net 14-09-2007 16:41 >>>
Max,

  Think it was gustav that posted a link in the past to a MD5 and SHA hash.
I downloaded at the time and I still have them, but they are VB6 files.

  If you have VB6, I'll send them to you.

Jim. 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com 
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gmail
Sent: Friday, September 14, 2007 10:20 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group

Is there a similar function that somebody could post which would do an M5 hash from VBA please?
Thanks
Max
 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com 
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, September 14, 2007 4:58 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] First in a group

Borge,

The hash was done in SQL Server.  There is a Hash function inside of SQL
Server that I used.  It has various "brands" of hash and I chose one (sha1),
then just ran all of the address fields into the hash function and out pops
a binary (~160 bit?) hash number.

UPDATE [PSM_DPSI11000].[dbo].[tblData]
   SET [HashAddr] = hashbytes('sha1', Addr + Zip5 + Zip4)
      ,[HashFamily] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName)
      ,[HashPerson] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName + FName)

Google HashBytes for more info on that function.

I hash each of these strings and store the three values for every person
record.  I can then compare the hash values to determine a fact collision -
same address, same family, same person.  Obviously this does not handle
mis-spellings, that is not the intent here.  Having hashes I can do compares
or even joins on the hash columns.  I index the columns as well.  It is just
simpler (and faster, and easier) than doing a multi-column join to identify
matches.  

The hash value is an 20 byte value so I think / hope / pray it becomes more
efficient than a multi-column index as well.  In the address case it is a
marginal call.  Zip5 + zip4 is nine bytes plus the average of Address but on
average a 20 byte hash wins I think.  I was previously using X bytes from
the address plus zip5 plus zip 4 but that left me open to problems with
instances like:

1) 1723 Twin Pines Apt A.
2) 1723 Twin Pines Apt B.
3) 1722 Twin Pines


If I took the leading 6 characters (for example) then the examples 1 and 2
match and if I took the last 6 characters then examples 2 and 3 match.  A
hash simply seemed superior.  Once you throw in the first and last name as
well then the hash unquestionably wins.  What really helps it all work is
the fact that I do address validation such that I know that each address is
EXACTLY valid (is actually deliverable) and just throw away any that are not
deliverable so I do not have to deal with mis-spellings in the address part
at least.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com 
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of pcs at azizaz.com 
Sent: Thursday, September 13, 2007 10:30 PM
To: Access Developers discussion and problemsolving
Subject: [AccessD] First in a group

John Colby wrote:

...
I do have an address hash that generates a hash string for just the address
part.

So it seems like something like a group by address hash and then a top1 is
the right direction, but of course it has to be the TOP 1 in each GROUP.
...

I am assuming that the result of an address hash is a unique numeric value
for the particual address text string being 'hashed'? ....

I am interested in knowing more about the address hash .... 
a function ?

A pointer or link to an example will do, thanks

Regards
borge





More information about the AccessD mailing list