Jim Dettman
jimdettman at verizon.net
Fri Sep 14 09:41:31 CDT 2007
Max, Think it was gustav that posted a link in the past to a MD5 and SHA hash. I downloaded at the time and I still have them, but they are VB6 files. If you have VB6, I'll send them to you. Jim. -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gmail Sent: Friday, September 14, 2007 10:20 AM To: 'Access Developers discussion and problem solving' Subject: Re: [AccessD] First in a group Is there a similar function that somebody could post which would do an M5 hash from VBA please? Thanks Max -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby Sent: Friday, September 14, 2007 4:58 AM To: 'Access Developers discussion and problem solving' Subject: Re: [AccessD] First in a group Borge, The hash was done in SQL Server. There is a Hash function inside of SQL Server that I used. It has various "brands" of hash and I chose one (sha1), then just ran all of the address fields into the hash function and out pops a binary (~160 bit?) hash number. UPDATE [PSM_DPSI11000].[dbo].[tblData] SET [HashAddr] = hashbytes('sha1', Addr + Zip5 + Zip4) ,[HashFamily] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName) ,[HashPerson] = hashbytes('sha1', Addr + Zip5 + Zip4 + LName + FName) Google HashBytes for more info on that function. I hash each of these strings and store the three values for every person record. I can then compare the hash values to determine a fact collision - same address, same family, same person. Obviously this does not handle mis-spellings, that is not the intent here. Having hashes I can do compares or even joins on the hash columns. I index the columns as well. It is just simpler (and faster, and easier) than doing a multi-column join to identify matches. The hash value is an 20 byte value so I think / hope / pray it becomes more efficient than a multi-column index as well. In the address case it is a marginal call. Zip5 + zip4 is nine bytes plus the average of Address but on average a 20 byte hash wins I think. I was previously using X bytes from the address plus zip5 plus zip 4 but that left me open to problems with instances like: 1) 1723 Twin Pines Apt A. 2) 1723 Twin Pines Apt B. 3) 1722 Twin Pines If I took the leading 6 characters (for example) then the examples 1 and 2 match and if I took the last 6 characters then examples 2 and 3 match. A hash simply seemed superior. Once you throw in the first and last name as well then the hash unquestionably wins. What really helps it all work is the fact that I do address validation such that I know that each address is EXACTLY valid (is actually deliverable) and just throw away any that are not deliverable so I do not have to deal with mis-spellings in the address part at least. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of pcs at azizaz.com Sent: Thursday, September 13, 2007 10:30 PM To: Access Developers discussion and problemsolving Subject: [AccessD] First in a group John Colby wrote: ... I do have an address hash that generates a hash string for just the address part. So it seems like something like a group by address hash and then a top1 is the right direction, but of course it has to be the TOP 1 in each GROUP. ... I am assuming that the result of an address hash is a unique numeric value for the particual address text string being 'hashed'? .... I am interested in knowing more about the address hash .... a function ? A pointer or link to an example will do, thanks Regards borge -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com