[dba-VB] SHA1 to compute a hash

Stuart McLachlan stuart at lexacorp.com.pg
Sat Mar 19 05:20:16 CDT 2011


How are you creating your hash?

Can you post a few examples of different data strings with colliding SHA1 hashes? I can
probably make a lot of money out of them. AFAIK, no one other than you has found any.

-- 
Stuart
 

On 18 Mar 2011 at 22:08, jwcolby wrote:

> In my databases I create SHA1 hashes to enable joining between tables
> and pull identical records (identical for the fields hashed).  I
> create:
> 
> 1) A HashAddr of the zip5, zip4 and addr.  IOW I simply append the
> three values and feed them into SHA1 and out pops a number which I
> store in a field in my table.
> 
> 2) A HashFamily of the Zip5, Zip4, Addr and LName.
> 
> 3) A HashPerson of Zip5, Zip4, Addr, LName and FName.
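
One thing worth checking before blaming SHA1: if the fields are simply appended with no separator, two different records can produce the same input string, and then the hashes match because the inputs match. The sketch below is only an illustration of that possibility, not a diagnosis of your data. The function names, the sample records and the choice of the ASCII unit separator as delimiter are all my assumptions.

```python
import hashlib

def hash_concat(*fields):
    """SHA-1 of the fields simply appended, as described in the quoted message."""
    return hashlib.sha1("".join(fields).encode("utf-8")).hexdigest()

def hash_fields(*fields, sep="\x1f"):
    """SHA-1 of the fields joined with a separator (assumed never to occur in the data)."""
    # Trim and upper-case each field so padding and case do not change the hash.
    cleaned = [f.strip().upper() for f in fields]
    return hashlib.sha1(sep.join(cleaned).encode("utf-8")).hexdigest()

# Hypothetical, contrived records: with an empty zip4, the house number
# of record a lines up with the zip4 of record b after concatenation.
a = ("12345", "", "6789 MAIN ST")
b = ("12345", "6789", " MAIN ST")

print(hash_concat(*a) == hash_concat(*b))   # same input string, so same hash
print(hash_fields(*a) == hash_fields(*b))   # field boundaries preserved, hashes differ
```

If the stored value is shorter than the full 160 bits (for example, cut down to fit an integer column), that would be another place to look.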
> 
> I am getting known collisions between different addresses (I have
> discovered and investigated collisions) in my HashAddr when I have
> many millions of addresses.  I need to address this.
> 
> Back when I made my design decisions (2004) my hardware consisted of
> single-core processors, 4 gigs of RAM, Windows x32, etc.  Now I have 8
> cores, 32 gigs of RAM, Windows x64, etc.  IOW I was to a great extent
> constrained by my hardware "back in the day" whereas I am much less so
> now.
> 
> I am about to redesign my process.
> 
> I am considering simply appending in the city and state strings to all
> of the inputs: Addr, City, St, Zip5, Zip4 as the address base and then
> the same with LName and FName for the other two respective hashes.
> 
> The objective is to minimize hash collisions, not prevent some crypto
> attack.  I use these hash fields to join between multi-million-record
> tables, so if I need to discover info in TableA where the HashAddr is
> the same as in TableB, I need the probability of a collision between
> different addresses (family/Person) to be as close to zero as I can
> get it.
> 
> My questions are:
> 
> 1) Is anyone out there using a hash in this manner?
> 2) Has anyone seen a table of collision probability between messages
> of a given (short) message length?  My message is 9 digits for the
> zip5/4 and the address could be something as short as PO Box 1, or Apt
> 1.  IOW a total message length of 14 is pretty common.  Adding the
> state would give me minimum message lengths of only 16 and City would
> only add a few more characters.
> 3) Does anyone know if just adding the same data back in again would
> decrease the collision probability?  IOW
> Zip5, Zip4, Addr, City, St, Zip5, Zip4, etc.
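
On questions 2 and 3, a rough way to get a number instead of a table is the standard birthday-bound approximation. It treats the hash as an ideal random function, which is an assumption, and the function name and row counts below are mine, chosen only for illustration. Note that message length does not appear in the formula: only the number of distinct inputs and the number of hash bits actually stored.

```python
import math

def collision_probability(n_items, hash_bits=160):
    """Birthday-bound estimate of at least one collision among n_items
    distinct inputs, assuming an ideal hash with hash_bits of output:
    P ~= 1 - exp(-n * (n - 1) / 2^(hash_bits + 1))
    """
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# Illustrative row count (an assumption): 100 million distinct addresses.
for bits in (160, 64, 32):
    print(bits, collision_probability(100_000_000, bits))
```

With the full 160 bits the estimate for 100 million rows is on the order of 1e-33, while keeping only 32 bits makes a collision practically certain. Under this model, appending the same fields a second time does not change the estimate, because it changes neither the number of distinct inputs nor the output width.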
> 
> Any experience out there?
> 
> 
> -- 
> John W. Colby
> www.ColbyConsulting.com
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
> 
> 





