Stuart McLachlan
stuart at lexacorp.com.pg
Sat Mar 19 06:20:26 CDT 2011
Using your SHA1 function, what message digests do you get for the standard test cases:

1. abc
2. abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq

These should return:

1. A9993E36 4706816A BA3E2571 7850C26C 9CD0D89D
2. 84983E44 1C3BD26E BAAE4AA1 F95129E5 E54670F1

See Examples 1 and 2 at http://www.itl.nist.gov/fipspubs/fip180-1.htm

-- 
Stuart

On 19 Mar 2011 at 20:20, Stuart McLachlan wrote:

> How are you creating your hash?
>
> Can you post a few examples of different data strings and colliding
> SHA1 hashes? I can probably make a lot of money out of them.
> AFAIK, no one other than you has found any.
>
> --
> Stuart
>
> On 18 Mar 2011 at 22:08, jwcolby wrote:
>
> > In my databases I create SHA1 hashes to enable joining between
> > tables and pulling identical records (identical for the fields hashed).
> > I create:
> >
> > 1) A HashAddr of the Zip5, Zip4 and Addr. IOW I simply append the
> > three values and feed them into SHA1, and out pops a number which I
> > store in a field in my table.
> >
> > 2) A HashFamily of the Zip5, Zip4, Addr and LName.
> >
> > 3) A HashPerson of the Zip5, Zip4, Addr, LName and FName.
> >
> > I am getting known collisions between different addresses (I have
> > discovered and investigated collisions) in my HashAddr when I have
> > many millions of addresses. I need to address this.
> >
> > Back when I made my design decisions (2004) my hardware consisted of
> > single-core processors, 4 gigs of RAM, Windows x32, etc. Now I have
> > 8 cores, 32 gigs of RAM, Windows x64, etc. IOW I was to a great
> > extent constrained by my hardware "back in the day", whereas I am
> > much less so now.
> >
> > I am about to redesign my process.
> >
> > I am considering simply appending the city and state strings to all
> > of the inputs: Addr, City, St, Zip5, Zip4 as the address base, and
> > then the same with LName and FName for the other two respective
> > hashes.
> >
> > The objective is to minimize hash collisions, not to prevent some
> > crypto attack. I use these hash fields to join between multi-million
> > record tables, so if I need to discover info in TableA where the
> > HashAddr is the same as in TableB, I need the probability of a
> > collision between different addresses (family/person) to be as close
> > to zero as I can get it.
> >
> > My questions are:
> >
> > 1) Is anyone out there using a hash in this manner?
> > 2) Has anyone seen a table of collision probability between messages
> > of a given (short) message length? My message is 9 digits for the
> > Zip5/Zip4, and the address could be something as short as "PO Box 1"
> > or "Apt 1". IOW a total message length of 14 is pretty common.
> > Adding the state would give me minimum message lengths of only 16,
> > and City would only add a few more characters.
> > 3) Does anyone know if just adding the same data back in again would
> > decrease the collision probability? IOW Zip5,Zip4,Addr,City,St,Zip5,Zip4 etc.
> >
> > Any experience out there?
> >
> > --
> > John W. Colby
> > www.ColbyConsulting.com
> > _______________________________________________
> > dba-VB mailing list
> > dba-VB at databaseadvisors.com
> > http://databaseadvisors.com/mailman/listinfo/dba-vb
> > http://www.databaseadvisors.com
>
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
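
For anyone who wants to run that check quickly, here is a minimal sketch in Python (using the standard library's hashlib rather than the SHA1 function under discussion) that verifies the two FIPS 180-1 test vectors Stuart cites:

    import hashlib

    # FIPS 180-1 Examples 1 and 2: message -> expected SHA-1 digest (hex)
    vectors = {
        "abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
        "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq":
            "84983e441c3bd26ebaae4aa1f95129e5e54670f1",
    }

    for message, expected in vectors.items():
        digest = hashlib.sha1(message.encode("ascii")).hexdigest()
        status = "OK" if digest == expected else "MISMATCH"
        print(f"{status}: {message[:12]}... -> {digest}")

If a homegrown SHA1 routine does not reproduce these two digests exactly, the "collisions" are almost certainly coming from the implementation rather than from SHA1 itself.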
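
For context on the hashes being discussed, here is a rough sketch of the HashAddr / HashFamily / HashPerson idea as John describes it (append the fields and feed the result to SHA-1). The "|" delimiter, the case/whitespace normalization, and the sample field values are illustrative assumptions added here, not his actual code:

    import hashlib

    def field_hash(*fields):
        # Append the fields and feed the result to SHA-1, as described in the
        # original post.  The "|" delimiter is an assumption added here so that
        # ("12", "3 MAIN ST") and ("1", "23 MAIN ST") cannot concatenate to the
        # same message; the original scheme may concatenate the values directly.
        message = "|".join(f.strip().upper() for f in fields)
        return hashlib.sha1(message.encode("utf-8")).hexdigest()

    # Illustrative record, not real data
    zip5, zip4, addr, city, st = "27518", "1234", "PO BOX 1", "CARY", "NC"
    lname, fname = "COLBY", "JOHN"

    hash_addr   = field_hash(zip5, zip4, addr, city, st)
    hash_family = field_hash(zip5, zip4, addr, city, st, lname)
    hash_person = field_hash(zip5, zip4, addr, city, st, lname, fname)
    print(hash_addr, hash_family, hash_person, sep="\n")

Stored as indexed columns, values like these can then be used as join keys between the multi-million record tables John mentions.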