Stuart McLachlan
stuart at lexacorp.com.pg
Sat Mar 19 08:20:11 CDT 2011
If he is compacting a 160 bit hash to a 64 bit hash, he is certainly increasing the odds of a collision - but I suspect that even at 64 bits, the chances of a collision would be very low. But that does not appear to be what you are testing. It looks as though you created 6 million random strings and four of them turned out to be identical. Therefore also had the same SHA1 digest. What happens if you try it again, making sure that all of your strings are unique. Alternatively, what do you get if you add some code to show the pairs of strings. Are they different? -- Stuart On 19 Mar 2011 at 16:03, Shamil Salakhetdinov wrote: > Hi Stuart and John, > > <<< > AFAIK, no one other than you has found any > >>> > Yes, I was also confused how it comes to get SHA1 hashes collisions > but then I have thought that John probably "compacts" SHA1 hashes into > 8 bytes surrogate keys? > > Here (P.S.) is a "quick&dirty" "brute force" test code - 4 collisions > for 6 millions cycles for 8 bytes keys. > > Please correct me if you find some errors in my sample code > > Thank you. > > -- > Shamil > > P.S. Code: > > public class GenerateHashes > { > private static Dictionary<string, string> _generatedHashes; > private static Dictionary<ulong, ulong> _generatedLongHashes; > > public static void Run(int minHashedStringLength, int > maxHashedStringLength, long testCasesQty) > { > System.Console.WriteLine("Started at: " + > System.DateTime.Now); > > _generatedHashes = new Dictionary<string, string>(); > _generatedLongHashes = new Dictionary<ulong, ulong>(); > > int collisionsQty = 0; > int longHashCollisionsQty = 0; > > decimal totalLengthOfProcessedStrings = 0; > decimal averageStringLength = 0; > > > Random random = new > Random((int)System.DateTime.Now.Ticks); for (long i = 1; i > <= testCasesQty; i++) { > int stringLength = random.Next(minHashedStringLength, > maxHashedStringLength); > byte[] plainTextBytes = > (byte[])Array.CreateInstance(typeof(byte), stringLength); > random.NextBytes(plainTextBytes); > System.Security.Cryptography.HashAlgorithm hash = new > SHA1Managed(); > byte[] hashBytes = hash.ComputeHash(plainTextBytes); > string hashValue = Convert.ToBase64String(hashBytes); > string testValue; if > (_generatedHashes.TryGetValue(hashValue, out > testValue)) > collisionsQty++; > else _generatedHashes.Add(hashValue, hashValue); > if (i % 1000000 == 0) > System.Console.WriteLine( > "{0}: i={1}, cs={2}, cu={3}", > DateTime.Now, i, collisionsQty, > longHashCollisionsQty); > > ulong longHash = 0xFFFFFFFF; > foreach (byte b in hashValue) > { > longHash ^= b; > longHash = longHash << 8 + 0xFF; > } > ulong longHashTestValue; > if (_generatedLongHashes.TryGetValue(longHash, out > longHashTestValue)) > { > //Long Hash Collision > longHashCollisionsQty++; > System.Console.WriteLine(" * step {0}, hash '{1}', > longHash: {2:X}", > i, hashValue, longHash); > } > else _generatedLongHashes.Add(longHash, longHash); > > totalLengthOfProcessedStrings += stringLength; > averageStringLength = totalLengthOfProcessedStrings / > i; > } > > System.Console.WriteLine( > "\nTOTALS:\n" + > "Min String Length = {0}\n" + > "Max String Length = {1}\n" + > "Avg String Length = {2}\n" + > "Test cases Qty = {3}\n" + > "Collisions Qty = {4}\n" + > "ulong Collisions Qty = {5}", > minHashedStringLength, maxHashedStringLength, > averageStringLength, testCasesQty, > collisionsQty, > longHashCollisionsQty); > > System.Console.WriteLine("Finished at: " + > System.DateTime.Now); > > } > } > > > P.P.S. Stats > > Started at: 19/03/2011 15:51:04 > 19/03/2011 15:51:16: i=1000000, cs=0, cu=0 > * step 1712197, hash 'Rt4NY5FHg0uMDszVql+l6ksx6Tg=', longHash: > 36D7CFC36A99DE80 > 19/03/2011 15:51:29: i=2000000, cs=0, cu=1 > * step 2040324, hash 'laoPHdqlfeZOx9DI8RDaQZjhKZs=', longHash: > D1B5AB44BB5CDE80 > * step 2043449, hash '1m+tgKu7/l9fl8nUNuz5I/OWF5E=', longHash: > C95F3EBC66B15E80 > 19/03/2011 15:51:44: i=3000000, cs=0, cu=3 > 19/03/2011 15:52:00: i=4000000, cs=0, cu=3 > * step 4369446, hash '2sZrglXDp3NnNPuqdILqwmUKq30=', longHash: > F7DB565F166C1E80 > 19/03/2011 15:52:17: i=5000000, cs=0, cu=4 > > TOTALS: > Min String Length = 7 > Max String Length = 25 > Avg String Length = 15.4981354 > Test cases Qty = 5,000,000 > Collisions Qty = 0 > ulong Collisions Qty = 4 > Finished at: 19/03/2011 15:52:17 > > -----Original Message----- > From: dba-vb-bounces at databaseadvisors.com > [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Stuart > McLachlan Sent: 19 ????? 2011 ?. 13:20 To: Discussion concerning > Visual Basic and related programming issues. Subject: Re: [dba-VB] > SHA1 to compute a hash > > How are you creating your hash? > > Can you post a few examples of different data strings and colliding > SHA1 hashes. I can probably make a lot of money out of them. > AFAIK, no one other than you has found any. > > -- > Stuart > > <<< snip >>> > > > _______________________________________________ > dba-VB mailing list > dba-VB at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-vb > http://www.databaseadvisors.com > >