[dba-SQLServer] Joins don't discriminate

jwcolby jwcolby at colbyconsulting.com
Thu Nov 5 20:16:51 CST 2009


For ages I have been using a SHA1 hash of my name / address fields to create what I call hashfields. 
  I feed Addr + zip5 + zip4 into SHA1 and store the result into HashAddr - varbool(200).  Add last 
name to that string and feed to SHA1 to create HashFamily.  And finally add first name to that 
string and feed to Sha1 to create HashPerson.

I have been pulling my hair out trying to discover why a simple join on those fields would yield a 
different number of records than a join on my hash fields.  IOW join Addr, Zip5, Zip4 and you pull X 
records but join HashAddr and you get Y records.

It turns out that a join on the fields themselves ignore text case.  SHA1 does not.

Thus to a multifield join Twin Pines Drive and twin pines drive are identical but SHA1 returns a 
different hash string.

Sigh!

At least the mystery is solved.

-- 
John W. Colby
www.ColbyConsulting.com



More information about the dba-SQLServer mailing list