jwcolby
jwcolby at colbyconsulting.com
Thu Nov 5 20:16:51 CST 2009
For ages I have been using a SHA1 hash of my name / address fields to create what I call hashfields. I feed Addr + zip5 + zip4 into SHA1 and store the result into HashAddr - varbool(200). Add last name to that string and feed to SHA1 to create HashFamily. And finally add first name to that string and feed to Sha1 to create HashPerson. I have been pulling my hair out trying to discover why a simple join on those fields would yield a different number of records than a join on my hash fields. IOW join Addr, Zip5, Zip4 and you pull X records but join HashAddr and you get Y records. It turns out that a join on the fields themselves ignore text case. SHA1 does not. Thus to a multifield join Twin Pines Drive and twin pines drive are identical but SHA1 returns a different hash string. Sigh! At least the mystery is solved. -- John W. Colby www.ColbyConsulting.com