jwcolby
jwcolby at colbyconsulting.com
Fri Jun 4 16:18:12 CDT 2010
Imagine you had 10 tables of names / addresses. Each table came from a different source. You could get more such tables at a moment's notice.

Each table has a different number of records. Real-life examples: 12 million, 23 million, 65 million, 75 million etc.

Each table has an autonumber PKID that (typically) starts at zero and works up, incrementing by one.

Each table has three hash fields (SHA as it happens):

1) Addr / Zip5 / Zip4
2) LName / Addr / Zip5 / Zip4
3) FName / LName / Zip5 / Zip4

You wanted to cross index these tables. In other words, you wanted to know how many exact matches you had (FName, LName, Addr, City, St, Zip5, Zip4) between any given tables.

Further imagine that each table had additional DIFFERENT fields that gave specific information about the names in that table. One might be "owns a dog / cat", another might be "has kids in age brackets..."

What method would you use to allow a cross index?

The only method that I am coming up with would be to have a name / address table with additional fields to store the PKID from each table. Add each unique name / address into the table one time. Use the name hash in inner joins to discover exact matches. Update the PKID column for each source table in which a match occurs. Add a new column every time you get a new table.

Your ideas?

--
John W. Colby
www.ColbyConsulting.com
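
The proposed name / address table with one PKID column per source could be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the actual setup described above. The names `name_hash`, `build_cross_index`, `count_matches`, `xref`, and the `<table>_pkid` column naming are all hypothetical, and the single hash over FName / LName / Addr / Zip5 / Zip4 is an assumption (the three hashes listed in the post each cover a subset of those fields).

```python
import hashlib
import sqlite3

FIELDS = "fname, lname, addr, zip5, zip4"


def name_hash(fname, lname, addr, zip5, zip4):
    # Assumed full-identity hash: normalize case and whitespace, then SHA-1.
    key = "|".join(s.strip().upper() for s in (fname, lname, addr, zip5, zip4))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def build_cross_index(sources):
    """sources maps a trusted table name to a list of
    (fname, lname, addr, zip5, zip4) tuples."""
    db = sqlite3.connect(":memory:")
    # One row per unique name / address, keyed by the hash.
    db.execute(f"CREATE TABLE xref (hash TEXT PRIMARY KEY, {FIELDS})")
    for table, rows in sources.items():
        # Source table with an autonumber-style PKID starting at zero.
        db.execute(
            f"CREATE TABLE {table} (pkid INTEGER PRIMARY KEY, {FIELDS}, hash TEXT)"
        )
        db.executemany(
            f"INSERT INTO {table} (pkid, {FIELDS}, hash) VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(i, *row, name_hash(*row)) for i, row in enumerate(rows)],
        )
        db.execute(f"CREATE INDEX {table}_hash ON {table} (hash)")
        # A new source table means a new PKID column in the cross index.
        db.execute(f"ALTER TABLE xref ADD COLUMN {table}_pkid INTEGER")
        # Add each name / address only once; existing hashes are skipped.
        db.execute(
            f"INSERT OR IGNORE INTO xref (hash, {FIELDS}) "
            f"SELECT hash, {FIELDS} FROM {table}"
        )
        # Record the source PKID wherever the hashes match. If a source holds
        # duplicates of one person, only one of their PKIDs is kept.
        db.execute(
            f"UPDATE xref SET {table}_pkid = "
            f"(SELECT pkid FROM {table} WHERE {table}.hash = xref.hash LIMIT 1)"
        )
    db.commit()
    return db


def count_matches(db, table_a, table_b):
    # People present in both sources have both PKID columns filled in.
    sql = (
        f"SELECT COUNT(*) FROM xref "
        f"WHERE {table_a}_pkid IS NOT NULL AND {table_b}_pkid IS NOT NULL"
    )
    return db.execute(sql).fetchone()[0]
```

With this layout, counting matches between any two sources is a filter on two columns of `xref`, and the table-specific fields (pets, kids' age brackets, and so on) remain reachable by joining back to each source on its PKID. The cost is a schema change for every new source; a narrow link table of (hash, source, PKID) rows would be one alternative that avoids adding columns.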