[dba-SQLServer] How would you cross index

Fri Jun 4 16:18:12 CDT 2010

Imagine you had 10 tables of names / addresses, each from a different source, and you could get 
more such tables at a moment's notice.  Each table has a different number of records; real-life 
examples are 12 million, 23 million, 65 million and 75 million.

Each table has an autonumber PKID that (typically) starts at zero and increments by one.

Each table has three hash fields (SHA as it happens).
1) Addr / Zip5 / Zip4
2) LName / Addr / Zip5 / Zip4
3) FName / LName / Zip5 / Zip4
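
As a rough illustration of how three such hashes could be built, here is a minimal Python sketch. 
The SHA variant (SHA-1 here), the "|" delimiter, the trim / upper-case normalization and the 
names field_hash, row_hashes, hash_addr, hash_name_addr and hash_person are all assumptions made 
for the example; the real tables may be hashed differently.

```python
import hashlib


def _norm(value):
    # Assumed normalization: treat None as empty, trim, upper-case.
    return (value or "").strip().upper()


def field_hash(*fields):
    # Join with a delimiter so ("AB", "C") and ("A", "BC") hash differently.
    joined = "|".join(_norm(f) for f in fields)
    # SHA-1 is an assumption; the post only says "SHA".
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def row_hashes(fname, lname, addr, zip5, zip4):
    # The three hash fields listed above (column names are hypothetical).
    return {
        "hash_addr": field_hash(addr, zip5, zip4),
        "hash_name_addr": field_hash(lname, addr, zip5, zip4),
        "hash_person": field_hash(fname, lname, zip5, zip4),
    }
```

Two rows only compare equal on a hash if their inputs were normalized the same way in every 
source table, so whatever normalization is used has to be applied consistently.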

Now suppose you wanted to cross index these tables.  In other words, you wanted to know how many 
exact matches you had (FName, LName, Addr, City, St, Zip5, Zip4) between any given tables.
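
To make the question concrete, here is a small sketch of counting matches between two tables by 
joining on hashes 2 and 3 together, which between them cover FName, LName, Addr, Zip5 and Zip4. 
It assumes City and St follow from the zip, uses an in-memory SQLite database as a stand-in for 
SQL Server, and the table names (source_a, source_b), column names and sample hash values are 
made up for the example.

```python
import sqlite3


def count_exact_matches(conn, table_a, table_b):
    # Table names are interpolated into the SQL, so pass trusted names only.
    sql = f"""
        SELECT COUNT(*)
        FROM {table_a} AS a
        JOIN {table_b} AS b
          ON a.hash_name_addr = b.hash_name_addr
         AND a.hash_person = b.hash_person
    """
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
for name in ("source_a", "source_b"):
    conn.execute(
        f"CREATE TABLE {name} ("
        "pkid INTEGER PRIMARY KEY, hash_name_addr TEXT, hash_person TEXT)"
    )

# Placeholder strings stand in for real SHA values.
conn.executemany(
    "INSERT INTO source_a VALUES (?, ?, ?)",
    [(0, "na1", "p1"), (1, "na2", "p2"), (2, "na3", "p3")],
)
conn.executemany(
    "INSERT INTO source_b VALUES (?, ?, ?)",
    [(0, "na2", "p2"), (1, "na3", "pX"), (2, "na9", "p9")],
)

match_count = count_exact_matches(conn, "source_a", "source_b")
```

Note that this counts matching pairs of rows, so a person who appears twice in one table is 
counted twice unless the tables are de-duplicated first.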

Further imagine that each table had additional DIFFERENT fields that gave specific information about 
the names in that table.  One might be "owns a dog / cat", another might be "has kids in age 
brackets..."

What method would you use to allow a cross index?

The only method that I am coming up with would be to have a name / address table with an 
additional field to store the PKID from each source table.  Add each unique name / address into 
that table one time.  Use the name hash in inner joins to discover exact matches, then update the 
PKID field for each source table in which a match occurs.  Add a new column every time you get a 
new table.
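
A minimal sketch of that method, again using in-memory SQLite as a stand-in for SQL Server. The 
names master, add_source, source_a and source_b, the hash column names and the sample values are 
hypothetical, and INSERT OR IGNORE is SQLite syntax that would need a different form in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per unique name / address, identified here by hashes 2 and 3.
conn.execute(
    "CREATE TABLE master ("
    "person_id INTEGER PRIMARY KEY, hash_name_addr TEXT, hash_person TEXT, "
    "UNIQUE (hash_name_addr, hash_person))"
)


def add_source(conn, source):
    # Source names are interpolated into the SQL, so pass trusted names only.
    # 1) Add a new PKID column for this source table.
    conn.execute(f"ALTER TABLE master ADD COLUMN {source}_pkid INTEGER")
    # 2) Add names / addresses not seen before; existing ones are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO master (hash_name_addr, hash_person) "
        f"SELECT DISTINCT hash_name_addr, hash_person FROM {source}"
    )
    # 3) Record the source PKID on every matching master row. MIN() picks
    #    one PKID if the source holds duplicates (an assumption).
    conn.execute(
        f"UPDATE master SET {source}_pkid = ("
        f"SELECT MIN(s.pkid) FROM {source} AS s "
        "WHERE s.hash_name_addr = master.hash_name_addr "
        "AND s.hash_person = master.hash_person)"
    )


for name, rows in (
    ("source_a", [(0, "na1", "p1"), (1, "na2", "p2")]),
    ("source_b", [(0, "na2", "p2"), (1, "na9", "p9")]),
):
    conn.execute(
        f"CREATE TABLE {name} ("
        "pkid INTEGER PRIMARY KEY, hash_name_addr TEXT, hash_person TEXT)"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    add_source(conn, name)

# People present in both sources, with their PKID in each.
in_both = conn.execute(
    "SELECT source_a_pkid, source_b_pkid FROM master "
    "WHERE source_a_pkid IS NOT NULL AND source_b_pkid IS NOT NULL"
).fetchall()
```

With this layout, the extra per-source fields (dog / cat, kids' age brackets and so on) stay in 
their own tables and are reached by joining on the stored PKID.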

Your ideas?

-- 
John W. Colby
www.ColbyConsulting.com