[dba-VB] Address Processing

jwcolby jwcolby at colbyconsulting.com
Sun Jul 22 11:07:17 CDT 2007


This business is about bulk mail addresses.  My client buys lists of
addresses from other list providers.  Each list is basically a "poll" that
someone answered or something similar.  A million people filled out warranty
cards when they bought a car, or filled out an online form when they visited
a web site, or...

Somehow they got their name on a list, but more importantly they gave other
important information about themselves such as brands of tobacco used, soft
drinks consumed, electronics purchased etc.  It is really THAT information
that is important about the person, because my client sells these names to
advertisers targeting people who have dogs or cats (a dog / cat food
mailing) or bought a certain brand of car 4 years ago (a dealership
mailing).

So... I get lists of names / addresses, with "demographics" information
included.  I pull the lists in to SQL Server.  Why?  Because my client then
comes back to me and says "give me 100K names in these zips who purchased
laundry detergent".  So I have to have the info in SQL Server in order to do
the where clauses on zips and demographics fields.

As soon as I import the table, I immediately export the data back out for
"address cleaning".  These lists average about 20% "trash", addresses where
the address was misspelled, no zip was given, or a completely fake address
was supplied etc.  So I export my own PKID, the name and address out to CSV
files, up to 1 million records at a time.  I send those addresses through a
program that verifies the addresses.  The first process is called CASS,
which simply verifies that the address "matches" an address template.  A
second process (run at the same time, subsequent to CASS is called DPV which
stands for delivery point validation.  IOW, yea, it matches a template, but
can it actually be delivered.  The USPS provides databases of every valid
address in the United States, and the addresses are run through this
database to see if it is a "real address" (DPV) as opposed just "looks like
a real address" (CASS).  And finally, the address is then processed for NCOA
or National Change of Address.  That is a database (USPS) that says "yea,
that person USED TO LIVE THERE but moved".

So it is obvious why I do the processing.  However I also need to re-process
the data periodically in order to catch the moves.  If John Colby moves from
Connecticut to North Carolina, I want to discover that and stop sending mail
to him in Connecticut and start sending it to him in NC.  So every so often
(monthly?) I will export the name / address back out and run it through the
process again.  CASS and DPV (theoretically) will get a 100% hit rate now
because I already did that last time, but NCOA will catch a certain % of the
people who moved.

I use a dedicated program for doing this.  There are too many variables for
me to even consider "reinventing the wheel".  These professional programs
perform such functions as splitting names and addresses, standardizing
abbreviations such as ST or ST. for Street etc.  Those standardizing
processes are required to get a hit on an address in the USPS database.  If
you are doing a few hundred names for your church or for some charity
organization, you can visually scan the "fallout" data to see if you can
pick up errors, but when processing 90 million names I never even look at
the data, I just can't do that.  So this professionally designed program
handles a bunch of stuff in order to prepare the data for validation,
performs the validation, and hands me back my PKID, name and address fields,
plus about 40 other fields that tell me things about the data itself
(political district, area code, etc) as well as data about why the data
failed to validate in case I want to do processing on the rejects (I don't).

In any event, this is not just a one shot deal.  I pull the RAW data in, add
a PK, Export the name / address, process that, import the resulting cleaned
address data back into a DIFFERENT table, but related to the RAW data by my
PK that I assigned to the raw data.  Periodically I re-export / import for
NCOA processing.

Now when the client asks for 1 million names of people who own dogs I go to
the table they tell me to, join the raw data to the address data, apply
where clauses (zip to the cleaned address, demographics to the raw data),
and pull the names and addresses out and export them for shipping to the
purchaser of the names.  We can tell the client that the data was last
cleaned "X weeks ago" so that they know they are not spending money to send
mail to bad addresses.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Borge Hansen
Sent: Sunday, July 22, 2007 3:25 AM
To: dba-vb at databaseadvisors.com
Subject: [dba-VB] Address Processing

Hi John,
I've been following your threads on the subject of your SQL Server system on
and off, the latest being:
Subject: Re: [dba-VB] How I'm approaching the problem

Where you wrote: "...The fact that I export the name / address back out for
address processing immediately is irrelevant, I will do so every month for
the rest of eternity...."

I am curious as to what your address processing entails...if you have been
writing on this previously I missed it...

Does the address processing entail returning geocode on the address?

I am currently using Google's MAP API to process addresses and having them
returned with a geocode including degree of accurary (street, suburb or
zip).... I am using some javascript picked up somewhere on the internet....
The javascript uses a webpage textcontrol as the input for address
processing (I copy and paste about 1,000 address records from an access
table into the textcontrol), and outputs the data records returned from
Google as tab delimited lines of text to another text control on the same
webpage. So I can process about a 1,000 addresses at a time... but at
present I have to manually copy and paste the processed addresses from the
text control into a say Access table....

What I am after is code that allows me to use a table (Access or SQL) as
input and another table for inserting the processed addresses.... Is this
something you are doing?

Anyone else who can help out here?

Regards,
Borge Hansen

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com




More information about the dba-VB mailing list