jwcolby
jwcolby at colbyconsulting.com
Sun Jul 22 11:07:17 CDT 2007
This business is about bulk mail addresses. My client buys lists of addresses from other list providers. Each list is basically a "poll" that someone answered or something similar. A million people filled out warranty cards when they bought a car, or filled out an online form when they visited a web site, or... Somehow they got their name on a list, but more importantly they gave other important information about themselves such as brands of tobacco used, soft drinks consumed, electronics purchased etc. It is really THAT information that is important about the person, because my client sells these names to advertisers targeting people who have dogs or cats (a dog / cat food mailing) or bought a certain brand of car 4 years ago (a dealership mailing). So... I get lists of names / addresses, with "demographics" information included. I pull the lists in to SQL Server. Why? Because my client then comes back to me and says "give me 100K names in these zips who purchased laundry detergent". So I have to have the info in SQL Server in order to do the where clauses on zips and demographics fields. As soon as I import the table, I immediately export the data back out for "address cleaning". These lists average about 20% "trash", addresses where the address was misspelled, no zip was given, or a completely fake address was supplied etc. So I export my own PKID, the name and address out to CSV files, up to 1 million records at a time. I send those addresses through a program that verifies the addresses. The first process is called CASS, which simply verifies that the address "matches" an address template. A second process (run at the same time, subsequent to CASS is called DPV which stands for delivery point validation. IOW, yea, it matches a template, but can it actually be delivered. The USPS provides databases of every valid address in the United States, and the addresses are run through this database to see if it is a "real address" (DPV) as opposed just "looks like a real address" (CASS). And finally, the address is then processed for NCOA or National Change of Address. That is a database (USPS) that says "yea, that person USED TO LIVE THERE but moved". So it is obvious why I do the processing. However I also need to re-process the data periodically in order to catch the moves. If John Colby moves from Connecticut to North Carolina, I want to discover that and stop sending mail to him in Connecticut and start sending it to him in NC. So every so often (monthly?) I will export the name / address back out and run it through the process again. CASS and DPV (theoretically) will get a 100% hit rate now because I already did that last time, but NCOA will catch a certain % of the people who moved. I use a dedicated program for doing this. There are too many variables for me to even consider "reinventing the wheel". These professional programs perform such functions as splitting names and addresses, standardizing abbreviations such as ST or ST. for Street etc. Those standardizing processes are required to get a hit on an address in the USPS database. If you are doing a few hundred names for your church or for some charity organization, you can visually scan the "fallout" data to see if you can pick up errors, but when processing 90 million names I never even look at the data, I just can't do that. So this professionally designed program handles a bunch of stuff in order to prepare the data for validation, performs the validation, and hands me back my PKID, name and address fields, plus about 40 other fields that tell me things about the data itself (political district, area code, etc) as well as data about why the data failed to validate in case I want to do processing on the rejects (I don't). In any event, this is not just a one shot deal. I pull the RAW data in, add a PK, Export the name / address, process that, import the resulting cleaned address data back into a DIFFERENT table, but related to the RAW data by my PK that I assigned to the raw data. Periodically I re-export / import for NCOA processing. Now when the client asks for 1 million names of people who own dogs I go to the table they tell me to, join the raw data to the address data, apply where clauses (zip to the cleaned address, demographics to the raw data), and pull the names and addresses out and export them for shipping to the purchaser of the names. We can tell the client that the data was last cleaned "X weeks ago" so that they know they are not spending money to send mail to bad addresses. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Borge Hansen Sent: Sunday, July 22, 2007 3:25 AM To: dba-vb at databaseadvisors.com Subject: [dba-VB] Address Processing Hi John, I've been following your threads on the subject of your SQL Server system on and off, the latest being: Subject: Re: [dba-VB] How I'm approaching the problem Where you wrote: "...The fact that I export the name / address back out for address processing immediately is irrelevant, I will do so every month for the rest of eternity...." I am curious as to what your address processing entails...if you have been writing on this previously I missed it... Does the address processing entail returning geocode on the address? I am currently using Google's MAP API to process addresses and having them returned with a geocode including degree of accurary (street, suburb or zip).... I am using some javascript picked up somewhere on the internet.... The javascript uses a webpage textcontrol as the input for address processing (I copy and paste about 1,000 address records from an access table into the textcontrol), and outputs the data records returned from Google as tab delimited lines of text to another text control on the same webpage. So I can process about a 1,000 addresses at a time... but at present I have to manually copy and paste the processed addresses from the text control into a say Access table.... What I am after is code that allows me to use a table (Access or SQL) as input and another table for inserting the processed addresses.... Is this something you are doing? Anyone else who can help out here? Regards, Borge Hansen _______________________________________________ dba-VB mailing list dba-VB at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-vb http://www.databaseadvisors.com