JMoss
jmoss111 at bellsouth.net
Fri Aug 13 02:35:32 CDT 2004
John, I used to work for a database marketing company. They used purchased lists only when necessary, to add to a customer's small list, and usually only from reputable list sellers like Victoria's Secret, specialty magazines, mail order firms, etc. There are quite a few disreputable list vendors whose products are raw and messy, with no hygiene performed: no cleanup, no deduping, no householding, no NCOA, names and addresses in several different formats, no state or zip info, fields transposed, commas in strings. These lists normally contained 20-30% dups, especially when considering householding; the first sketch at the end of this message shows the kind of dup/household check I mean.

You might want to look at DoubleTake, Personator, RightFielder, Styleist, and Dirty Harry's Character Assassin from PeopleSmith Tools. By the way, these tools aren't cheap, and neither is having an NCOA done on the files. If your marketer is going to sell phone lists, you are going to have to have a federal DNC list and a state DNC list for each state that you sell in, and your list has to be updated quarterly or you run the risk of stiff penalties.

As far as I know, there is no direct export capability from SQL Server to a compressed file, unless you export onto a compressed volume or decompress a zip onto a compressed volume (second sketch below). We used DTS and/or DataJunction (now Pervasive Cosmos) to get the files into SQL 2000. You can look at the file using PFE, which should handle most any size file that you throw at it. And you had better hope the first line contains a header; mapping 500 columns without any data definition could put you in a belltower with an M16.

Problems with the data can cost a lot of time. Incorrect dates (like 10/21/948) can kill an insert at 95% complete, and other gremlins will raise their ugly little heads; the third sketch below shows the staging approach that catches them early. For this, forget Access; use SQL Server Enterprise Manager/Query Analyzer.

We built a factor table that contained a factor key and a customer key to pull our lists (rough layout in the last sketch below). A factor is the basic info used to pull the list. We would run something like:

    SELECT customerkey
    FROM customerfactor
    WHERE factordesc = 'Owns bugZapper'
      AND customerkey IN (SELECT customerkey
                          FROM customeraddress
                          WHERE state = 'TN')

We charged $125 an hour for setup, factoring, producing data for a mailing or marketing campaign, or segmenting data. Most of the time there was no hard product generated; the client had the ability to use the data via a browser-based interface.
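Here is the flavor of the dup and household checks I mean. This is just a sketch against a made-up RawList staging table, not anybody's production code:

    -- exact dups: the same person keyed in more than once
    SELECT lastname, firstname, address1, zip, COUNT(*) AS dups
    FROM RawList
    GROUP BY lastname, firstname, address1, zip
    HAVING COUNT(*) > 1

    -- householding: several names behind one mailbox
    SELECT address1, zip, COUNT(*) AS persons
    FROM RawList
    GROUP BY address1, zip
    HAVING COUNT(*) > 1

The real hygiene tools do fuzzy matching on top of this (Bob vs. Robert, St. vs. Street), which is why the packages above earn their keep.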
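On the compressed-file question, the compressed-volume workaround looks roughly like this from a command prompt. The folder, server, and query are placeholders:

    REM mark the target folder NTFS-compressed; anything bcp writes there gets compressed on the fly
    compact /c C:\exports

    REM dump the pull straight into the compressed folder
    bcp "SELECT firstname, lastname, address1 FROM TheList.dbo.Customer WHERE state = 'TN'" queryout C:\exports\tn_pull.txt -c -t, -S MYSERVER -T

You don't get a .zip out of it, but the bits on disk take less room, which is usually what people are after.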
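And the staging trick for the date gremlins: land everything as varchar first, find the garbage, then do the typed insert. Table, column, and file names here are invented for the example:

    -- land the raw feed as text so nothing can fail conversion on the way in
    CREATE TABLE RawStage (
        custname  varchar(100),
        birthdate varchar(20)
    )

    BULK INSERT RawStage
        FROM 'D:\data\list.txt'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

    -- flush out the 10/21/948-style values before they kill the real insert
    SELECT custname, birthdate
    FROM RawStage
    WHERE birthdate IS NOT NULL
      AND ISDATE(birthdate) = 0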
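For what it's worth, the factor table was shaped more or less like this; I'm simplifying from memory, so treat it as a sketch:

    -- one row per customer per attribute; the clustered index keeps list pulls cheap
    CREATE TABLE customerfactor (
        factorkey   int          NOT NULL,
        customerkey int          NOT NULL,
        factordesc  varchar(100) NOT NULL
    )

    CREATE CLUSTERED INDEX ix_customerfactor
        ON customerfactor (factordesc, customerkey)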
Jim

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of John W. Colby
Sent: Thursday, August 12, 2004 10:11 PM
To: dba-sqlserver at databaseadvisors.com
Subject: [dba-SQLServer] Going over to the dark side

I have been contacted by what appears to be a startup marketing firm that wants to take a 64 million name database and pull it into SQL Server. They are getting the data in comma-delimited format, apparently compressed, on two DVDs, totaling something like 60 GB of raw text data. They have never seen the data (they just purchased it), but they think it is about 400-500 fields of some personal info but mostly demographic stuff: things like "owns a boat, owns a motor home, has credit cards, ethnic, sex, income", etc. Their intention is to pull subsets of the data and sell it to other companies. This is all very vague, since all I know so far is what I have gleaned from one of them in a handful of phone conversations. They haven't seen the data, don't know how to get it out of the DVDs, etc. I have no experience with trying to get something like that into SQL Server.

I have a couple of questions.

First, if the data is comma delimited, my assumption is that the first line will be field names, followed by data. Is SQL Server (or some other big db) capable of exporting directly into a cab file or zip file? If this is two DVDs, both of which are compressed comma-delimited files, how do I uncompress the data before importing it into SQL Server? They think it is 60 GB. I have room for a 60 GB database but not the uncompressed data as well as the database. I can of course just go buy a huge hard disk (200 GB), but I just spent a lot of time getting a mirror up, and especially for something of this nature I would want it on a mirrored drive. Plus they want to start looking at the data as soon as possible.

Second, is this a job for bcp? How do I get it in? Writing a VB function to parse it doesn't seem reasonable. (My guess at a bcp command is sketched at the end of this message.)

Third, how long is it going to take to get that much data into a SQL Server table? It apparently is a single flat file, which should translate to a single table of 400-500 fields. I assume that something like bcp would handle building the table given the field names in the first line, comma delimited? If not, how do I look at that first line so that I can go build the table "manually" in order to do the import?

Fourth, what tool would I use to query that? Access has query limits that appear to eliminate it as a tool for this, never mind the speed issues. On the other hand, if the actual number of fields exported is small (the data exported doesn't have to contain the demographics data, just the personal data), then perhaps an ADP would allow a form with controls to select the demographics, and then SQL Server directly dumps the data.

Fifth, it seems logical that I would want to index the demographics fields (several hundred fields) so that queries could efficiently pull subsets of the data. How "big" is the database going to get with indexes on 350 (or more) fields? IOW, the raw data is 60 GB; turned into a properly indexed table, what does that translate into in terms of SQL database file size?

Sixth, given 350 demographic fields, wouldn't I need to pull subsets to maintain lists of all the valid values for a given field? IOW, if there is a field for ethnicity, with 15 choices, it seems I would need to run a "distinct" query against that field and build a table of all the possible values, rather than run a "distinct" on the fly to populate the combo that allows choices (the second sketch at the end of this message shows what I mean). Now multiply that by 350 fields. That looks like a lot of preprocessing just to get ready to start hashing the data.

Seventh, how much processor power / memory is needed to handle a database of this nature? Is this something that I could reasonably expect to buy / build?

These guys are examining their options, but three options they mentioned are:

1) Just hire me (or someone) to "take an order and produce a file, on a CD or tape". IOW, I own the server, I hold the db, and I take an order and FedEx out a CD.
2) Buy a server, stick it in their office, and hire me to set it up and maintain it.
3) Hire a company out there on the internet somewhere to put the data on a server at a server farm, then query that and somehow get the data onto a CD or a tape. Much bigger question there, since I would no longer just figure it out and do it all myself.

Assuming a reasonable fee per data pull, my preferences would be ordered 1, 2, and 3. Option 1 sets up a constant income stream, but with the issue of having to be available to build the CDs. Option 3 is just too much setup, at least initially.
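For the bcp route (question two), I'm imagining something like the line below. Server, database, and table names are placeholders, the table would have to exist before the load (as far as I can tell bcp won't create it from the header line), and -F 2 skips that header row:

    bcp TheList.dbo.RawImport in D:\data\list.txt -c -t, -F 2 -b 50000 -e D:\data\list.err -S MYSERVER -T

I have no idea what it does with commas embedded inside quoted strings, which is the part that worries me.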
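And for the distinct lists (question six), something like this per demographic column; ethnicity is just an example field name against a hypothetical BigList table:

    -- build the picklist once so the combo never runs DISTINCT over 64 million rows
    SELECT DISTINCT ethnicity
    INTO lu_ethnicity
    FROM BigList
    WHERE ethnicity IS NOT NULL

Presumably I'd generate 350 of those from a loop over syscolumns rather than write them by hand.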
Is anyone out there doing something of this nature? Any words of wisdom?

John W. Colby
www.ColbyConsulting.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com