JWColby
jwcolby at colbyconsulting.com
Sat Nov 4 22:42:31 CST 2006
LOL, well I have nirvana. Well... sorta.

I am in the process of CASS validating my 65 million record database using Accuzip software, a rather kludgy process which uses FoxPro to validate ~3 million record chunks. I have two instances running on two different files. As you guys probably know by now, my system is now running Windows 2003 SBS on a dual core AMD X2 at 2.8 GHz with 2 GB of RAM and a RAID 6 array with two largish volumes.

Running just one instance of Accuzip, the system processes about 6 million records / hour. Starting the second instance leaves the first still processing about 6 million records per hour, but the second processes only about 2.4 million records per hour. I find the inequality odd. My experience in the past (using XP Pro) was that starting a second instance slowed down the first, and both instances would then run at roughly the same speed. Whatever. So basically I get an additional 40+% of processing from the second instance of Accuzip. Boy would it be nice to have a 16x processor array right now, but this is usable.

I have to process ~22 files; each file would take ~30 minutes if processed by itself. Once the processing of a file is finished, I delete the undeliverables and pack the FoxPro database, then start the next file importing. Once all 22 files are processed, I then have to re-import them into SQL Server and process them further there. At that point, after the import, I will have only deliverable addresses, since I am deleting all undeliverables. Since the exported / reimported records have a PK that matches my original (big) table, I will then have the ability to create a master list of completely valid deliverable ADDRESSES (not necessarily people at those addresses). So far the non-deliverables are running around 14%, which means I will end up with about 86% of 64 million addresses in the deliverable master address table.

This is only the address validation side of the picture.
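As a rough sanity check on the figures above, here is a back-of-the-envelope sketch in Python. The input numbers are the ones quoted in this post; the function name and the idea that the work splits cleanly across both instances are assumptions for illustration only. It ignores the delete / pack / import time between files, so treat the result as a lower bound rather than a prediction.

```python
# Figures quoted in the post (all approximate).
FIRST_RATE = 6.0e6        # records/hour, first Accuzip instance
SECOND_RATE = 2.4e6       # records/hour, second instance running alongside
FILES = 22                # number of files to process
MINUTES_PER_FILE = 30     # time per file when processed by itself
UNDELIVERABLE = 0.14      # observed non-deliverable fraction so far
TOTAL_ADDRESSES = 64e6    # address count used for the master table estimate


def estimate():
    """Hypothetical helper: derive throughput and yield estimates."""
    # Extra throughput contributed by the second instance.
    gain = SECOND_RATE / FIRST_RATE
    combined_rate = FIRST_RATE + SECOND_RATE

    # Time for one instance to work through every file by itself.
    one_instance_hours = FILES * MINUTES_PER_FILE / 60
    total_records = one_instance_hours * FIRST_RATE

    # Idealized time with both instances busy the whole way
    # (assumption: no idle time, no per-file overhead).
    two_instance_hours = total_records / combined_rate

    deliverable = TOTAL_ADDRESSES * (1 - UNDELIVERABLE)
    return {
        "gain": gain,
        "combined_rate": combined_rate,
        "one_instance_hours": one_instance_hours,
        "two_instance_hours": two_instance_hours,
        "deliverable": deliverable,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions the second instance cuts the validation pass from roughly 11 hours to a bit under 8, and the deliverable master table would hold about 55 million addresses.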
The fun begins when I normalize the big table into a bunch of smaller tables, as discussed in an earlier email. I have actually done that for one specific job (a boating database) in order to do matching on a small (50K) database of names of yacht owners that they got from a client. The biggest problem is that in order to match names (or even addresses) you have to compare apples to apples, and neither database was in CASSed form, and neither database provided "match codes" to do the comparison on. It took a lot of work, done quickly, to make this happen.

John W. Colby
Colby Consulting
www.ColbyConsulting.com