jwcolby
jwcolby at colbyconsulting.com
Thu Feb 28 14:38:55 CST 2008
I have a table with 50 million records, which is a subset of a bigger table of 65 million records. IOW, of 65 million records, only about 50 million were selected for inclusion in the smaller table. The reason for this is that of the 65 million records, ~15 million had addresses that were undeliverable. I need a way to "view" or understand the distribution of PKIDs across that table. Back in the beginning I performed a very manual labor intensive task of processing these records through accuzip (address validation) where I had to break the 65 million records down into CSV files of ~1.5 million records, export them out to Accuzip, then reimport the validated results back into SQL Server, filter out bad records etc. I have discovered a "hole" in my results where there are no PKIDs across a range of hundreds of thousands of records. I do not know how big the "hole" is. Under normal circumstances (a visual view for example) you will see a few missing PKIDs in a range of 20 or 30, but never consecutive sets of missing records. But I have at least one "hole" where hundreds of thousands are missing. I need a way to find the "edges" of the hole of the missing PKIDs. IE I need to view the distribution of numbers across the range of numbers. It should be a "flat line" if you will where every once in awhile a number is missing, but where the hole is will be a drop to zero for an extended range. Or something. I have no idea how to "see" this data. I have a process that is exporting the validated records back out to be revalidated (people move). It is this export program (which I wrote) that discovered the hole, and I can if necessary use this program to find the hole but I thought some of you folks might know of a way to do this in SQL. Something like specify a range of numbers, a from/to and get a count, rinse / repeat. I did this manually to verify that a 1 million record range came up about 400 K short of the normal - ~800K is a "normal" count within a given million records, the "hole" only had ~400K records. BTW, I found the hole because I normally export "100K" records at a time into a file , and was filling a recordset with a WHERE PKID >=X and <=X+100K (which of course does not provide 100K but around 80K) and was looking for an empty recordset to tell me that I was done. Well... when I hit the "hole" I got the empty recordset and my process was saying it was done, except it had only exported about twenty 1 million record files out of a total 50 million records. oooops. So, I have a hole. I need to know if there are any other holes (smaller than my 100k export piece) but I really need a generic way to sense such a hole in my PKs. Any ideas? John W. Colby Colby Consulting www.ColbyConsulting.com