[dba-SQLServer] Cnts in sets

Thu Feb 28 14:38:55 CST 2008

I have a table with 50 million records, which is a subset of a bigger table
of 65 million records.  IOW, of 65 million records, only about 50 million
were selected for inclusion in the smaller table.  The reason for this is
that of the 65 million records, ~15 million had addresses that were
undeliverable.  I need a way to "view" or understand the distribution of
PKIDs across that table.  Back in the beginning I performed a very manual
labor intensive task of processing these records through accuzip (address
validation) where I had to break the 65 million records down into CSV files
of ~1.5 million records, export them out to Accuzip, then reimport the
validated results back into SQL Server, filter out bad records etc.

I have discovered a "hole" in my results where there are no PKIDs across a
range of hundreds of thousands of records.  I do not know how big the "hole"
is.  Under normal circumstances (a visual view for example) you will see a
few missing PKIDs in a range of 20 or 30, but never consecutive sets of
missing records.  But I have at least one "hole" where hundreds of thousands
are missing.  I need a way to find the "edges" of the hole of the missing
PKIDs.  IE I need to view the distribution of numbers across the range of
numbers.  It should be a "flat line" if you will where every once in awhile
a number is missing, but where the hole is will be a drop to zero for an
extended range.  Or something.  

I have no idea how to "see" this data.  

I have a process that is exporting the validated records back out to be
revalidated (people move).  It is this export program (which I wrote) that
discovered the hole, and I can if necessary use this program to find the
hole but I thought some of you folks might know of a way to do this in SQL.
Something like specify a range of numbers, a from/to and get a count, rinse
/ repeat.  I did this manually to verify that a 1 million record range came
up about 400 K short of the normal - ~800K is a "normal" count within a
given million records, the "hole" only had ~400K records.

BTW, I found the hole because I normally export "100K" records at a time
into a file , and was filling a recordset with a WHERE PKID >=X and <=X+100K
(which of course does not provide 100K but around 80K) and was looking for
an empty recordset to tell me that I was done.  Well... when I hit the
"hole" I got the empty recordset and my process was saying it was done,
except it had only exported about twenty 1 million record files out of a
total 50 million records.  oooops.

So, I have a hole.  I need to know if there are any other holes (smaller
than my 100k export piece) but I really need a generic way to sense such a
hole in my PKs.

Any ideas?

John W. Colby
Colby Consulting
www.ColbyConsulting.com