[AccessD] Cnts in sets

A.D.Tejpal adtp at airtelmail.in
Sat Mar 1 07:25:53 CST 2008


John,

    You might like to have a look at my sample db named TrackMissingAndDuplicates and see whether it could help. It is available at Rogers Access Library (other developers library). Link - http://www.rogersaccesslibrary.com/OtherLibraries.asp#Tejpal,A.D.
 
    Gist of holes (missing sequential values) is displayed, giving the start and end values of each hole. In the situation explained by you, before testing for holes, the source data can first be filtered so as to eliminate Null values.

    Note - This sample is for Access 2000 / 2003 desktop and the execution speed has not been tested in the context of huge tables of the size mentioned by you. However, if the underlying principle is found applicable, suitable adaptation could be considered, say by testing smaller chunks at a time in such a manner that there are always some common records between adjacent chunks (i.e the last record of a given sub-table is also the first record of next sub-table).
 
Best wishes,
A.D.Tejpal
------------

  ----- Original Message ----- 
  From: jwcolby 
  To: 'Access Developers discussion and problem solving' ; 'Discussion concerning MS SQL Server' 
  Sent: Friday, February 29, 2008 02:08
  Subject: [AccessD] Cnts in sets


  I have a table with 50 million records, which is a subset of a bigger table
  of 65 million records.  IOW, of 65 million records, only about 50 million
  were selected for inclusion in the smaller table.  The reason for this is
  that of the 65 million records, ~15 million had addresses that were
  undeliverable.  I need a way to "view" or understand the distribution of
  PKIDs across that table.  Back in the beginning I performed a very manual
  labor intensive task of processing these records through accuzip (address
  validation) where I had to break the 65 million records down into CSV files
  of ~1.5 million records, export them out to Accuzip, then reimport the
  validated results back into SQL Server, filter out bad records etc.

  I have discovered a "hole" in my results where there are no PKIDs across a
  range of hundreds of thousands of records.  I do not know how big the "hole"
  is.  Under normal circumstances (a visual view for example) you will see a
  few missing PKIDs in a range of 20 or 30, but never consecutive sets of
  missing records.  But I have at least one "hole" where hundreds of thousands
  are missing.  I need a way to find the "edges" of the hole of the missing
  PKIDs.  IE I need to view the distribution of numbers across the range of
  numbers.  It should be a "flat line" if you will where every once in awhile
  a number is missing, but where the hole is will be a drop to zero for an
  extended range.  Or something.  

  I have no idea how to "see" this data.  

  I have a process that is exporting the validated records back out to be
  revalidated (people move).  It is this export program (which I wrote) that
  discovered the hole, and I can if necessary use this program to find the
  hole but I thought some of you folks might know of a way to do this in SQL.
  Something like specify a range of numbers, a from/to and get a count, rinse
  / repeat.  I did this manually to verify that a 1 million record range came
  up about 400 K short of the normal - ~800K is a "normal" count within a
  given million records, the "hole" only had ~400K records.

  BTW, I found the hole because I normally export "100K" records at a time
  into a file , and was filling a recordset with a WHERE PKID >=X and <=X+100K
  (which of course does not provide 100K but around 80K) and was looking for
  an empty recordset to tell me that I was done.  Well... when I hit the
  "hole" I got the empty recordset and my process was saying it was done,
  except it had only exported about twenty 1 million record files out of a
  total 50 million records.  oooops.

  So, I have a hole.  I need to know if there are any other holes (smaller
  than my 100k export piece) but I really need a generic way to sense such a
  hole in my PKs.

  Any ideas?

  John W. Colby
  Colby Consulting
  www.ColbyConsulting.com 


More information about the AccessD mailing list