[AccessD] Cnts in sets

Sat Mar 1 08:05:00 CST 2008

Before I launch into that you should understand that missing values are OK.
Of the original 64 million records, roughly 13 million are missing, thus
there are missing PKs throughout the data, but they are just "one here and
there", in fact on average one every 8 records.  It is only "large holes" (I
understand that is a rather nebulous term) that I care about.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.Tejpal
Sent: Saturday, March 01, 2008 8:26 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Cnts in sets

John,

    You might like to have a look at my sample db named
TrackMissingAndDuplicates and see whether it could help. It is available at
Rogers Access Library (other developers library). Link -
http://www.rogersaccesslibrary.com/OtherLibraries.asp#Tejpal,A.D.

    Gist of holes (missing sequential values) is displayed, giving the start
and end values of each hole. In the situation explained by you, before
testing for holes, the source data can first be filtered so as to eliminate
Null values.

    Note - This sample is for Access 2000 / 2003 desktop and the execution
speed has not been tested in the context of huge tables of the size
mentioned by you. However, if the underlying principle is found applicable,
suitable adaptation could be considered, say by testing smaller chunks at a
time in such a manner that there are always some common records between
adjacent chunks (i.e the last record of a given sub-table is also the first
record of next sub-table).

Best wishes,
A.D.Tejpal
------------

  ----- Original Message -----
  From: jwcolby
  To: 'Access Developers discussion and problem solving' ; 'Discussion
concerning MS SQL Server' 
  Sent: Friday, February 29, 2008 02:08
  Subject: [AccessD] Cnts in sets

  I have a table with 50 million records, which is a subset of a bigger
table
  of 65 million records.  IOW, of 65 million records, only about 50 million
  were selected for inclusion in the smaller table.  The reason for this is
  that of the 65 million records, ~15 million had addresses that were
  undeliverable.  I need a way to "view" or understand the distribution of
  PKIDs across that table.  Back in the beginning I performed a very manual
  labor intensive task of processing these records through accuzip (address
  validation) where I had to break the 65 million records down into CSV
files
  of ~1.5 million records, export them out to Accuzip, then reimport the
  validated results back into SQL Server, filter out bad records etc.

  I have discovered a "hole" in my results where there are no PKIDs across a
  range of hundreds of thousands of records.  I do not know how big the
"hole"
  is.  Under normal circumstances (a visual view for example) you will see a
  few missing PKIDs in a range of 20 or 30, but never consecutive sets of
  missing records.  But I have at least one "hole" where hundreds of
thousands
  are missing.  I need a way to find the "edges" of the hole of the missing
  PKIDs.  IE I need to view the distribution of numbers across the range of
  numbers.  It should be a "flat line" if you will where every once in
awhile
  a number is missing, but where the hole is will be a drop to zero for an
  extended range.  Or something.  

  I have no idea how to "see" this data.  

  I have a process that is exporting the validated records back out to be
  revalidated (people move).  It is this export program (which I wrote) that
  discovered the hole, and I can if necessary use this program to find the
  hole but I thought some of you folks might know of a way to do this in
SQL.
  Something like specify a range of numbers, a from/to and get a count,
rinse
  / repeat.  I did this manually to verify that a 1 million record range
came
  up about 400 K short of the normal - ~800K is a "normal" count within a
  given million records, the "hole" only had ~400K records.

  BTW, I found the hole because I normally export "100K" records at a time
  into a file , and was filling a recordset with a WHERE PKID >=X and
<=X+100K
  (which of course does not provide 100K but around 80K) and was looking for
  an empty recordset to tell me that I was done.  Well... when I hit the
  "hole" I got the empty recordset and my process was saying it was done,
  except it had only exported about twenty 1 million record files out of a
  total 50 million records.  oooops.

  So, I have a hole.  I need to know if there are any other holes (smaller
  than my 100k export piece) but I really need a generic way to sense such a
  hole in my PKs.

  Any ideas?

  John W. Colby
  Colby Consulting
  www.ColbyConsulting.com
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com