jwcolby
jwcolby at colbyconsulting.com
Sat Mar 1 08:05:00 CST 2008
Before I launch into that you should understand that missing values are OK. Of the original 64 million records, roughly 13 million are missing, thus there are missing PKs throughout the data, but they are just "one here and there", in fact on average one every 8 records. It is only "large holes" (I understand that is a rather nebulous term) that I care about. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.Tejpal Sent: Saturday, March 01, 2008 8:26 AM To: Access Developers discussion and problem solving Subject: Re: [AccessD] Cnts in sets John, You might like to have a look at my sample db named TrackMissingAndDuplicates and see whether it could help. It is available at Rogers Access Library (other developers library). Link - http://www.rogersaccesslibrary.com/OtherLibraries.asp#Tejpal,A.D. Gist of holes (missing sequential values) is displayed, giving the start and end values of each hole. In the situation explained by you, before testing for holes, the source data can first be filtered so as to eliminate Null values. Note - This sample is for Access 2000 / 2003 desktop and the execution speed has not been tested in the context of huge tables of the size mentioned by you. However, if the underlying principle is found applicable, suitable adaptation could be considered, say by testing smaller chunks at a time in such a manner that there are always some common records between adjacent chunks (i.e the last record of a given sub-table is also the first record of next sub-table). Best wishes, A.D.Tejpal ------------ ----- Original Message ----- From: jwcolby To: 'Access Developers discussion and problem solving' ; 'Discussion concerning MS SQL Server' Sent: Friday, February 29, 2008 02:08 Subject: [AccessD] Cnts in sets I have a table with 50 million records, which is a subset of a bigger table of 65 million records. IOW, of 65 million records, only about 50 million were selected for inclusion in the smaller table. The reason for this is that of the 65 million records, ~15 million had addresses that were undeliverable. I need a way to "view" or understand the distribution of PKIDs across that table. Back in the beginning I performed a very manual labor intensive task of processing these records through accuzip (address validation) where I had to break the 65 million records down into CSV files of ~1.5 million records, export them out to Accuzip, then reimport the validated results back into SQL Server, filter out bad records etc. I have discovered a "hole" in my results where there are no PKIDs across a range of hundreds of thousands of records. I do not know how big the "hole" is. Under normal circumstances (a visual view for example) you will see a few missing PKIDs in a range of 20 or 30, but never consecutive sets of missing records. But I have at least one "hole" where hundreds of thousands are missing. I need a way to find the "edges" of the hole of the missing PKIDs. IE I need to view the distribution of numbers across the range of numbers. It should be a "flat line" if you will where every once in awhile a number is missing, but where the hole is will be a drop to zero for an extended range. Or something. I have no idea how to "see" this data. I have a process that is exporting the validated records back out to be revalidated (people move). It is this export program (which I wrote) that discovered the hole, and I can if necessary use this program to find the hole but I thought some of you folks might know of a way to do this in SQL. Something like specify a range of numbers, a from/to and get a count, rinse / repeat. I did this manually to verify that a 1 million record range came up about 400 K short of the normal - ~800K is a "normal" count within a given million records, the "hole" only had ~400K records. BTW, I found the hole because I normally export "100K" records at a time into a file , and was filling a recordset with a WHERE PKID >=X and <=X+100K (which of course does not provide 100K but around 80K) and was looking for an empty recordset to tell me that I was done. Well... when I hit the "hole" I got the empty recordset and my process was saying it was done, except it had only exported about twenty 1 million record files out of a total 50 million records. oooops. So, I have a hole. I need to know if there are any other holes (smaller than my 100k export piece) but I really need a generic way to sense such a hole in my PKs. Any ideas? John W. Colby Colby Consulting www.ColbyConsulting.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com