[AccessD] 2.45 billion rows and counting was PK Cast fails

John W. Colby jwcolby at gmail.com
Sat Aug 8 12:39:03 CDT 2015


Apparently not.  Ordered asc it took about 10 seconds to return the 
first row.  Ordered desc it timed out (as a view).  In a query it is off 
hunting.

The issue of course is that it is adding a few thousand records per 
second and there are (currently) 3 billion records in the table.  So I 
assume that it might eventually figure out top(1), but that is changing 
every time the log file pushes to the table.

I am letting it run.

John W. Colby

On 8/8/2015 11:09 AM, Bill Benson wrote:
> Can you run a select query ordered by pk desc, and take f
> Top 1 of that?
> On Aug 8, 2015 11:01 AM, "John W. Colby" <jwcolby at gmail.com> wrote:
>
>> The Simmons table I am building now has 2.5 billion rows.  The stored
>> procedure is running a "loop" iterating through a table of 228 Simmons
>> demographics definitions, pulling PKs from DB101 (225 million possible PKs)
>> one at a time, and inserting them into the junction table mentioned in the
>> previous email.  Sometime last night the log file ran out of room (filled
>> up the log disk) and stopped the process, at Simmons Def 70.  Luckily I am
>> writing the individual sql statements out as each one is executed so I
>> could see where it stopped.  I cleaned a bunch of very large log files off
>> and restarted this morning.
>>
>> While I can see the total storage (records) in the junction table, what I
>> can't do is get a Max() of the Simmons PK in the junction table to see
>> where I am in the process.  I was hoping that info would be in metadata
>> somewhere but apparently not, it seems that it has to scan through the
>> table looking for the max() value.  While I haven't let it run in a query
>> window, trying to do that in a view timed out for obvious reasons.
>>
>> And so I wait for completion.  I would estimate the junction table will
>> contain somewhere north of 10 billion records when done.
>>
>> I have never even come close to this quantity of records in a single table
>> so I am still unsure whether this will even make things faster.  Assuming
>> an index on two fields:
>>
>> PKSimmons - PKs for each of 220 Simmons Definitions
>> PKDB101 - DB101 PKs that match the Simmons Definition
>>
>> Will doing a join on PKDB101 for a single value of PKSimmons return a
>> recordset any time in the useful future?  I have to join the PKDB101 to
>> another table (and possibly two), then do WHERE stuff on that other
>> table(s), then count the remaining records to determine which DB101 records
>> match the Simmons definition.
>>
>> This essentially pre-selects just exactly the DB101 records that match the
>> simmons definition, storing the selected DB101 PKs in the junction table.
>>
>> The way I do it right now is just dynamically build a SQL statement,
>> joining everything required, doing a pretty complicated where and counting
>> the resulting records.  Rinse and repeat 220 times, once for each Simmons
>> definition.
>>
>> The other question is as follows.  I use a BigInt as the PK and make it a
>> defined PK, i.e. the table ends up clustered on that PK. Should I have
>> included the PDSimmons and PKDB101 in that PK so that those two fields are
>> already indexed using the PK index?  As it is now, I will be creating a
>> separate (non clustered) index on just those two fields when done.  Given
>> that the records are being inserted in the table in PKSimmons order, i.e.
>> an entire set of PK101 values inserted each iteration of my code, one set
>> for each PKSimmons value.  So had I used at least the PKSimmons in the
>> (clustered) PK would have it made any difference in performance?  Or when
>> doing a join, will the server scan the clustered index anyway and just find
>> all the groups by virtue of the fact that the PK is autonumber and all
>> records for each PKSimmons value are contiguous?
>>
>> The things that keep me up at night.
>>
>> --
>> John W. Colby
>>
>> --
>> AccessD mailing list
>> AccessD at databaseadvisors.com
>> http://databaseadvisors.com/mailman/listinfo/accessd
>> Website: http://www.databaseadvisors.com
>>



More information about the AccessD mailing list