[AccessD] 2.45 billion rows and counting was PK Cast fails

Bill Benson bensonforums at gmail.com
Sat Aug 8 10:09:13 CDT 2015


Can you run a select query ordered by pk desc, and take f
Top 1 of that?
On Aug 8, 2015 11:01 AM, "John W. Colby" <jwcolby at gmail.com> wrote:

> The Simmons table I am building now has 2.5 billion rows.  The stored
> procedure is running a "loop" iterating through a table of 228 Simmons
> demographics definitions, pulling PKs from DB101 (225 million possible PKs)
> one at a time, and inserting them into the junction table mentioned in the
> previous email.  Sometime last night the log file ran out of room (filled
> up the log disk) and stopped the process, at Simmons Def 70.  Luckily I am
> writing the individual sql statements out as each one is executed so I
> could see where it stopped.  I cleaned a bunch of very large log files off
> and restarted this morning.
>
> While I can see the total storage (records) in the junction table, what I
> can't do is get a Max() of the Simmons PK in the junction table to see
> where I am in the process.  I was hoping that info would be in metadata
> somewhere but apparently not, it seems that it has to scan through the
> table looking for the max() value.  While I haven't let it run in a query
> window, trying to do that in a view timed out for obvious reasons.
>
> And so I wait for completion.  I would estimate the junction table will
> contain somewhere north of 10 billion records when done.
>
> I have never even come close to this quantity of records in a single table
> so I am still unsure whether this will even make things faster.  Assuming
> an index on two fields:
>
> PKSimmons - PKs for each of 220 Simmons Definitions
> PKDB101 - DB101 PKs that match the Simmons Definition
>
> Will doing a join on PKDB101 for a single value of PKSimmons return a
> recordset any time in the useful future?  I have to join the PKDB101 to
> another table (and possibly two), then do WHERE stuff on that other
> table(s), then count the remaining records to determine which DB101 records
> match the Simmons definition.
>
> This essentially pre-selects just exactly the DB101 records that match the
> simmons definition, storing the selected DB101 PKs in the junction table.
>
> The way I do it right now is just dynamically build a SQL statement,
> joining everything required, doing a pretty complicated where and counting
> the resulting records.  Rinse and repeat 220 times, once for each Simmons
> definition.
>
> The other question is as follows.  I use a BigInt as the PK and make it a
> defined PK, i.e. the table ends up clustered on that PK. Should I have
> included the PDSimmons and PKDB101 in that PK so that those two fields are
> already indexed using the PK index?  As it is now, I will be creating a
> separate (non clustered) index on just those two fields when done.  Given
> that the records are being inserted in the table in PKSimmons order, i.e.
> an entire set of PK101 values inserted each iteration of my code, one set
> for each PKSimmons value.  So had I used at least the PKSimmons in the
> (clustered) PK would have it made any difference in performance?  Or when
> doing a join, will the server scan the clustered index anyway and just find
> all the groups by virtue of the fact that the PK is autonumber and all
> records for each PKSimmons value are contiguous?
>
> The things that keep me up at night.
>
> --
> John W. Colby
>
> --
> AccessD mailing list
> AccessD at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/accessd
> Website: http://www.databaseadvisors.com
>


More information about the AccessD mailing list