[dba-SQLServer] PK cast fails

John W. Colby jwcolby at gmail.com
Fri Aug 7 22:32:54 CDT 2015


For the first time I ended up with a runtime error trying to insert data 
into a table.  I had an autoincrement PK which was an int type and I 
stored so many records that the int couldn't hold the value.

And so we go to a bigint.

It will be interesting to see how this works.

There is this thing called a "simmons protocol", which is some kind of 
demographic analysis used in the industry.  My client needs counts from 
our 225 million record database of how many records are XYZ 
demographic.  I did this "table driven" with a set of stored procedures 
which calculate about 220 counts from the table. Properly indexed the 
entire thing takes many hours, and a full day if tied to emails.

the table that I built to drive the counts looks like the following, 
which are just a few of the 220+ counts I have to do.

PK    SimmonsDefinition    Code    DB101Field    DB101FieldName 
HowUsed1    SQL1    HowUsed2    SQL2
1    32 MARRIED     M    32    Marital_Status    NULL    NULL NULL    NULL
2    32 SINGLE     S    32    Marital_Status    NULL    NULL NULL    NULL
3    33 MOVED INTO PRESENT RESIDENCE IN LAST 12 MOS    NULL    33 
Length_Of_Residence    C    = '0'    NULL    NULL
4    39 AGE 18-19    NULL    39    Date_Of_Birth_Year    F (DATEPART(yy, 
GETDATE()) - Date_Of_Birth_Year >= 18 AND DATEPART(yy, GETDATE()) - 
Date_Of_Birth_Year <= 19)    NULL NULL
5    39 AGE 20-21    NULL    39    Date_Of_Birth_Year    F (DATEPART(yy, 
GETDATE()) - Date_Of_Birth_Year >= 20 AND DATEPART(yy, GETDATE()) - 
Date_Of_Birth_Year <= 21)    NULL NULL
6    39 AGE 22 - 24    NULL    39    Date_Of_Birth_Year    F 
(DATEPART(yy, GETDATE()) - Date_Of_Birth_Year >= 22 AND DATEPART(yy, 
GETDATE()) - Date_Of_Birth_Year <= 24)    NULL NULL
18    43 SENIOR ADULTS IN HH (55+)    Y    43 
Senior_Adult_In_Household    NULL    NULL    NULL    NULL
19    44 EMPTY NESTER    Y    44    Empty_Nester    NULL    NULL NULL    
NULL
20    45 SINGLE PARENT    1    45    Single_Parent    NULL NULL    
NULL    NULL

I woke up this morning and realized that I needed to select each PKID 
from DB101 which matched each Simmons selection, and store those with 
the Simmons PKID.  Do this one time and then do my counts off of the 
stored results instead of recounting each time.  The issue I am having 
is that a) the SQL for doing the selection (or count) is dynamically 
created, but even worse, the WHERE field is pulled from usually one of 
hundreds of possible fields in DB101. And it is literally impossible to 
do indexes on that many fields, and especially across 225 million rows.

So I am building a table that looks like:

PK (bigint)
PKSimmons (int)
PKDB101 (int)

Select ALL of the DB101 PKs that match each simmons criteria one time 
and store in a table as above.

Select once and store, index the two fields, and then use a join on this 
table to do further processing, including counts in the future.

It appears as I will have many billions of records in this table.  I do 
know that after about 75 selections, I hit the limit of the Int data 
type to hold my PK (~ 2 billion records), which I had therefore had to 
change to a BigInt.

It will be interesting to see how long it takes to build an index on the 
PKSimmons and PKDB101 fields for future selects.  And of course, how 
fast PKDB101 values can be pulled out when selected by the PKSimmons 
field given an index on those two fields - across an index over 8 or 10 
billion records.

I will report back some results once the table insert(s) finish.

-- 
John W. Colby



More information about the dba-SQLServer mailing list