[dba-SQLServer] PK cast fails
John W. Colby
jwcolby at gmail.com
Fri Aug 7 22:32:54 CDT 2015
For the first time I ended up with a runtime error trying to insert data
into a table. I had an autoincrement PK which was an int type and I
stored so many records that the int couldn't hold the value.
And so we go to a bigint.
It will be interesting to see how this works.
There is this thing called a "simmons protocol", which is some kind of
demographic analysis used in the industry. My client needs counts from
our 225 million record database of how many records are XYZ
demographic. I did this "table driven" with a set of stored procedures
which calculate about 220 counts from the table. Properly indexed the
entire thing takes many hours, and a full day if tied to emails.
the table that I built to drive the counts looks like the following,
which are just a few of the 220+ counts I have to do.
PK SimmonsDefinition Code DB101Field DB101FieldName
HowUsed1 SQL1 HowUsed2 SQL2
1 32 MARRIED M 32 Marital_Status NULL NULL NULL NULL
2 32 SINGLE S 32 Marital_Status NULL NULL NULL NULL
3 33 MOVED INTO PRESENT RESIDENCE IN LAST 12 MOS NULL 33
Length_Of_Residence C = '0' NULL NULL
4 39 AGE 18-19 NULL 39 Date_Of_Birth_Year F (DATEPART(yy,
GETDATE()) - Date_Of_Birth_Year >= 18 AND DATEPART(yy, GETDATE()) -
Date_Of_Birth_Year <= 19) NULL NULL
5 39 AGE 20-21 NULL 39 Date_Of_Birth_Year F (DATEPART(yy,
GETDATE()) - Date_Of_Birth_Year >= 20 AND DATEPART(yy, GETDATE()) -
Date_Of_Birth_Year <= 21) NULL NULL
6 39 AGE 22 - 24 NULL 39 Date_Of_Birth_Year F
(DATEPART(yy, GETDATE()) - Date_Of_Birth_Year >= 22 AND DATEPART(yy,
GETDATE()) - Date_Of_Birth_Year <= 24) NULL NULL
18 43 SENIOR ADULTS IN HH (55+) Y 43
Senior_Adult_In_Household NULL NULL NULL NULL
19 44 EMPTY NESTER Y 44 Empty_Nester NULL NULL NULL
NULL
20 45 SINGLE PARENT 1 45 Single_Parent NULL NULL
NULL NULL
I woke up this morning and realized that I needed to select each PKID
from DB101 which matched each Simmons selection, and store those with
the Simmons PKID. Do this one time and then do my counts off of the
stored results instead of recounting each time. The issue I am having
is that a) the SQL for doing the selection (or count) is dynamically
created, but even worse, the WHERE field is pulled from usually one of
hundreds of possible fields in DB101. And it is literally impossible to
do indexes on that many fields, and especially across 225 million rows.
So I am building a table that looks like:
PK (bigint)
PKSimmons (int)
PKDB101 (int)
Select ALL of the DB101 PKs that match each simmons criteria one time
and store in a table as above.
Select once and store, index the two fields, and then use a join on this
table to do further processing, including counts in the future.
It appears as I will have many billions of records in this table. I do
know that after about 75 selections, I hit the limit of the Int data
type to hold my PK (~ 2 billion records), which I had therefore had to
change to a BigInt.
It will be interesting to see how long it takes to build an index on the
PKSimmons and PKDB101 fields for future selects. And of course, how
fast PKDB101 values can be pulled out when selected by the PKSimmons
field given an index on those two fields - across an index over 8 or 10
billion records.
I will report back some results once the table insert(s) finish.
--
John W. Colby
More information about the dba-SQLServer
mailing list