Arthur Fuller
fuller.artful at gmail.com
Fri Jul 25 09:06:42 CDT 2014
Suppose I'm interested in creating and maintaining a high-volume read-only database into which several million rows will be inserted five days per week, spread over 24 hours a day. Basically all the rows will go into a single table; every now and then a row will go into a related table, but that is unlikely to happen even once a day, more like once a month. The data of maximal interest is today's, but sometimes I will need to look at yesterday's, and occasionally further back (this week, this month, this year, etc.); these queries become rarer the further back in time they reach.

I'm thinking about how best to design this. One thought is to flush the table once per day at midnight: move today's rows into a permanent history table, truncate the live table, and start fresh each morning. The history table would be designed on an OLAP model. On those relatively rare occasions when I need to compare today with the week, month, or year, I can join the today table to the OLAP table(s) to get my temporal dimensions as quickly as possible.

Can you spot any holes in this thought? Or think of a better way to handle this enormous input?

I have no immediate customer for this; truth is, it's motivated by pure interest. I've just finished reading Michael Lewis's book *Flash Boys*, which is a terrific read and highly recommended. Basically, it's about how the stock market is rigged in favour of HFTs (high-frequency traders) and against the average stockholder.

Back to the problem: each row inserted would be an order to buy or sell a number of shares in a given stock. Comparisons would almost always be on similar rows (same stock), and would most often attempt to match buys with corresponding sells. That's oversimplifying, but the gist is there.

-- Arthur
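
P.S. For concreteness, here is a rough sketch of the midnight flush I have in mind, in MySQL-flavoured SQL. The table and column names (orders_today, orders_history, orders_staging, orders_flush, symbol, side, shares, placed_at) are invented for illustration, and the atomic RENAME swap is just one way to avoid losing rows that arrive while the copy is running:

    -- orders_today is the hot table taking the inserts;
    -- orders_history is the permanent, OLAP-style archive.

    -- 1. Swap an empty table into place atomically, so the incoming
    --    stream never pauses and no rows fall into a gap.
    CREATE TABLE orders_staging LIKE orders_today;
    RENAME TABLE orders_today   TO orders_flush,
                 orders_staging TO orders_today;

    -- 2. Copy the flushed day's rows into the history table,
    --    then drop the scratch copy.
    INSERT INTO orders_history SELECT * FROM orders_flush;
    DROP TABLE orders_flush;

    -- A cross-period comparison would then join the small hot table
    -- to the history table on the stock symbol, e.g.:
    SELECT t.symbol, t.side, t.shares,
           h.side AS hist_side, h.shares AS hist_shares, h.placed_at
    FROM   orders_today   AS t
    JOIN   orders_history AS h ON h.symbol = t.symbol
    WHERE  h.placed_at >= CURRENT_DATE - INTERVAL 7 DAY;

In practice the history table would carry the OLAP dimensions rather than a straight copy of the hot table's columns, but the shape of the nightly rollover would be the same.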