Arthur Fuller
fuller.artful at gmail.com
Fri Jul 25 09:06:42 CDT 2014
Suppose I'm interested in creating and maintaining a high-volume read-only database into which several million rows will be inserted five days per week, spread over 24 hours a day. Basically all the rows will go into a single table; every now and then a row will go into a related table, but that is unlikely to happen even once a day, more like once a month. The data of maximal interest is today's, but sometimes I will need to look at yesterday's, and occasionally further back (this week, this month, this year, etc.); these queries become rarer the further back in time they reach.

I'm thinking about how best to design this. One thought is to flush the table once per day at midnight: move today's rows into a permanent history table, truncate the live table, and start fresh each morning. The history table would be designed on an OLAP model. On those relatively rare occasions when I need to compare today with the week, month, or year, I can join the today table to the OLAP table(s) to get my temporal dimensions as quickly as possible.

Can you spot any holes in this thought? Or think of a better way to handle this enormous input?

I have no immediate customer for this; truth is, it's motivated by pure interest. I've just finished reading Michael Lewis's book *Flash Boys*, which is a terrific read and highly recommended. Basically, it's about how the stock market is rigged in favour of HFTs (high-frequency traders) and against the average stockholder.

Back to the problem: each row inserted would be an order to buy or sell a number of shares in a given stock. Comparisons would almost always be on similar rows (same stock), and would most often attempt to match buys with corresponding sells. That's oversimplifying, but the gist is there.

-- Arthur
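
P.S. For concreteness, here is a rough sketch of the midnight flush I have in mind, in MySQL-flavoured SQL. The table and column names (orders_today, orders_history, orders_staging, orders_flush, symbol, side, shares, placed_at) are invented for illustration, and the atomic RENAME swap is just one way to avoid losing rows that arrive while the copy is running:

    -- orders_today is the hot table taking the inserts;
    -- orders_history is the permanent, OLAP-style archive.

    -- 1. Swap an empty table into place atomically, so the incoming
    --    stream never pauses and no rows fall into a gap.
    CREATE TABLE orders_staging LIKE orders_today;
    RENAME TABLE orders_today   TO orders_flush,
                 orders_staging TO orders_today;

    -- 2. Copy the flushed day's rows into the history table,
    --    then drop the scratch copy.
    INSERT INTO orders_history SELECT * FROM orders_flush;
    DROP TABLE orders_flush;

    -- A cross-period comparison would then join the small hot table
    -- to the history table on the stock symbol, e.g.:
    SELECT t.symbol, t.side, t.shares,
           h.side AS hist_side, h.shares AS hist_shares, h.placed_at
    FROM   orders_today   AS t
    JOIN   orders_history AS h ON h.symbol = t.symbol
    WHERE  h.placed_at >= CURRENT_DATE - INTERVAL 7 DAY;

In practice the history table would carry the OLAP dimensions rather than a straight copy of the hot table's columns, but the shape of the nightly rollover would be the same.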