Gustav Brock
Gustav at cactus.dk
Tue Mar 8 05:42:02 CST 2011
Hi Arthur This is what you brought up 4½ years ago: --- >>> artful at rogers.com 2006-11-22 13:14:50 >>> There is a whole other subject on this, about which I have written, but I googled it and it didn't come up, so perhaps I wrote it and forgot to sell it to somebody. The gist is this: it's called PITA, which doesn't mean pain in the arse, but rather Point In Time Architecture. Without PITA, the central problem with relational databases is that they don't provide an instant "roll back to August 1" capability. With PITA, they do. It's not all that complicated, but it does require a detailed walk-through so you can understand all the implications, the most critical of which is, "Nothing is ever updated. An updated row is actually replaced, and the updated row's EndDate column is updated to reflect the datetime on which the row was "changed". Thus it becomes possible to issue a query that reflects the state of the database on August 1, 2005. Obviously this increases the size of the db significantly, but in certain environments (such as medical), this is critical -- who was JWC's physician on that date, and what tests were performed, and by which medicos, and so on. So. Today's job is to dig out that PITA article and pitch it to somebody. --- Somehow you must have succeeded because your writing can found here, dated 2007-02-22: http://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/ As I wrote back also on 2006-11-22, what you describe is a temporal database or - more precise - a bitemporal: <quote> In the literature, two time lines of interest have been mentioned, transaction time and valid time. The valid time line represents when a fact is valid in modelled world (i.e. when it was believed) and the transaction time line represents when a transaction was performed. A bitemporal database is a combination of valid time and transaction time databases where these two time lines are considered to be orthogonal. (Snodgrass & Ahn 1986) </quote> This is a fascinating area, and your article describes nicely how - using SQL Server - to deal with some of the issues for a practical implementation of this theory (from 1986). However, I miss the connection to your eggs. To me, the collection of eggs describes rather a batch: A given population of 200 hens produce each day a collected batch of eggs which perhaps are stamped with a producer id and batch id but at least packed with other eggs from the same batch only. The batch id is written on the package. This way a bad egg at the consumer can be tracked back to the package, the producer, the date, the packing machine, the population of hens, and - perhaps - the possible bags of corn (or whatever) used to feed these hens. You will record all associated data in a write-once/read-many database, but as you by definition never will change or correct these data, I see no scenario for a temporal or PITA database, it's more like a log file. The only date field needed here is the packing date. And how about the autonumber and the index maintenance Jim brought up? /gustav >>> fuller.artful at gmail.com 08-03-2011 01:00 >>> Let us distinguish two problems: The first is the "egg" problem. I have 200 chickens each of which lays several eggs per day, each of which is plonked into an arbitrary-numbered case. At some point, it may be interesting to know which chicken laid which eggs and into which cartons they were placed. Most often, this level of detail is not interesting, but occasionally it is vital and potentially life-saving. The second is the "serial number" problem. Every crankshaft or windshield or manifold coming off an assembly line has a unique serial number, unlike the aforementioned eggs. Each one of these parts can be traced to a shift and a line and perhaps ultimately to a worker. Big difference in these problems, and big difference in which attributes we choose to model. IME, I have dealt more with the egg problem than the serial number problem, but in recent years this has changed. To further complicate things, this latter problem has been compounded by the PITA issue (Point in Time Archictecture; for details on this problem and its solution see my piece at Red Gate's site). Arthur On Mon, Mar 7, 2011 at 11:58 AM, Gustav Brock <Gustav at cactus.dk> wrote: > Hi Jim > > I'm not so sure about that. As far as I know, maintenance cost of a an > autonumber index is close to zero for adding records, zero for updates of > other fields, and tables that large typically are for appending/reading > only. > > But, of course, scenarios exist where you have to optimise where possible. > > /gustav