jwcolby
jwcolby at colbyconsulting.com
Tue Mar 8 08:49:43 CST 2011
While this is a nice idea, and perhaps critical somewhere, I cannot see it being usable in most environments. I have a claim table with 50 or 100 fields, and I am going to store all of that again because a single date in one field is changed? In an ideal world without real-world constraints, go for it. However, we are replacing natural keys with surrogates because of issues with storing just a handful of fields "over again" and the negative impacts on speed and storage requirements that natural keys bring.

If I were going to do a temporal kind of db, I would go with a "normalized" solution where I stored just the PKID of the record changing, the name of the field (or ID of the field), the "old" field data, and the change time. Then you end up with a "change log" concept. All of which demands intense analysis to determine where the change log becomes less efficient than storing the whole record.

One way or the other (in my experience), very few clients are willing to pay for this kind of thing.

John W. Colby
www.ColbyConsulting.com

On 3/8/2011 6:42 AM, Gustav Brock wrote:
> Hi Arthur
>
> This is what you brought up 4½ years ago:
>
> ---
>>>> artful at rogers.com 2006-11-22 13:14:50 >>>
>
> There is a whole other subject on this, about which I have written, but I googled it and it didn't come up, so perhaps I wrote it and forgot to sell it to somebody. The gist is this: it's called PITA, which doesn't mean pain in the arse, but rather Point In Time Architecture. Without PITA, the central problem with relational databases is that they don't provide an instant "roll back to August 1" capability. With PITA, they do. It's not all that complicated, but it does require a detailed walk-through so you can understand all the implications, the most critical of which is: "Nothing is ever updated. An updated row is actually replaced, and the updated row's EndDate column is updated to reflect the datetime on which the row was 'changed'." Thus it becomes possible to issue a query that reflects the state of the database on August 1, 2005. Obviously this increases the size of the db significantly, but in certain environments (such as medical), this is critical -- who was JWC's physician on that date, and what tests were performed, and by which medicos, and so on.
>
> So. Today's job is to dig out that PITA article and pitch it to somebody.
> ---
>
> Somehow you must have succeeded, because your writing can be found here, dated 2007-02-22:
>
> http://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
>
> As I also wrote back on 2006-11-22, what you describe is a temporal database or - more precisely - a bitemporal one:
>
> <quote>
> In the literature, two time lines of interest have been mentioned, transaction time and valid time. The valid time line represents when a fact is valid in the modelled world (i.e. when it was believed) and the transaction time line represents when a transaction was performed. A bitemporal database is a combination of valid time and transaction time databases where these two time lines are considered to be orthogonal.
> (Snodgrass & Ahn 1986)
> </quote>
>
> This is a fascinating area, and your article nicely describes how to deal with some of the issues of a practical implementation of this theory (from 1986) using SQL Server.
>
> However, I miss the connection to your eggs.
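As a rough illustration of the two approaches discussed above - Arthur's versioned-row (PITA) style and the change-log style - here is a minimal T-SQL sketch. The table and column names are invented for the example and are not taken from Arthur's article; only the idea of an EndDate column comes from the quoted text:

-- Versioned-row (PITA-like) style: rows are never updated in place.
-- Changing a claim closes the current version and inserts a new one.
CREATE TABLE ClaimVersion (
    ClaimVersionID int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key of the row version
    ClaimID        int NOT NULL,                   -- stable key of the claim itself
    ClaimDate      datetime NULL,
    -- ... the other 50 or 100 claim fields ...
    StartDate      datetime NOT NULL,              -- when this version became current
    EndDate        datetime NULL                   -- NULL = still the current version
);

-- State of one claim as it was on 1 August 2005:
SELECT *
FROM ClaimVersion
WHERE ClaimID = 12345
  AND StartDate <= '20050801'
  AND (EndDate IS NULL OR EndDate > '20050801');

-- Change-log style: store only the field that changed, one row per change.
CREATE TABLE ClaimChangeLog (
    ChangeID   int IDENTITY(1,1) PRIMARY KEY,
    ClaimID    int NOT NULL,                       -- PKID of the record that changed
    FieldName  sysname NOT NULL,                   -- or a FieldID into a field lookup table
    OldValue   nvarchar(max) NULL,                 -- the "old" field data
    ChangedAt  datetime NOT NULL DEFAULT GETDATE()
);

The change log stays small when only a single date changes, but reconstructing the full state of a row at a point in time takes considerably more work than the single range query above - which is the trade-off John describes.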
>
> To me, the collection of eggs rather describes a batch: a given population of 200 hens produces each day a collected batch of eggs which perhaps are stamped with a producer id and batch id, but are at least packed only with other eggs from the same batch. The batch id is written on the package.
> This way a bad egg at the consumer can be tracked back to the package, the producer, the date, the packing machine, the population of hens, and - perhaps - the possible bags of corn (or whatever) used to feed these hens.
> You will record all associated data in a write-once/read-many database, but as you by definition never will change or correct these data, I see no scenario for a temporal or PITA database; it's more like a log file. The only date field needed here is the packing date.
>
> And how about the autonumber and the index maintenance Jim brought up?
>
> /gustav
>
>>>> fuller.artful at gmail.com 08-03-2011 01:00 >>>
> Let us distinguish two problems:
>
> The first is the "egg" problem. I have 200 chickens, each of which lays several eggs per day, each of which is plonked into an arbitrary-numbered case. At some point, it may be interesting to know which chicken laid which eggs and into which cartons they were placed. Most often, this level of detail is not interesting, but occasionally it is vital and potentially life-saving.
>
> The second is the "serial number" problem. Every crankshaft or windshield or manifold coming off an assembly line has a unique serial number, unlike the aforementioned eggs. Each one of these parts can be traced to a shift and a line and perhaps ultimately to a worker.
>
> Big difference in these problems, and big difference in which attributes we choose to model.
>
> IME, I have dealt more with the egg problem than the serial number problem, but in recent years this has changed. To further complicate things, this latter problem has been compounded by the PITA issue (Point in Time Architecture; for details on this problem and its solution, see my piece at Red Gate's site).
>
> Arthur
>
> On Mon, Mar 7, 2011 at 11:58 AM, Gustav Brock <Gustav at cactus.dk> wrote:
>
>> Hi Jim
>>
>> I'm not so sure about that. As far as I know, the maintenance cost of an autonumber index is close to zero for adding records, zero for updates of other fields, and tables that large typically are for appending/reading only.
>>
>> But, of course, scenarios exist where you have to optimise where possible.
>>
>> /gustav
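Likewise, a minimal sketch of the two traceability problems Arthur and Gustav describe - batch identity versus per-item serial numbers - again in illustrative T-SQL with invented names:

-- "Egg" problem: identity lives on the batch, not on the individual item.
CREATE TABLE EggBatch (
    BatchID          int IDENTITY(1,1) PRIMARY KEY,
    ProducerID       int NOT NULL,
    PackingMachineID int NOT NULL,
    HenPopulationID  int NOT NULL,
    FeedBatchID      int NULL,              -- the bags of corn, if tracked
    PackingDate      datetime NOT NULL      -- the only date field needed
);

-- "Serial number" problem: every item carries its own identity and lineage.
CREATE TABLE ManufacturedPart (
    SerialNumber varchar(30) PRIMARY KEY,
    PartType     varchar(20) NOT NULL,      -- crankshaft, windshield, manifold, ...
    LineID       int NOT NULL,
    ShiftID      int NOT NULL,
    WorkerID     int NULL,
    ProducedAt   datetime NOT NULL
);

Both are essentially write-once/read-many log tables; as Gustav notes, nothing in them is ever corrected after the fact, so no temporal machinery is required.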