Jim Lawrence
accessd at shaw.ca
Sun Jan 26 23:03:08 CST 2014
Hi All:

I must admit there is a bit of a preamble to this, but it is all aimed at a point and, I believe, at the future of data management.

A number of years ago, while working for a government branch that handled data and policy, I was asked to retrieve a full set of data summaries and have them ready to display and/or print at the request of the department head. To say the least, the data was a mess. It had evolved over years, and each time the data model was improved the data structure changed, because it was easier to just make a new table than to figure out how to consolidate the information into one table. To add to the data's complexity, government policy kept changing and affected how data was entered into the existing tables. Whenever the variation became too extreme, it was time for a new table.

All this data resided in an Oracle database, running on Unix, on one of the largest HP ProLiant computers I had ever seen. It was somewhere between a standard server and a mini: fully 64-bit, multi-core, with a huge extended RAID and a coprocessor running the stacks of drives. 64-bit and multi-core might not be unusual now, but back then it was really rare and available only to governments and big businesses. (A perfect piece of network hardware. ;-))

I decided to run the whole interface through an MS Access FE but do the data gathering and calculation on the big ProLiant through Oracle. It took almost two weeks to build the Oracle query. I segregated each data table set and applied subqueries to manage the various policy changes. (You used to be able to go only six levels deep with subqueries on the Oracle of the day, but I went no more than four.) After each table set had been processed and a common data set retrieved, all the data sets were combined with UNION ALL. The finished query was about two hundred lines long. (A rough sketch of the pattern is in the P.S. at the end of this message.)

I had tested each module extensively, but now it was time to test the fully integrated query. I pressed the button on my Access form and trotted off for a coffee, as I knew it would probably take a few minutes even on that great machine. As I walked to the coffee room, I could overhear everyone saying the system had stopped. I knew it was a momentary thing, so I continued on to fix a coffee. It was momentary... the rest of the morning, momentary. Two hundred lines of code had that huge server thrashing, and every resource it had was being used (the benefits of having full admin rights).

After that, I broke the big query into individual modules and controlled the processing from the Access FE. First one module was called; when its data was received, it was sorted and added to a local MDB table. Then, a few minutes later, after the server's memory had cleared, the next module was called, and the process was repeated until all the modules had been handled, sorted, and appended. (The second sketch in the P.S. shows the general idea.) This method took less than an hour, and none of the other staff were inconvenienced.

At that moment, I realized that a system of multiple small computers could probably have handled the processing far more effectively. Vertical processing was maxed out (and was a dead-end technology); horizontal processing was the future of data management. NoSQL databases are not the biggest part of this shift; it is the data managers like Hadoop, the modern languages like Erlang and F#, and even torrent networks (and dozens of similar technologies) that will make the new data revolution a reality.

It is a very exciting time to be working in the computer industry.

Jim
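
P.S. For anyone curious, here is a minimal sketch of the first, monolithic approach: one subquery "module" per table set / policy era, each normalized to a common column list and stitched together with UNION ALL into a single giant statement. I've used Python to hold the SQL, and every table and column name here is made up for illustration; the real query ran to about two hundred lines.

    # Hypothetical sketch of the monolithic query: one SELECT per
    # table set / policy era, all mapped to the same output columns.
    MODULES = [
        # made-up pre-policy-change table set
        "SELECT region, fiscal_year, SUM(amount) AS total "
        "FROM summaries_old GROUP BY region, fiscal_year",
        # made-up post-policy-change table set, mapped to the same columns
        "SELECT region, fiscal_year, SUM(adjusted_amount) AS total "
        "FROM summaries_new GROUP BY region, fiscal_year",
        # ...one entry per table set; enough of these and you reach
        # two hundred lines
    ]

    # The version that thrashed the server: everything in one pass.
    monolithic_sql = "\nUNION ALL\n".join(MODULES)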
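
And here is a sketch of the second pattern, the one that worked: run one module at a time, append each result to a local staging table, and pause between calls so the server can recover. The original drove this from the Access FE (VBA); Python with ODBC connections stands in here, and the staging table name and pause length are assumptions, not the original values.

    import time
    import pyodbc  # assumption: ODBC connections to both back ends

    def run_in_chunks(oracle_conn, local_conn, modules, pause_seconds=180):
        """Run each module separately, appending its rows to a local
        staging table, with a pause between calls so the server's
        memory can clear. Names are illustrative only."""
        local_cur = local_conn.cursor()
        for sql in modules:
            # fetch one module's result set from the big server
            rows = [tuple(r) for r in oracle_conn.cursor().execute(sql)]
            rows.sort()  # sort locally instead of burdening the server
            local_cur.executemany(
                "INSERT INTO local_summary (region, fiscal_year, total) "
                "VALUES (?, ?, ?)",
                rows,
            )
            local_conn.commit()
            time.sleep(pause_seconds)  # breathing room between modules

    # e.g. run_in_chunks(pyodbc.connect(ORACLE_DSN),
    #                    pyodbc.connect(ACCESS_DSN), MODULES)
    # where ORACLE_DSN and ACCESS_DSN are hypothetical DSN strings.

The point of the pause and the local appends is the same one the story makes: the client does the patient bookkeeping, and the big server only ever sees one small, bounded piece of work at a time.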