jwcolby
jwcolby at colbyconsulting.com
Tue Sep 20 20:28:02 CDT 2011
"Hadoop, of course, is the Apache Software Foundation project created several years ago by then-Yahoo employee Doug Cutting. It has become a critical tool for web companies, including Yahoo and Facebook, to process their ever-growing volumes of unstructured data, and is fast making its way into organizations of all types and sizes. Hadoop has spawned a number of commercial distributions and products, too, including from Cloudera, EMC and IBM."

My data is not unstructured. Do you actually read this stuff?

I did actually read what you sent round the last time, and all they discussed was data captured from and displayed in web pages. Not a single mention of parent-child relations 12 levels deep, no mention at all of BofA tearing down their systems, flattening out their data and storing it on one of these things. Lots of talk of Google (web data), Facebook (web data), Yahoo (web data), etc., ad nauseam.

The fact that this thing worked for some specific thing that you did doesn't make it fit everything out there.

> Our provincial government has been working with Google to build a huge land
> database and the results are stellar. Milliseconds to pull all data on any
> encumbrances on a lot or parcels of lots. Before, running with the
> traditional SQL technologies it would take hours to get the same results.

IOW a relational database was a poor fit for this particular task. OK.

> A few years ago, I installed a blade, painted indigo and marked Google, at
> the legislature. The box was supposed to take all the comments from the
> sessions, translate them into text and then allow anyone to pull the
> comments back from any time within that session. Again, standard SQL had
> been tried and had failed...and again instantaneous gratification.

IOW a relational database was a poor fit for this particular task. OK.

Are you generalizing from those examples that a relational database is a poor fit for any task? Hmmmm.....

I don't work with Google. I don't have a budget of millions.
I don't have a programming staff of hundreds or thousands. I don't have server farms with a thousand nodes and a billion documents. I am not a provincial government with a huge land database, nor am I a legislature with too many notes to keep track of.

So how exactly again does any of this fit what I do? Sorry but ya lost me.

OTOH if you say you can reproduce my system for a couple of hundred hours of work and it will be a million times faster on my same system, I will pay you to do that. Delivered results of course. I kinda get that I should keep on with my development effort while I await your delivered system.

;)

John W. Colby
www.ColbyConsulting.com

On 9/20/2011 8:23 PM, Jim Lawrence wrote:
> You are living out of stream John.
>
> I know of a number of people now who are working with such technology and
> very successfully, I might add, but the technology is not Main Street, as it
> is not advertised similar to Linux. It will take longer to be common
> knowledge, as there is no huge advertising machine behind Open Source
> products.
>
> Our provincial government has been working with Google to build a huge land
> database and the results are stellar. Milliseconds to pull all data on any
> encumbrances on a lot or parcels of lots. Before, running with the
> traditional SQL technologies it would take hours to get the same results.
> People working with the new system thought it was broken at first as the
> results were so fast...now they have become used to instantaneous
> gratification. The interesting thing is that the new system is using the old
> hardware as the project was supposed to be just a test...some test.
>
> A few years ago, I installed a blade, painted indigo and marked Google, at
> the legislature. The box was supposed to take all the comments from the
> sessions, translate them into text and then allow anyone to pull the
> comments back from any time within that session.
> Again, standard SQL had been tried and had failed...and again instantaneous
> gratification.
>
> There are many other instances of this type. MapReduce technology is
> being used, but it is only for situations where results need to be pulled
> quickly from huge chunks of data with very complex queries. It is also for
> someone with a limited budget, as NoSQL databases do not need placeholder
> fields for partially filled rows. This generally translates into a complex
> set of data filling less than half the space of a traditional SQL DB and
> therefore less hardware.
>
> There are even new hybrid data solutions coming out where both MapReduce
> and traditional SQL are being used to extract data.
>
> " ...LexisNexis is releasing a set of open-source, data-processing tools
> that it says outperforms Hadoop and even handles workloads Hadoop presently
> can't. The technology (and new business line) is called HPCC Systems... "
>
> http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/
>
> Jim