[dba-SQLServer] [AccessD] MySQL

Tue Sep 20 20:28:02 CDT 2011

Hadoop, of course, is the Apache Software Foundation project created several years ago by then-Yahoo 
employee Doug Cutting. It has become a critical tool for web companies — including Yahoo and 
Facebook — to process their ever-growing volumes of unstructured data, and is fast making its way 
into organizations of all types and sizes. Hadoop has spawned a number of commercial distributions 
and products, too, including from Cloudera, EMC and IBM.

My data is not unstructured.

Do you actually read this stuff?  I did actually read what you sent round the last time and all they
discussed was data captured from and displayed in web pages.  Not a single mention of parent-child
relations 12 levels deep, no mention at all of BofA tearing down their systems, flattening out their
data and storing it on one of these things.  Lots of talk of Google (web data), Facebook (web data),
Yahoo (web data) etc. ad nauseam.

The fact that this thing worked for some specific thing that you did doesn't make it fit everything 
out there.

 > Our provincial government has been working with Google to build a huge land database and the 
results are stellar. Milliseconds to pull all data on any encumbrances on a lot or parcels of lots. 
Before, running with the traditional SQL technologies it would take hours to get the same results.

IOW a relational database was a poor fit for this particular task.

OK.

 > A few years ago, I installed a blade, painted indigo and marked Google, at the legislature. The
box was supposed to take all the comments from the sessions, translate them into text and then allow 
anyone to pull the comments back from any time within that session. Again, standard SQL had been 
tried and had failed...and again instantaneous gratification.

IOW a relational database was a poor fit for this particular task.

OK.

Are you generalizing from those examples that a relational database is a poor fit for any task?

Hmmmm.....

I don't work with Google.  I don't have a budget of millions.  I don't have a programming staff of 
hundreds or thousands.  I don't have server farms with a thousand nodes and a billion documents.  I 
am not a provincial government with a huge land database, nor am I a legislature with too many notes 
to keep track of.

So how exactly again does any of this fit what I do?  Sorry but ya lost me.

OTOH if you say you can reproduce my system for a couple of hundred hours of work and it will be
a million times faster on my same system I will pay you to do that.  Delivered results of course.

I kinda get that I should keep on with my development effort while I await your delivered system.  ;)

John W. Colby
www.ColbyConsulting.com

On 9/20/2011 8:23 PM, Jim Lawrence wrote:
> You are living out of stream John.
>
> I know of a number of people now who are working with such technology, and
> very successfully, I might add, but the technology is not Main Street
> because, much like Linux, it is not advertised. It will take longer to be
> common knowledge, as there is no huge advertising machine behind Open Source
> products.
>
> Our provincial government has been working with Google to build a huge land
> database and the results are stellar. Milliseconds to pull all data on any
> encumbrances on a lot or parcels of lots. Before, running with the
> traditional SQL technologies it would take hours to get the same results.
> People working with the new system thought it was broken at first as the
> results were so fast...now they have become used to instantaneous
> gratification. The interesting thing is that the new system is using the old
> hardware as the project was supposed to be just a test...some test.
>
> A few years ago, I installed a blade, painted indigo and marked Google, at
> the legislature. The box was supposed to take all the comments from the
> sessions, translate them into text and then allow anyone to pull the
> comments back from any time within that session. Again, standard SQL had
> been tried and had failed...and again instantaneous gratification.
>
> There are many other instances of this type. Map Reduce technology is
> being used, but only for situations where results must be pulled quickly
> from huge chunks of data using very complex queries. It is also for someone
> with a limited budget, as NoSQL databases do not need placeholder fields for
> partially filled rows. This generally translates into a complex set of data
> filling less than half the space of a traditional SQL DB and therefore
> needing less hardware.
>
> There are even new hybrid data solutions coming out where both Map Reduce
> and traditional SQL are being used to extract data.
>
> " ...LexisNexis is releasing a set of open-source, data-processing tools
> that it says outperforms Hadoop and even handles workloads Hadoop presently
> can't. The technology (and new business line) is called HPCC Systems... "
>
> http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/
>
> Jim