[AccessD] Is hadoop for you? Or me? Fascinating but do I need to spend any more time on this?

jwcolby jwcolby at colbyconsulting.com
Tue Sep 20 21:20:43 CDT 2011


http://radar.oreilly.com/2011/01/what-is-hadoop.html

Mike Olson: The Hadoop platform was designed to solve problems where you have a lot of data — 
perhaps a mixture of complex and structured data — *and it doesn't fit nicely into tables.*

Mike Olson: Hadoop is designed to run on a large number of machines that don't share any memory or 
disks.

<snip>

Architecturally, the reason you're able to deal with lots of data is because Hadoop spreads it out. 
And the reason you're able to ask complicated computational questions is because you've got all of 
these processors, working in parallel, harnessed together.

Mike Olson: It's fair to say that a current Hadoop adopter must be more sophisticated than a 
relational database adopter.

Mike Olson: I'm a deep believer in relational databases and in SQL. I think the language is awesome 
and the products are incredible.

I hate the term "NoSQL." It was invented to create cachet around a bunch of different projects, each 
of which has different properties and behaves in different ways. *The real question is, what 
problems are you solving? That's what matters to users.*

http://freedb2.com/2010/02/04/is-hadoop-cloud-computing/

If you are not familiar with Hadoop, the best way to understand what it does is to think of it as a 
method or *a programming model for executing complex compute jobs on very large clusters of 
computers*. These clusters can comprise hundreds and, sometimes, thousands of machines. What Hadoop 
does is break, or Map, these complex jobs in to much more manageable tasks that are distributed to 
run on the machines in the cluster. It then assembles the results of the execution of these much 
smaller parts of the overall job in to one coherent answer. This process of collecting and 
consolidating the results of the execution is called “Reduce”.

http://en.wikipedia.org/wiki/Apache_Hadoop

Apache Hadoop is a software framework that supports data-intensive distributed applications under a 
free license.[1] *It enables applications to work with thousands of nodes and petabytes of data*. 
Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

-- 
John W. Colby
www.ColbyConsulting.com



More information about the AccessD mailing list