MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
The Hadoop framework itself is written mostly in Java, with some native code in C and command-line utilities written as shell scripts.
Hadoop is cross-platform and runs on a variety of operating systems.
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream them at high bandwidth to user applications.
With the default replication factor of 3, each block is stored on three nodes: two on the same rack and one on a different rack.
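For illustration, the sketch below uses the standard HDFS Java client API to report a file's replication factor and stream its contents to the console. It assumes a configured cluster is reachable, and the path /data/example.txt is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/example.txt"); // hypothetical path

        // Report the replication factor recorded for this file (3 by default).
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Replication factor: " + status.getReplication());

        // Stream the file's bytes to stdout; HDFS serves each block from a
        // nearby replica, so the read does not depend on any single node.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```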
MapReduce, the programming model Hadoop uses, makes distributed programs simple to write and test: the developer supplies a map function and a reduce function, and the framework handles distribution, scheduling, and fault tolerance.
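The canonical example is word count, shown below in the style of the official Hadoop MapReduce tutorial: the mapper emits a (word, 1) pair for every token, and the reducer sums the pairs for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: tokenize each input line and emit (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts emitted for each distinct word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the map and reduce functions are ordinary Java methods with no distribution logic of their own, they can be unit-tested in isolation before the job is ever submitted to a cluster.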
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.
Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
Rather than building Hadoop deployments manually on Amazon EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations with a few simple commands, either through the AWS Web Console or through command-line tools.
Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.
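As a rough sketch of those abstractions, the word count from above can be expressed as a Cascading pipe assembly, assuming the Cascading 2.x Hadoop API; the input and output paths are placeholders taken from the command line.

```java
import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class CascadingWordCount {
    public static void main(String[] args) {
        // Source and sink taps over HDFS; paths are placeholders.
        Tap source = new Hfs(new TextLine(new Fields("line")), args[0]);
        Tap sink = new Hfs(new TextLine(), args[1], SinkMode.REPLACE);

        // Declare the data flow: split lines into words, group by word,
        // and count each group. No explicit map or reduce code is written.
        Pipe pipe = new Pipe("wordcount");
        pipe = new Each(pipe, new Fields("line"),
                new RegexSplitGenerator(new Fields("word"), "\\s+"));
        pipe = new GroupBy(pipe, new Fields("word"));
        pipe = new Every(pipe, new Count(new Fields("count")));

        // The planner translates the flow into one or more MapReduce jobs.
        Flow flow = new HadoopFlowConnector().connect(source, sink, pipe);
        flow.complete();
    }
}
```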