MCQs
The output of the reduce task is typically written to the FileSystem. The output of the Reducer is not sorted.
The -archives option is also a generic option; it specifies comma-separated archives to be unarchived on the compute machines.
The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.
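The merge half of this step can be sketched in Python. This is a simplified, single-process illustration, not Hadoop's implementation: it assumes each fetched map output is already sorted by key (as the map-side sort guarantees) and uses the standard library's `heapq.merge`. It does not model the concurrent fetching itself.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (key, value)

def merge_map_outputs(segments: Iterable[List[Record]]) -> Iterator[Record]:
    """Lazily merge map-output segments that are each already sorted by key.

    heapq.merge consumes the segments incrementally, so merged records can be
    streamed onward without concatenating and re-sorting everything.
    """
    return heapq.merge(*segments, key=lambda record: record[0])

# Hypothetical outputs fetched from three map tasks, each sorted by key.
fetched = [
    [("apple", 1), ("cat", 1)],
    [("banana", 1), ("cat", 1)],
    [("apple", 1), ("dog", 1)],
]

merged = list(merge_map_outputs(fetched))
print(merged)
# [('apple', 1), ('apple', 1), ('banana', 1), ('cat', 1), ('cat', 1), ('dog', 1)]
```

Because the inputs are already sorted, the merge only needs to compare the heads of the segments, which is why it can proceed as segments arrive rather than waiting for one big sort.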
The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.
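A minimal pure-Python simulation of this partition/sort split (a secondary sort). The composite (primary, secondary) key, the function names, and the use of `zlib.crc32` as a stand-in for Hadoop's hash partitioning are illustrative assumptions; in a real job this logic would live in a custom Partitioner and a sort comparator.

```python
import zlib
from typing import Dict, List, Tuple

CompositeKey = Tuple[str, int]      # (primary key, secondary key)
Record = Tuple[CompositeKey, str]   # (composite key, value)

def partition(key: CompositeKey, num_partitions: int) -> int:
    """Pick a reducer from the primary key only, so all records that share a
    primary key reach the same reducer regardless of their secondary key."""
    primary, _secondary = key
    # crc32 is deterministic across runs, unlike Python's salted str hash().
    return zlib.crc32(primary.encode("utf-8")) % num_partitions

def shuffle_and_sort(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Route records by primary key, then sort each partition by the full
    (primary, secondary) composite key."""
    partitions: Dict[int, List[Record]] = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition(key, num_partitions)].append((key, value))
    for recs in partitions.values():
        recs.sort(key=lambda record: record[0])  # tuple order: primary, then secondary
    return partitions

# Hypothetical input: (year, temperature) composite keys with a station id as value.
records: List[Record] = [
    (("1950", 22), "s1"),
    (("1949", 31), "s2"),
    (("1950", 0), "s3"),
    (("1949", 11), "s4"),
]

for p, recs in shuffle_and_sort(records, num_partitions=2).items():
    print(p, recs)
```

Within each partition, records for a given year come out ordered by temperature, which lets a reducer see its values in a chosen order without sorting them itself.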
Hadoop Pipes is a SWIG-compatible C++ API for implementing MapReduce applications (non-JNI™ based).
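Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution: it lets you create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.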
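A minimal word-count sketch of a streaming mapper and reducer in Python. It assumes streaming's default text conventions (one record per line, key and value separated by a tab) and that the framework sorts mapper output by key before the reducer reads it; the single-script layout with a `map`/`reduce` argument is a hypothetical convenience, not something streaming requires.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word. Map output is sorted by key before it reaches
    the reducer, so equal keys arrive on consecutive lines and groupby can
    aggregate them in a single pass."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI convention: "map" selects the mapper, anything else the reducer.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Outside Hadoop, the same pipeline can be approximated with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `wordcount.py` is a hypothetical file name; in a streaming job the two commands would be supplied through the `-mapper` and `-reducer` options.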
The Isilon solution also provides enterprise data protection and security options, including file system auditing and data-at-rest encryption, to address compliance requirements.
HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables (billions of rows by millions of columns) on clusters built from commodity hardware.
HDFS and NoSQL file systems focus almost exclusively on adding nodes to increase performance (scale-out), but even they require node configuration with some elements of scale-up.
The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes × maximum number of containers per node).
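The Hadoop MapReduce tutorial applies these factors to the cluster's total reduce capacity, (number of nodes) × (maximum containers per node). With 0.95, all reduces can launch immediately and start transferring map outputs as the maps finish; with 1.75, faster nodes finish a first round of reduces and launch a second wave, which improves load balancing. The arithmetic, sketched with a hypothetical helper; rounding down is our own assumption, since the rule of thumb does not specify rounding:

```python
from typing import Tuple

def suggested_reduces(nodes: int, max_containers_per_node: int) -> Tuple[int, int]:
    """Return (single-wave, two-wave) reduce counts for the 0.95 / 1.75 rule."""
    slots = nodes * max_containers_per_node
    # Integer arithmetic (95/100 and 175/100) avoids float rounding surprises;
    # floor division rounds down, which is an assumption of this sketch.
    single_wave = slots * 95 // 100
    two_waves = slots * 175 // 100
    return single_wave, two_waves

print(suggested_reduces(10, 8))  # (76, 140)
```

For a hypothetical 10-node cluster with 8 containers per node (80 slots), this suggests about 76 reduces for a single wave or 140 for two waves.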