Except for unions, Avro's JSON encoding is the same as the one used to encode field default values.
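A minimal sketch of the union exception, assuming a field of type ["null", "string"]: in the JSON encoding a null is written as a bare JSON null, while a value from any other branch is wrapped in a one-key object named after that branch. The helper name encode_union_json is hypothetical and is not part of any Avro library.

```python
import json


def encode_union_json(value, branch):
    """Encode one union value the way Avro's JSON encoding describes.

    `branch` is the name of the union branch the value belongs to.
    (Hypothetical helper for illustration only.)
    """
    if branch == "null":
        return None  # null is written as a bare JSON null
    return {branch: value}  # any other branch is wrapped as {"<type name>": value}


# A field of type ["null", "string"]
as_null = json.dumps(encode_union_json(None, "null"))
as_string = json.dumps(encode_union_json("a", "string"))
```

A default value for the same field would be written without the wrapper, which is why unions are the one place where the two encodings differ.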
When the writer's and reader's schemas differ, Avro resolves the conflict by matching record fields by name.
Avro binary-encoded data can be efficiently ordered without deserializing it to objects.
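To illustrate how such a comparison can work, here is a small sketch that compares two binary-encoded records field by field, reading only as many bytes as it needs. It assumes a record schema of one long followed by one string, both with the default ascending order; the function names are hypothetical, and a real Avro library handles every type and sort order.

```python
def encode_long(n):
    """Avro long: zig-zag mapping, then a base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag for values in the 64-bit range
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf, pos):
    """Decode the varint starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo the zig-zag mapping


def encode_record(user_id, name):
    """Encode a (long, string) record: long, then string length, then UTF-8 bytes."""
    data = name.encode("utf-8")
    return encode_long(user_id) + encode_long(len(data)) + data


def compare_records(a, b):
    """Compare two encoded (long, string) records; return -1, 0, or 1."""
    id_a, pos_a = read_long(a, 0)
    id_b, pos_b = read_long(b, 0)
    if id_a != id_b:
        # The first field decides, so the string bytes are never examined.
        return -1 if id_a < id_b else 1
    len_a, pos_a = read_long(a, pos_a)
    len_b, pos_b = read_long(b, pos_b)
    name_a = a[pos_a:pos_a + len_a]
    name_b = b[pos_b:pos_b + len_b]
    # Byte-wise comparison of UTF-8 matches comparison by Unicode code point.
    return (name_a > name_b) - (name_a < name_b)
```

For example, compare_records(encode_record(3, "a"), encode_record(3, "b")) falls through to the string field, while records with different ids are ordered after decoding a single varint from each.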
Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.
Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
You can use the gunzip command to decompress files that were created by a number of compression utilities, including Gzip.
AvroMapper is the base class for map functions over Avro data: its map method takes an input datum and emits output through an AvroCollector.
A SequenceFile's metadata is a key-value list of Text/Text pairs, and it is written to the file when the SequenceFile writer is initialized.
One use of the snapshot feature is to roll back a corrupted HDFS instance to a previously known good point in time.
If the NameNode machine fails and NameNode high availability is not configured, manual intervention is necessary.