MCQs
This is powerful because you can do things in your postings format that require making more than one pass through the postings such as iterating over all postings.
TCompactProtocol is typically more efficient to process as well.
The tools that the collocation identification algorithm are embedded within either consume tokenized text as input or provide the ability to specify an implementation of the Lucene Analyzer class perform tokenization in order to form ngrams.
The Spark engine runs in a variety of environments, from cloud services to Hadoop or Mesos clusters.
Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
Crunch is often used in conjunction with Hive and Pig.
Distributed Mode is used when you have multiple machines.
Drill conforms to the stringent ANSI SQL standards ensuring compatibility with existing BI environments as well as Hive deployments.
A number of predefined source adapters are built into Flume.
There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext