Cloud9 is a collection of Hadoop tools that tries to make working with
big data a bit easier.
This software was designed with two goals in mind: first, to serve
as a teaching tool for MapReduce and MapReduce algorithm design;
second, to provide a collection of useful tools on which to
build other "big data" systems. Here are just a few features:
- API for working with various text collections, including
Wikipedia, TREC document collections for information retrieval
research, and the ClueWeb09 web crawl.
- Reference implementations of a few common MapReduce algorithms,
including PageRank, breadth-first search, and co-occurrence matrix
computation.
- Implementations of various useful Hadoop data types.
- Efficient implementations of Java maps for primitive types, along
with integration with fastutil.
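
To give a feel for one of the algorithms listed above, here is a minimal
sketch of co-occurrence matrix computation in the "pairs" style: the map
step emits one record per pair of neighboring words, and the reduce step
sums the records for each pair. This is plain Java that simulates both
steps in memory. It does not use Hadoop or the Cloud9 API, and the class
name, method names, and window parameter are hypothetical, chosen only
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of a "pairs" co-occurrence job.
// Not Cloud9's implementation; all names here are hypothetical.
public class CooccurrenceSketch {

    // "Map" step: for each token, emit a key "left<TAB>right" for every
    // other token at most `window` positions away in the same line.
    static List<String> mapPairs(String line, int window) {
        String[] tokens = line.trim().split("\\s+");
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j != i) {
                    emitted.add(tokens[i] + "\t" + tokens[j]);
                }
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the implicit count of 1 carried by each emitted key.
    static Map<String, Integer> reduceCounts(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Runs the map step over every line, then reduces all emitted keys.
    static Map<String, Integer> cooccurrences(List<String> lines, int window) {
        List<String> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(mapPairs(line, window));
        }
        return reduceCounts(all);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b c", "a b");
        // Print each (left, right) pair with its co-occurrence count.
        cooccurrences(lines, 1).forEach(
            (pair, count) -> System.out.println(pair + "\t" + count));
    }
}
```

In a real MapReduce job the framework would group the emitted keys by
pair between the two steps. Here a sorted map stands in for that
shuffle, which keeps the sketch short but only works for data that fits
in memory.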