With Stratosphere, a Big Data analysis framework developed at TUB and funded by the DFG, the CIT group has built a solid knowledge base in the field of Big Data. Stratosphere is a data-parallel, general-purpose framework that overcomes the restrictions of the MapReduce paradigm. It can run batch as well as real-time tasks originally found in dataflow systems, such as indexing, filtering, transforming, or aggregating data, and it also supports the iterations common in machine learning and graph analysis. It handles structured as well as unstructured data and can run on clusters and in the cloud. For writing data analysis programs, it offers a high-level declarative language, a workflow-based language at the operator level, and a low-level language at the graph level.
The follow-up project, Stratosphere II, will research new features such as streaming, iterations, and state handling.