Master Thesis Kaikai Yang
Scalable Stream Processing
with PACTs
A growing number of internet-capable devices can produce vast
amounts of data in the form of streams, e.g. video and audio. This
data poses new challenges for all parts of the underlying
infrastructure, especially when it needs to be processed under real-time
constraints.
Frameworks for massively-parallel data
processing in large compute clusters have become popular in both
industry and academia. These frameworks, such as Hadoop, usually
process data in a batch-job fashion, an approach which unfortunately
does not address the requirements of streamed data, where results are
often required within a very short timespan, e.g. less than a second.
The scope of this thesis is the design and implementation of a
programming model for parallel, distributed stream processing based on
Stratosphere's PACTs model. The main task is the definition of window
semantics for the existing PACTs operators and their adaptation.
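
To make the notion of window semantics more concrete, the following is a minimal sketch of one possible window type, a tumbling (fixed-size, non-overlapping) window that sums the values falling into each window. It is only an illustration under stated assumptions: the class `TumblingWindowSketch`, the method `sumPerWindow`, and the choice of a sum as the aggregate are hypothetical and are not part of Stratosphere or the PACTs API, and the thesis may define windows differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of Stratosphere or the PACTs API.
class TumblingWindowSketch {

    // Assigns each (timestamp, value) pair to the fixed-size window containing
    // its timestamp and sums the values per window.
    // The returned map is keyed by window start time in milliseconds.
    static Map<Long, Integer> sumPerWindow(long[] timestamps, int[] values, long windowSizeMs) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        Map<Long, Integer> sums = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            // floorDiv keeps the window assignment consistent for negative timestamps too.
            long windowStart = Math.floorDiv(timestamps[i], windowSizeMs) * windowSizeMs;
            sums.merge(windowStart, values[i], Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 400, 999, 1000, 2500};
        int[] values = {1, 2, 3, 4, 5};
        // One-second windows, matching the sub-second latency example in the text.
        System.out.println(sumPerWindow(timestamps, values, 1000));
    }
}
```

In this sketch a record belongs to exactly one window. Other window types, such as sliding windows, would assign a record to several windows.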
Prerequisites for working on this topic are a profound knowledge of
the Java programming language, an interest in current research topics,
and the willingness to familiarize oneself with an existing
system.