BA: Reusable Performance Models for Distributed Dataflows
This topic area comprises a larger set of thesis topics in the realm of data-parallel processing. In particular, we are interested in conceptualizing and developing methods that not only model the performance of distributed dataflow jobs, but are also reusable due to their generalized and context-aware design. Possible topics include:
- Bachelor/Master: Approaches to measure the similarity of dataflow jobs (e.g. using Graph Neural Networks)
- Bachelor/Master: Efficient profiling of dataflow jobs for optimal resource allocation
- Bachelor/Master: Context-aware prediction models for robust performance forecasts
- Bachelor/Master: Intelligent scheduling of distributed dataflows (e.g. in a shared cluster)
For all topics, the developed approaches are to be evaluated with distributed dataflow frameworks and example dataflow jobs.
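To give a flavor of the first topic above, a very simple, non-learned baseline for job similarity could compare the operator graphs of two jobs directly. The sketch below is purely illustrative: the toy jobs, the graph encoding, and the equal weighting of operator and edge overlap are assumptions, and a Graph Neural Network approach would instead learn graph embeddings and compare those.

```python
# Minimal sketch: approximating the similarity of two dataflow jobs by
# comparing their operator DAGs via Jaccard overlap of operators and edges.
# The encoding ({operator: [downstream operators]}) and the 50/50 weighting
# are illustrative assumptions, not a method prescribed by the topic.

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def job_similarity(job_a, job_b):
    """Combine operator-set and edge-set similarity of two dataflow DAGs."""
    ops_a, ops_b = set(job_a), set(job_b)
    edges_a = {(src, dst) for src, dsts in job_a.items() for dst in dsts}
    edges_b = {(src, dst) for src, dsts in job_b.items() for dst in dsts}
    return 0.5 * jaccard(ops_a, ops_b) + 0.5 * jaccard(edges_a, edges_b)

# Two toy word-count-style jobs with slightly different structure.
job_1 = {"read": ["tokenize"], "tokenize": ["count"],
         "count": ["write"], "write": []}
job_2 = {"read": ["tokenize"], "tokenize": ["filter"], "filter": ["count"],
         "count": ["write"], "write": []}

print(round(job_similarity(job_1, job_2), 3))  # prints 0.6
```

Such a structural baseline ignores operator semantics, input data characteristics, and execution context, which is precisely where learned, context-aware similarity measures become interesting.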
Prerequisites for working on these topics are advanced knowledge of Docker and Kubernetes, as well as excellent programming skills in at least one programming language such as Java, Scala, or Python. Solid skills in machine learning, or a keen interest in applying machine learning algorithms (e.g. reinforcement learning, deep learning) to the domain of distributed dataflows, are advantageous.
Students interested in other topics related to scheduling, profiling, resource allocation, or further aspects of distributed dataflows are welcome to get in touch by email.
Theses can be written in either German or English.