
TU Berlin


BA: Reusable Performance Models for Distributed Dataflows

This topic area covers a range of thesis topics in data-parallel processing. In particular, we are interested in conceptualizing and developing methods that not only model the performance of distributed dataflow jobs but are also reusable across contexts thanks to a generalized, context-aware design. Possible topics include:

  • Bachelor/Master: Approaches to measure the similarity of dataflow jobs (e.g. using Graph Neural Networks)
  • Bachelor/Master: Efficient profiling of dataflow jobs for optimal resource allocation
  • Bachelor/Master: Context-aware prediction models for robust performance forecasts 
  • Bachelor/Master: Intelligent scheduling of distributed dataflows (e.g. in a shared cluster) 


All topics evaluate their approaches with distributed dataflow frameworks and example dataflow jobs.

Prerequisites for working on these topics are advanced knowledge of Docker and Kubernetes and excellent programming skills in at least one language such as Java, Scala, or Python. Solid machine learning skills, or a keen interest in applying machine learning algorithms (e.g. reinforcement learning, deep learning) to distributed dataflows, are an advantage.

Students interested in topics that are not listed above but relate to scheduling, profiling, resource allocation, or other aspects of distributed dataflows are welcome to get in touch by email.

The thesis can be written in either German or English.



Dominik Scheinert
+49 30 314-26260
Room TEL 1218