TU Berlin

Department of Telecommunication Systems

Adaptive Resource Management (ARM)

The ARM sub-group is led by Dr. Lauritz Thamsen and works at the intersection of distributed systems, operating systems, and software engineering, focusing on adaptive resource management in the context of critical and data-intensive distributed systems.

Research Theme

Increasingly, important applications rely on the processing of large volumes of data. These include, for instance, IoT applications for monitoring traffic, transport systems, water networks, and other critical infrastructures within cities. Other applications monitor the vital parameters of remote patients in distributed telemedicine setups, or environmental conditions such as seismic activity using large-scale distributed sensor networks. Moreover, businesses and the sciences have to deal with increasingly large amounts of real-time and historic data, be it to quickly detect fraudulent behavior in millions of payment transactions or to compare terabytes of human genomic data to accurately identify genetic disorders.

For many of these applications, there are clearly defined expectations for the required quality of service in terms of end-to-end latencies, throughput, scalability, availability, as well as the reliability of ingested and produced data. At the same time, distributed data processing applications are being deployed to more heterogeneous and dynamically changing environments.

As a result, running large-scale distributed applications is often a very difficult task, especially when given critical targets for performance and dependability. In fact, we argue that – while high-level programming abstractions and comprehensive processing frameworks have made it easier to develop data-intensive applications – operating critical data-intensive systems and applications has become more difficult over the last decade. Abundant evidence of low resource utilization, limited energy efficiency, and severe failures in infrastructures, systems, and applications deployed in practice backs up this claim.

Addressing these problems, we develop methods, systems, and tools to make the implementation, testing, and operation of efficient and dependable data-intensive applications easier. Towards this goal, we work on adaptive resource management and fault tolerance in distributed heterogeneous computing environments from small IoT devices to large-scale clusters of computers, aiming to create systems that automatically adapt to current workloads, dynamic distributed computing environments, and performance as well as dependability requirements of users and applications.

Topics of Interest

  • Resource management, scheduling, data and task placement, system configuration
  • Profiling, performance modeling, testing, simulations, testbeds
  • Distributed data-parallel processing, scalable batch and stream processing, distributed dataflow systems
  • Cluster infrastructures, heterogeneous hardware, real-time operating systems, networked embedded devices, sensor networks
  • Big data analytics, internet of things, urban infrastructure applications
  • Quality of service, efficiency, scalability, dependability, fault tolerance, usability

News

  • March 2020: We will have multiple open PhD positions in this sub-group over the summer of 2020. Take a look at our upcoming projects, keep an eye on the announcements, and get in touch!
  • December 2019: We are presenting new tools for efficiently testing task placements and system configurations when processing IoT sensor data streams at workshops at IEEE/ACM UCC 2019 ("Héctor") and IEEE Big Data 2019 ("Timon").

Team and Methodology

Our Team


Left to right: Lauritz, Morgan, Felix, Ilja, Kordian, and Dominik

Our Research Methodology

We mostly do empirical systems research: we evaluate new ideas by implementing them prototypically in the context of relevant open-source systems (such as Apache Flink, Hadoop YARN, Kubernetes, and FreeRTOS) and then conduct experiments on actual hardware, with exemplary applications and real-world datasets. For this, we have access to state-of-the-art infrastructures, including a 200-node commodity cluster, our faculty's HPC cluster, private clouds, as well as IoT devices and sensors. That is, as far as possible, we empirically evaluate new ideas in their actual environments, making use of emulations and simulations only to investigate larger-scale scenarios than are physically feasible for us.

At the same time, we also work on practical applications to experience relevant problems ourselves and, thereby, uncover opportunities for well-motivated and impactful research.

Projects

We have upcoming, ongoing, and completed research projects in this area of research.

Upcoming


FONDA

The Collaborative Research Center Foundations of Workflows for Large-Scale Scientific Data Analysis (FONDA) is a new project funded by the DFG that will investigate methods to support scientists who work with cluster infrastructures to analyze very large datasets. Today, large-scale scientific data analysis is complicated by the necessity to select among different available computational resources and to hand-tune distributed processing jobs. These settings are not straightforward and often platform-specific, yet they have a significant impact on runtimes and efficiency and lead to either platform lock-in or performance losses. In FONDA, we are going to develop new methods for profiling, performance modeling, and task placement that will enable resource management systems to use the available cluster resources efficiently and, therefore, allow scientists to focus on the domain-specific challenges in their work. Learn more...

ide3a

The International Alliance for Digital E-learning, E-mobility and E-research in Academia (ide3a) is a new project funded by the DAAD that will conduct teaching and research on the digitalization of critical infrastructures like water networks, energy grids, sensor networks, and other interconnected urban systems. In the project, we will support new courses with new e-learning methods and usable simulation tools. TU Berlin is at the center of this effort, but the project network also includes five major European partners: the Norwegian University of Science and Technology, Dublin City University, the Cracow University of Technology, Politecnico di Milano, and the Hasso Plattner Institute, supporting the mobility of students of these partners across Europe. Learn more...

Ongoing


BIFOLD

The BMBF-funded research center BIFOLD conducts research on the management and processing of large, distributed data as well as on machine learning. Within the project, we will develop and evaluate new systems and tools for the adaptive usage of heterogeneous, distributed resources for the efficient processing of data streams. Learn more...

BBDC

The BMBF-funded research project BBDC performs fundamental research, trains the data scientists of tomorrow, and develops practical solutions for complex analysis of large amounts of data. Learn more...

WaterGridSense 4.0

The BMBF-funded WaterGridSense project is a collaborative effort between water utilities, industry experts, and research institutions, aiming to research and develop methods for providing an online view of the current state of water networks using distributed sensors. Learn more...

OPTIMA

The EU-funded research project OPTIMA, a collaboration between TU Berlin, Fraunhofer FOKUS, and Ingenieurgesellschaft Prof. Dr. Sieker, aims to develop a predictive control system for water and wastewater networks that anticipates heavy load events, together with the utilities in Berlin. Learn more...

Telemed5000

In the BMWi-funded research project Telemed5000, we are supporting the Hasso-Plattner-Institute and Berlin's Charité in developing an intelligent system for telemedical care of several thousand cardiological risk patients. Learn more...

Collaboration with Bundesdruckerei GmbH

Together with Bundesdruckerei GmbH, we look at trust in critical applications in the context of the IoT and critical urban infrastructures, in particular at new methods for secret-free authentication, for instance relying on hardware fingerprints of IoT sensors instead of stored secrets.

Completed


Stratosphere

The DFG-funded research unit "Stratosphere - Information Management on the Cloud" investigated how to run massively parallel data processing jobs efficiently in Infrastructure-as-a-service clouds. Learn more...
