TU Berlin

Department of Telecommunication Systems

Adaptive Resource Management (ARM)

The ARM subgroup, led by Dr. Lauritz Thamsen, works at the intersection of distributed systems, operating systems, and software engineering, focusing on adaptive resource management in critical and data-intensive distributed systems.

News

  • December 2020: We will present several recent works at IEEE Big Data 2020 in December! The pre-print PDFs are already available here: [1], [2], and [3].
  • November 2020: We are on Twitter now: @ARM_TUBerlin. Follow us for the latest news and research results!

Team

Full-Time Researchers

Research Statement

Research Theme

More and more important applications rely on processing large volumes of data. These include, for instance, IoT applications for monitoring traffic, transport systems, water networks, and other critical infrastructures within cities. Other applications monitor the vital parameters of remote patients in distributed telemedicine setups or observe environmental conditions such as seismic activity using large-scale distributed sensor networks. Moreover, businesses and the sciences have to deal with increasingly large amounts of real-time and historical data, be it to quickly detect fraudulent behavior in millions of payment transactions or to compare terabytes of human genomic data to accurately identify genetic disorders.

For many of these applications, there are clearly defined expectations for the required quality of service in terms of end-to-end latency, throughput, scalability, availability, and the reliability of ingested and produced data. Another major concern is efficiency, especially when it comes to the consumption of energy generated from fossil fuels. At the same time, distributed data processing applications are being deployed to increasingly heterogeneous and dynamic environments.

As a result, running large-scale distributed applications is often very difficult, especially given critical targets for performance and dependability. In fact, we argue that – while high-level programming abstractions and comprehensive processing frameworks have made it easier to develop data-intensive applications – efficiently operating critical data-intensive systems and applications has become more difficult over the last decade. Abundant evidence of low resource utilization, limited energy efficiency, and severe failures in infrastructures, systems, and applications deployed in practice backs up this claim.

Addressing these problems, we develop methods, systems, and tools that make it easier to implement, test, and operate efficient and dependable data-intensive distributed applications. Towards this goal, we work on adaptive resource management and fault tolerance in distributed heterogeneous computing environments, from small IoT devices to large-scale clusters of virtual resources, aiming to create systems that automatically adapt to current workloads, dynamic distributed computing environments, and the performance and dependability requirements of users and applications.
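
As a rough illustration of what such automatic adaptation can look like, the following minimal sketch implements a feedback loop that rescales a data processing job when its observed tail latency drifts away from a user-defined target. It is only a conceptual sketch: the functions get_p99_latency and set_parallelism, as well as the thresholds, are hypothetical placeholders, not the API of any particular system.

    import time

    TARGET_LATENCY_MS = 500          # hypothetical quality-of-service target
    MIN_WORKERS, MAX_WORKERS = 2, 64

    def adapt(get_p99_latency, set_parallelism, workers=4):
        """Monitor observed latency and rescale the job accordingly.

        `get_p99_latency` and `set_parallelism` are placeholder callbacks
        standing in for a real monitoring and resource management interface.
        """
        while True:
            latency_ms = get_p99_latency()               # monitor the workload
            if latency_ms > 1.2 * TARGET_LATENCY_MS:
                workers = min(MAX_WORKERS, workers * 2)  # scale out aggressively
            elif latency_ms < 0.5 * TARGET_LATENCY_MS:
                workers = max(MIN_WORKERS, workers - 1)  # scale in gradually
            set_parallelism(workers)                     # apply the adaptation
            time.sleep(30)                               # re-evaluate periodically

In practice, such a loop would also have to weigh the cost of each adaptation, for example the overhead of state redistribution in stream processing systems, rather than reacting to every fluctuation.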

Topics of Interest

  • Resource management, scheduling, data and task placement, system configuration
  • Profiling, performance modeling, testing, simulations, testbeds
  • Distributed data-parallel processing, scalable batch and stream processing, distributed dataflow systems
  • Cluster infrastructures, virtual resources, heterogeneous hardware, real-time operating systems, networked embedded devices, sensor networks
  • Big data analytics, internet of things, urban infrastructure applications
  • Quality of service, efficiency, scalability, dependability, fault tolerance, usability

Research Methodology

We mostly do empirical systems research: we evaluate new ideas by implementing them prototypically in the context of relevant open-source systems (such as Flink, YARN, Kubernetes, and FreeRTOS) and then conduct experiments on actual hardware, with exemplary applications and real-world input data. For this, we have access to state-of-the-art infrastructures, including a 200-node commodity cluster, a GPU cluster, our faculty's HPC cluster, private and public clouds, as well as IoT devices and sensors. That is, as far as possible, we empirically evaluate new ideas in their actual environments, using emulations and simulations only to investigate scenarios at larger scales than are physically feasible for us.
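
To give a simplified, hypothetical example of this workflow, the following sketch submits the same Flink job at increasing degrees of parallelism and records the resulting runtimes, e.g. as input for a scale-out performance model. The job JAR, input path, and job arguments are made-up placeholders; only the flink run -p command itself corresponds to Flink's command-line client.

    import csv
    import subprocess
    import time

    # Run a (hypothetical) job at increasing degrees of parallelism
    # and record the runtimes for later performance modeling.
    with open("runtimes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["parallelism", "runtime_s"])
        for parallelism in (1, 2, 4, 8, 16):
            start = time.monotonic()
            subprocess.run(
                ["flink", "run", "-p", str(parallelism),
                 "analytics-job.jar", "--input", "hdfs:///data/input"],
                check=True,  # fail fast if the job submission fails
            )
            writer.writerow([parallelism, time.monotonic() - start])

Repeating such runs with different datasets and cluster configurations yields the kind of measurements that profiling and performance modeling build on.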

At the same time, we also work on practical applications in interdisciplinary projects to experience relevant problems ourselves and, thereby, uncover opportunities for well-motivated and impactful research.

Most Recent Publications

Renner, Thomas and Thamsen, Lauritz and Kao, Odej (2015). Network-Aware Resource Management for Scalable Data Analytics Frameworks. Proceedings of the First Workshop on Data-Centric Infrastructure for Big Data Science (DIBS), co-located with the 2015 IEEE International Conference on Big Data. IEEE, 2793–2800.


Felgentreff, Tim and Lincke, Jens and Hirschfeld, Robert and Thamsen, Lauritz (2015). Lively Groups: Shared Behavior in a World of Objects without Classes or Prototypes. Proceedings of the Future Programming Workshop (FPW) 2015, co-located with the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, 15–22.


Thamsen, Lauritz and Steinert, Bastian and Hirschfeld, Robert (2015). Preserving Access to Previous System States in the Lively Kernel. Design Thinking Research. Springer, 235–264.

