MA: Adaptive Distributed Processing of IoT Data Streams
Various IoT sensors are increasingly deployed in manufacturing facilities and urban infrastructures, where they continuously record and emit data. The resulting data streams enable numerous interesting analytics applications, from rule-based remote monitoring and unsupervised failure detection to optimizing physical infrastructures based on the current measured state of the systems. This allows “smart” behavior of multiple, even critical, infrastructures such as public transportation systems, energy grids, urban water networks, and telemedicine systems. Sensor platforms are usually widely distributed across these infrastructures, while the data ultimately ends up at central Cloud resources. However, before reaching these central resources, the data streams first pass through a number of edge/fog layers, each offering more resources than the previous one, but each also adding latency and typically aggregating streams from more and more sensor platforms. Deploying computation tasks directly onto sensor platforms and edge/fog layers can offer significant advantages in terms of reduced latency, saved bandwidth, and overall system scalability, yet the resources available across these layers can easily be overloaded by more compute-intensive tasks. Furthermore, IoT sensor systems are highly dynamic: ingestion rates, and therefore loads, often vary significantly; failures of individual nodes and connections are the norm; and the behavior and performance of system components can also depend on factors such as remaining energy levels.
Especially in the context of critical infrastructures, stream processing applications typically have strictly defined Quality-of-Service (QoS) requirements with respect to performance (e.g. maximum latencies), reliability (e.g. exactly-once processing), and availability (e.g. uptime). Given the dynamic nature of the distributed, heterogeneous environments of the Internet of Things, fulfilling such QoS requirements makes it necessary to continuously adapt distributed processing systems at runtime to the currently monitored conditions. Such adaptation can involve re-scheduling distributed streaming jobs, migrating individual deployed tasks at runtime, re-configuring fault tolerance mechanisms, dynamically managing sensing rates, and changing resource allocations at runtime.
Concrete theses in this area may focus on the following topics: monitoring, modeling and prediction, model training, resource allocation, scheduling and placement, adjustments at runtime, and automatic system configuration. All theses will entail designing a general method, implementing a prototype in the context of existing open-source software, and empirically evaluating the developed prototype in multiple experiments.
If this sounds interesting to you, please send me an email with a little background information about yourself, so we can quickly identify a fitting thesis topic together.