TU Berlin

ZerOps - A Self-Healing Platform

Telecommunication service and network operators face rising expectations for availability, performance, and guaranteed QoS. The complexity of modern IT infrastructures has increased to the point where traditional IT administration procedures fail to holistically ensure the dependability of the systems.

At the same time, approaches based on artificial intelligence (AI) are revolutionizing domains such as medicine, manufacturing, and autonomous driving. This strongly motivates the use of AI for the autonomous management of highly complex IT systems (AIOps).

Researchers and global companies have recognized this potential and started to work on AIOps solutions. Since 2015, the CIT department has joined forces with industrial partners (Deutsche Telekom and Huawei Technologies Co., Ltd.) in a joint research lab that works on solutions for anomaly detection and classification, predictive fault tolerance, and auto-remediation. This work produced the Self-Healing Cloud Platform ZerOps, which members of the CIT group continue to adjust and enhance.

Architecture

The vision for ZerOps is to provide a scalable platform for monitoring, hierarchical in-place data analytics, and predictive system remediation. The term in-place refers to the explicit design goal of analyzing collected data directly at the data source with streaming-based machine learning (ML) algorithms. ZerOps can be integrated into existing cloud infrastructures. The second major design goal of ZerOps is a modular and flexible data analysis pipeline that can be assembled from multiple interchangeable elements. This allows customization to different infrastructure use cases and also makes it easy to experiment with new algorithmic approaches for research purposes. Because of the decentralized deployment, the data analysis is co-located with regular system components, so its resource usage has to be limited to a certain percentage of the available resources. Furthermore, ZerOps incorporates streaming analytics as well as event aggregation to determine the root causes of anomalies and to perform further advanced analyses of anomaly situations. By integrating unsupervised anomaly detection, ZerOps can detect previously unknown problems as well as known, already learned anomalies. A decentralized ML model repository enables transfer learning to overcome cold-start problems for dynamic IT infrastructure components. ZerOps also supports automatic hyperparameter selection for ML algorithms.
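As a rough illustration of the two design goals described above (streaming analysis at the data source and a pipeline assembled from interchangeable elements), the following minimal sketch chains generator-based steps and applies a simple unsupervised detector. It is not ZerOps code: the names `RunningZScore`, `select`, `detect`, and `run_pipeline`, the `cpu` metric, and the z-score threshold are all hypothetical, and the running z-score stands in for whatever streaming ML algorithm is actually deployed.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Sample = Dict[str, float]            # hypothetical sample format, e.g. {"cpu": 0.42}
Step = Callable[[Iterator], Iterator]  # a pipeline element maps a stream to a stream


class RunningZScore:
    """Unsupervised streaming detector: online mean/variance (Welford's algorithm)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10) -> None:
        self.threshold = threshold   # assumed cut-off, in standard deviations
        self.warmup = warmup         # samples to observe before raising alerts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                # running sum of squared deviations

    def is_anomaly(self, x: float) -> bool:
        # Score x against the statistics seen so far.
        z = 0.0
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            z = abs(x - self.mean) / std if std > 0 else 0.0
        flagged = self.n >= self.warmup and z > self.threshold
        # Update in constant memory; for simplicity, anomalous values
        # are also folded into the statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


def select(metric: str) -> Step:
    """Pipeline element: extract one metric from each sample."""
    def step(stream: Iterator[Sample]) -> Iterator[float]:
        for sample in stream:
            yield sample[metric]
    return step


def detect(detector: RunningZScore) -> Step:
    """Pipeline element: annotate each value with an anomaly flag."""
    def step(stream: Iterator[float]) -> Iterator[Tuple[float, bool]]:
        for value in stream:
            yield value, detector.is_anomaly(value)
    return step


def run_pipeline(source: Iterable, steps: List[Step]) -> Iterator:
    """Chain interchangeable elements lazily, one sample at a time."""
    stream: Iterator = iter(source)
    for step in steps:
        stream = step(stream)
    return stream


if __name__ == "__main__":
    # Synthetic data: a stable CPU metric followed by one extreme value.
    samples = [{"cpu": 0.50 + 0.01 * (i % 3)} for i in range(30)] + [{"cpu": 5.0}]
    for value, flagged in run_pipeline(samples, [select("cpu"), detect(RunningZScore())]):
        if flagged:
            print(f"anomaly: cpu={value}")
```

Because every element only consumes and produces an iterator, a different detector can be swapped in without touching the rest of the chain, and the constant-memory update keeps the footprint small when the analysis runs next to the monitored component.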

Related Publications

2022

A2Log: Attentive Augmented Log Anomaly Detection

Wittkopp, Thorsten and Acker, Alexander and Nedelkoski, Sasho and Bogatinovski, Jasmin and Scheinert, Dominik and Fan, Wu and Kao, Odej

55th Hawaii International Conference on System Sciences, to appear. 2022

2021

LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision

Wittkopp, Thorsten and Wiesner, Philipp and Scheinert, Dominik and Acker, Alexander

19th International Conference on Service-Oriented Computing, to appear. 2021

A Taxonomy of Anomalies in Log Data

Wittkopp, Thorsten and Wiesner, Philipp and Scheinert, Dominik and Kao, Odej

19th International Conference on Service-Oriented Computing, to appear. 2021

Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Scheinert, Dominik and Acker, Alexander and Thamsen, Lauritz and Geldenhuys, Morgan K. and Kao, Odej

Workshop Proceedings of the 43rd International Conference on Software Engineering, 7–12. 2021

Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper

Bogatinovski, Jasmin and Nedelkoski, Sasho and Acker, Alexander and Schmidt, Florian and Wittkopp, Thorsten and Becker, Soeren and Cardoso, Jorge and Kao, Odej

2021

2020

Performance Diagnosis in Cloud Microservices using Deep Learning

Wu, Li and Bogatinovski, Jasmin and Nedelkoski, Sasho and Tordsson, Johan and Kao, Odej

18th International Conference on Service-Oriented Computing, to appear. 2020

Towards AIOps in Edge Computing Environments

Becker, Soeren and Schmidt, Florian and Gulenko, Anton and Acker, Alexander and Kao, Odej

2020 IEEE International Conference on Big Data. IEEE, 3470–3475. 2020

TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

Scheinert, Dominik and Acker, Alexander

18th International Conference on Service-Oriented Computing, 214–227. 2020

Decentralized Federated Learning Preserves Model and Data Privacy

Wittkopp, Thorsten and Acker, Alexander

18th International Conference on Service-Oriented Computing, 176–187. 2020

Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs

Nedelkoski, Sasho and Bogatinovski, Jasmin and Acker, Alexander and Cardoso, Jorge and Kao, Odej

ICDM 2020: 20th IEEE International Conference on Data Mining, 1196–1201. 2020

Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction

Acker, Alexander and Wittkopp, Thorsten and Nedelkoski, Sasho and Bogatinovski, Jasmin and Kao, Odej

15th Conference on Computer Science and Information Systems, 7–10. 2020

Self-Supervised Log Parsing

Nedelkoski, Sasho and Bogatinovski, Jasmin and Acker, Alexander and Cardoso, Jorge and Kao, Odej

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2020, 1–742. 2020

Multi-Source Distributed System Data for AI-powered Analytics

Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay and Cardoso, Jorge and Kao, Odej

ESOCC 2020: European Conference On Service-Oriented And Cloud Computing. Springer International Publishing, 161–176. 2020

AI-Governance and Levels of Automation for AIOps-supported system administration

Gulenko, Anton and Acker, Alexander and Kao, Odej and Liu, Feng

The 29th International Conference on Computer Communications and Networks, 1–6. 2020

Bitflow: An In Situ Stream Processing Framework

Gulenko, Anton and Acker, Alexander and Schmidt, Florian and Becker, Soeren and Kao, Odej

International Conference on Autonomic Computing and Self-Organizing Systems, 182–187. 2020

MicroRAS: Automatic Recovery in the Absence of Historical Failure Data for Microservice Systems

Wu, Li and Tordsson, Johan and Acker, Alexander and Kao, Odej

2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), 227–236. 2020

Learning More Expressive Joint Distributions in Multimodal Variational Methods

S. Nedelkoski and M. Bogojevski and O. Kao

2020 International Conference on Machine Learning, Optimization, and Data Science, LOD 2020, 137–149. 2020

MicroRCA: Root Cause Localization of Performance Issues in Microservices

Wu, Li and Tordsson, Johan and Elmroth, Erik and Kao, Odej

NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium, 1–9. 2020

2019

Unsupervised Anomaly Alerting for IoT-Gateway Monitoring using Adaptive Thresholds and Half-Space Trees

Wetzig, René and Gulenko, Anton and Schmidt, Florian

2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS). IEEE, 161–168. 2019

Anomaly Detection and Levels of Automation for AI-Supported System Administration

Gulenko, Anton and Kao, Odej and Schmidt, Florian

Annual International Symposium on Information Management and Big Data, 1–7. 2019

Copyright TU Berlin 2008