TU Berlin

ZerOps - A Self-Healing Platform

Telecommunication service and network operators face rising expectations for availability, performance, and guaranteed quality of service (QoS). The complexity of modern IT infrastructures has increased to the point where traditional IT administration procedures fail to holistically ensure the dependability of the systems.

At the same time, various approaches around artificial intelligence (AI) are currently revolutionizing domains like medicine, manufacturing, or autonomous driving. This strongly motivates the utilization of AI for the autonomous management of highly complex IT systems (AIOps).

Researchers and global companies have recognized this potential and started to work on AIOps solutions. Since 2015, the CIT department has joined forces with industrial partners (Deutsche Telekom and Huawei Technologies Co., Ltd) in a joint research lab that works on solutions for anomaly detection and classification, predictive fault tolerance, and auto-remediation. This work produced the Self-Healing Cloud Platform ZerOps, which members of the CIT group continue to adjust and enhance.

Architecture

The vision for ZerOps is to provide a scalable platform for monitoring, hierarchical in-place data analytics, and predictive system remediation. The term in-place refers to the explicit design goal of analyzing collected data directly at the data source through streaming-based machine learning (ML) algorithms. ZerOps can be integrated into existing cloud infrastructures.

The second major design goal of ZerOps is a modular and flexible data analysis pipeline that can be assembled from multiple interchangeable elements. This allows customization to different infrastructure use cases and also makes it easy to experiment with new algorithmic approaches for research purposes.

Because the deployment is decentralized, the data analysis is co-located with regular system parts. Its resource usage therefore has to be limited to a certain percentage of the available resources.

Furthermore, ZerOps incorporates streaming analytics as well as event aggregations to determine anomaly root causes and perform further advanced analyses of anomaly situations. By integrating unsupervised anomaly detection, ZerOps is able to detect unknown problems as well as already known and learned anomalies. A decentralized ML model repository enables transfer learning to overcome cold-start problems for dynamic IT infrastructure components. ZerOps also supports automatic hyperparameter selection for ML algorithms.
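The idea of a pipeline assembled from interchangeable streaming elements can be illustrated with a minimal Python sketch. It is not the ZerOps implementation or API: the names `build_pipeline`, `select_metric`, and `online_zscore`, the sample layout, and the choice of a running z-score as the unsupervised detector are all assumptions made for illustration only.

```python
import math
from typing import Callable, Iterable, Iterator

# Hypothetical types: a sample is a dict of metric values,
# a step transforms one stream of samples into another.
Step = Callable[[Iterable[dict]], Iterator[dict]]


def select_metric(name: str) -> Step:
    """Step that keeps only one metric of each sample."""
    def step(samples: Iterable[dict]) -> Iterator[dict]:
        for sample in samples:
            yield {name: sample[name]}
    return step


def online_zscore(name: str, threshold: float = 3.0, warmup: int = 5) -> Step:
    """Step that flags values far from the running mean.

    Mean and variance are maintained incrementally (Welford's update),
    so no history has to be stored at the data source.
    """
    def step(samples: Iterable[dict]) -> Iterator[dict]:
        n, mean, m2 = 0, 0.0, 0.0
        for sample in samples:
            x = sample[name]
            anomalous = False
            # Score only after a warm-up phase (at least two samples).
            if n >= max(warmup, 2):
                std = math.sqrt(m2 / (n - 1))
                anomalous = std > 0 and abs(x - mean) > threshold * std
            # Update the running statistics with the new value.
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
            yield {**sample, "anomaly": anomalous}
    return step


def build_pipeline(*steps: Step) -> Step:
    """Chain interchangeable steps into one streaming pipeline."""
    def pipeline(samples: Iterable[dict]) -> Iterator[dict]:
        stream = iter(samples)
        for step in steps:
            stream = step(stream)
        return stream
    return pipeline


# Usage: assemble a pipeline and run it over a small metric stream.
pipeline = build_pipeline(select_metric("cpu"), online_zscore("cpu"))
readings = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 50.0, 10.0]
for result in pipeline({"cpu": v, "mem": 1.0} for v in readings):
    print(result)
```

Because every step has the same signature, a detector can be swapped for another algorithm without touching the rest of the pipeline, and each sample is processed as it arrives, which keeps memory use constant.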

Related Publications

A

Online Density Grid Pattern Analysis to Classify Anomalies in Cloud and NFV Systems

Acker, Alexander and Schmidt, Florian and Gulenko, Anton and Kao, Odej

2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE. 2018

Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction

Acker, Alexander and Wittkopp, Thorsten and Nedelkoski, Sasho and Bogatinovski, Jasmin and Kao, Odej

15th Conference on Computer Science and Information Systems, To appear. 2020

G

Evaluating machine learning algorithms for anomaly detection in clouds

Gulenko, Anton and Wallschläger, Marcel and Schmidt, Florian and Kao, Odej and Liu, Feng

Big Data (Big Data), 2016 IEEE International Conference on, 2716–2721. 2016

Detecting Anomalous Behavior of Black-Box Services Modeled with Distance-Based Online Clustering

Gulenko, Anton and Schmidt, Florian and Acker, Alexander and Wallschläger, Marcel and Kao, Odej and Liu, Feng

2018 IEEE 11th International Conference on Cloud Computing (CLOUD), 912–915. 2018

A Practical Implementation of In-Band Network Telemetry in Open vSwitch

Gulenko, Anton and Wallschläger, Marcel and Kao, Odej

2018 7th IEEE International Conference on Cloud Networking (CloudNet). IEEE. 2018

Anomaly Detection and Levels of Automation for AI-Supported System Administration

Gulenko, Anton and Kao, Odej and Schmidt, Florian

Annual International Symposium on Information Management and Big Data, 1–7. 2019

Bitflow: An In Situ Stream Processing Framework

Gulenko, Anton and Acker, Alexander and Schmidt, Florian and Becker, Soeren and Kao, Odej

International Conference on Autonomic Computing and Self-Organizing Systems, To appear. 2020

AI-Governance and Levels of Automation for AIOps-supported system administration

Gulenko, Anton and Acker, Alexander and Kao, Odej and Liu, Feng

The 29th International Conference on Computer Communications and Networks, To appear. 2020

A System Architecture for Real-time Anomaly Detection in Large-scale NFV Systems

Gulenko, Anton and Wallschläger, Marcel and Schmidt, Florian and Kao, Odej and Liu, Feng

Procedia Computer Science. Elsevier, volume 94, 491–496. 2016

L

MicroRAS: Automatic Recovery in the Absence of Historical Failure Data for Microservice Systems

Li Wu, Johan Tordsson, Alexander Acker and Odej Kao

UCC 2020: 13th IEEE/ACM International Conference on Utility and Cloud Computing, To appear. 2020

M

High available deployment of cloud-based virtualized network functions

Makhsous, Saeed Haddadi and Gulenko, Anton and Kao, Odej and Liu, Feng

High Performance Computing & Simulation (HPCS), 2016 International Conference on, 468–475. 2016

N

Multilayer Active Learning for Efficient Learning and Resource Usage in Distributed IoT Architectures

Nedelkoski, Sasho and Thamsen, Lauritz and Verbitskiy, Ilya and Kao, Odej

2019 IEEE International Conference on Edge Computing (EDGE). IEEE, 8–12. 2019

Multi-Source Distributed System Data for AI-powered Analytics

Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay and Cardoso, Jorge and Kao, Odej

ESOCC 2020: European Conference On Service-Oriented And Cloud Computing. Springer, To appear. 2020

Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs

Nedelkoski, Sasho and Bogatinovski, Jasmin and Acker, Alexander and Cardoso, Jorge and Kao, Odej

ICDM 2020: 20th IEEE International Conference on Data Mining, To appear. 2020

S

TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services

Dominik Scheinert and Alexander Acker

18th International Conference on Service-Oriented Computing, To appear. 2020

IFTM-Unsupervised Anomaly Detection for Virtualized Network Function Services

Schmidt, Florian and Gulenko, Anton and Wallschläger, Marcel and Acker, Alexander and Hennig, Vincent and Liu, Feng and Kao, Odej

2018 IEEE International Conference on Web Services (ICWS), 187–194. 2018

Unsupervised Anomaly Event Detection for VNF Service Monitoring using Multivariate Online Arima

Schmidt, Florian and Suri-Payer, Florian and Gulenko, Anton and Wallschläger, Marcel and Acker, Alexander and Kao, Odej

2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE. 2018

Unsupervised Anomaly Event Detection for Cloud Monitoring using Online Arima

Schmidt, Florian and Suri-Payer, Florian and Gulenko, Anton and Wallschläger, Marcel and Acker, Alexander and Kao, Odej

2018 IEEE/ACM International Conference on Utility and Cloud Computing (UCC). IEEE. 2018

W

Automated Anomaly Detection in Virtualized Services Using Deep Packet Inspection

Wallschläger, Marcel and Gulenko, Anton and Schmidt, Florian and Kao, Odej and Liu, Feng

Procedia Computer Science. Elsevier, 510–515. 2017

Anomaly Detection for Black Box Services in Edge Clouds Using Packet Size Distribution

Wallschläger, Marcel and Gulenko, Anton and Schmidt, Florian and Acker, Alexander and Kao, Odej

2018 7th IEEE International Conference on Cloud Networking (CloudNet). IEEE. 2018
