Anomaly Classification and Auto Remediation
We live in a world where each of us increasingly relies on a variety of digital services: online calendars to manage daily schedules, social networks to stay in contact with friends and family, cloud storage to organize documents, photos, or videos, and many other applications that have become an integral part of our lives. Furthermore, we have gradually become used to these services being constantly available. Whenever an online service with a large user base goes down, even for a comparatively short time, the result is a large media response and a significant economic impact on the respective company.
To bring innovative new features to their customers and thus stay ahead of their competitors, companies are constantly improving their products. However, novel technologies like virtual or augmented reality (VR/AR), autonomous mobility, or remote medical applications require a complex interconnection of different software and hardware systems. This increased system complexity, together with the growing overall demand for online services, is considered a major future challenge.
One possible way to balance the demand for always-available services, the cost overhead for companies, and the increasing complexity is to use methods from artificial intelligence (AI) and machine learning (ML). These methods should support experts (system administrators, site reliability engineers (SREs), or network reliability engineers (NREs)) in operating these systems. Put simply, problems occurring within complex system environments should be detected, ideally before users become aware of them, and automatically remediated by AI/ML methods. Routine tasks in particular, but in the long term also complex decision processes, should be handled by machines instead of human experts.
This project focuses on the remediation of recurring problems: the routine work of fixing them should be taken off system administrators' hands, allowing them to work on more meaningful and interesting projects instead.
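The idea of handing recurring problems to a machine while leaving everything else to human experts can be illustrated with a minimal sketch. It is not the project's implementation: the `Anomaly` record, the class labels, the playbook entries, and the 0.8 confidence threshold are all hypothetical, and the classifier that would produce the label is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Anomaly:
    service: str       # affected service (hypothetical name)
    label: str         # class assigned by an assumed upstream classifier
    confidence: float  # classifier confidence in [0, 1]


# Hypothetical routine fixes; real actions would call an orchestration system.
def restart_service(anomaly: Anomaly) -> str:
    return f"restart {anomaly.service}"


def clear_disk_cache(anomaly: Anomaly) -> str:
    return f"clear cache on {anomaly.service}"


# Playbook of recurring problems and their routine fix (illustrative only).
PLAYBOOK: Dict[str, Callable[[Anomaly], str]] = {
    "memory_leak": restart_service,
    "disk_full": clear_disk_cache,
}


def remediate(anomaly: Anomaly, min_confidence: float = 0.8) -> str:
    """Apply a known routine fix, or escalate to a human expert."""
    action = PLAYBOOK.get(anomaly.label)
    # Unknown classes and low-confidence classifications stay with humans.
    if action is None or anomaly.confidence < min_confidence:
        return f"escalate {anomaly.service} to on-call engineer"
    return action(anomaly)


if __name__ == "__main__":
    print(remediate(Anomaly("web-1", "memory_leak", 0.95)))
    print(remediate(Anomaly("db-1", "disk_full", 0.50)))
```

In this sketch only known, confidently classified problems are fixed automatically. Anything unfamiliar or uncertain is escalated, which keeps human experts in charge of the non-routine cases.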