Generalization of IT log anomaly detection models at the inference stage
The robustness of distributed IT systems is becoming increasingly important. The spread of IoT-related scenarios will substantially increase this demand in the future, as latency and reliability will be key issues in many applications related to smart cities, production 4.0, autonomous worlds and others. Such systems are becoming more and more complex, so human experts will no longer be able to monitor and operate all components manually. In addition, new components are constantly connected to the system, and new versions and patches are installed. All these changes require new monitoring and detection methods, and they give rise to a wide variety of anomalies that must be detected. Automatic anomaly detection is therefore of central importance for the reliability of these systems.
This research explores an end-to-end method for detecting anomalies based on logs across a variety of IT environments. The end-to-end process aims to simplify the monitoring and operation of these systems. The first step is to analyze and process the logs. The goal of log processing is to recognize which log lines are of the same type even if they contain different variable parts, such as IP addresses. Because the systems are heterogeneous, a wide variety of logs can occur, which makes template generation difficult. It is therefore essential to create log templates as accurately as possible without manual effort, since anomalies can be detected accurately only with good template generation algorithms. Because many different kinds of anomalies can occur, it is important to work with unsupervised learning and to build a precise model of normal behaviour.
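The idea of recognizing log lines of the same type despite different variable parts can be illustrated with a minimal sketch. It assumes a simple regex-masking approach, which is only one possible way to generate templates and is not necessarily the method used in this research. The masking rules, the placeholder tokens (`<IP>`, `<HEX>`, `<NUM>`) and the function names `to_template` and `group_by_template` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical masking rules. Order matters: IP addresses must be masked
# before plain numbers, otherwise their octets would be replaced one by one.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable parts of a log line with placeholder tokens."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


def group_by_template(lines):
    """Group raw log lines by the template they reduce to."""
    groups = defaultdict(list)
    for line in lines:
        groups[to_template(line)].append(line)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 port 22",
        "Connection from 192.168.1.5 port 443",
        "Disk full on /dev/sda1",
    ]
    for template, members in group_by_template(logs).items():
        print(f"{len(members)} line(s): {template}")
```

Hand-written masks like these do not scale to heterogeneous systems, which is exactly why automatic template generation matters here; the sketch only shows what a template is.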
Our goal is a general solution for anomaly detection in unknown systems. The procedure should enable the extraction of unknown logs and the detection of anomalies without manual effort to re-label anomalies. After the algorithm is retrained, unsupervised anomaly detection should automatically restore the accuracy of the base model.
To achieve this goal, we are investigating the applicability of NLP methods. For this purpose, the individual components of a log line are converted in a specific manner into a form that machines can process, so that log lines become recognizable and identifiable. Log lines of the same type can then be assigned a common ID. Sequences of these IDs make it possible to learn the behaviour of the system in a sequence-based way. Only sequences from a normal environment are learned, in order to obtain an exact statistical model of the normal state. Anomalies can then be detected as deviations from this model.
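The sequence-based idea can be sketched as follows. This is a minimal illustration that assumes a simple n-gram frequency model as a stand-in for the statistical model of the normal state; it is not the NLP-based model investigated here. The class name `NGramModel`, the `history` length and the `threshold` value are hypothetical choices.

```python
from collections import Counter, defaultdict


def windows(seq, size):
    """Return all contiguous windows of the given size as tuples."""
    return [tuple(seq[i:i + size]) for i in range(len(seq) - size + 1)]


class NGramModel:
    """Counts which template ID follows each context of `history` IDs."""

    def __init__(self, history=2):
        self.history = history
        self.counts = defaultdict(Counter)

    def fit(self, normal_sequences):
        # Learn only from sequences recorded in a normal environment.
        for seq in normal_sequences:
            for w in windows(seq, self.history + 1):
                self.counts[w[:-1]][w[-1]] += 1
        return self

    def probability(self, context, event):
        """Estimated probability of `event` after `context`; 0.0 if unseen."""
        seen = self.counts.get(tuple(context))
        if not seen:
            return 0.0
        return seen[event] / sum(seen.values())

    def anomalies(self, seq, threshold=0.05):
        """Return positions whose ID is unlikely given the preceding IDs."""
        flagged = []
        for i, w in enumerate(windows(seq, self.history + 1)):
            if self.probability(w[:-1], w[-1]) < threshold:
                flagged.append(i + self.history)
        return flagged


if __name__ == "__main__":
    normal = [[0, 1, 2, 3] * 5]           # template IDs from normal operation
    model = NGramModel(history=2).fit(normal)
    print(model.anomalies([0, 1, 2, 9, 0, 1]))
```

Note that a single unexpected ID also makes the following contexts unseen, so several consecutive positions are flagged. A model that smooths or generalizes over contexts would behave differently.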
To this end, we investigate publicly available data as well as data from our research partner. We have set ourselves the goal of achieving an accuracy of at least 80 % and a recall of 90 %. Although there are uncertainties in the log extraction and the sequence analysis, we will show that this goal can be reached.
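For reference, the two target metrics can be computed from labelled evaluation data as in the following sketch. It uses the standard definitions, accuracy = (TP + TN) / N and recall = TP / (TP + FN), with label 1 meaning "anomaly". The function names and the example labels are hypothetical and do not represent results of this research.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # detected anomalies
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly normal
    fn = sum(1 for t, p in pairs if t and not p)      # missed anomalies
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def meets_target(accuracy, recall, min_accuracy=0.80, min_recall=0.90):
    """Check the stated goal: accuracy of at least 80 %, recall of 90 %."""
    return accuracy >= min_accuracy and recall >= min_recall


if __name__ == "__main__":
    # Made-up labels purely to demonstrate the computation.
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1]
    acc, rec = accuracy_and_recall(y_true, y_pred)
    print(acc, rec, meets_target(acc, rec))
```

Because anomalies are usually rare, accuracy alone can look high even when most anomalies are missed, which is why the recall target is stated separately.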