TU Berlin

Department of Telecommunication Systems

MT: Design and Development of Methods for Named Entity Recognition in Log Messages



Modern software development and operation are supported by CI/CD pipelines, which comprise stages such as Code, Build, Test, Release, Deploy, Verify, and Monitor. Numerous automation tools exist for each of these stages: for example, Jenkins orchestrates the sub-steps of a CI/CD pipeline, SonarQube performs static code analysis, and Maven or Gradle manage the builds. All these components generate log messages with valuable information about code quality, weak points in the deployment, or runtime errors. There are often more than 100,000 messages per version and per update, which DevOps engineers search for errors mostly by hand, a time-consuming process using tools such as grep or awk. So-called AI4DevOps tools can speed this up significantly: AI models are trained on sample data from open-source projects to search through the log messages of the CI/CD pipeline and present to the DevOps team those messages that most likely point to the most serious errors.

The advertised master's thesis builds on an existing AI model trained on more than 2,000 GitHub projects and aims to develop a concept and a prototype implementation for named entity recognition (NER) in log data. The main challenge is to analyze the free-form text produced during software execution using statistical and deep learning methods for natural language processing, and to identify entity types such as classes, functions, IP addresses, timestamps, and other entities contained in the text. The extracted information will then be transformed into a structured log message and passed on to the downstream components for anomaly and incident detection.
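To illustrate the input and output of this step, here is a minimal sketch of a purely rule-based (regular expression) baseline. It is deliberately not the statistical or deep learning approach the thesis calls for, and it is not the existing AI model. The entity labels, the patterns, the function names `extract_entities` and `to_structured`, and the output format (a template with placeholders plus a list of extracted entities) are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical entity patterns for a rule-based baseline (not the thesis method).
PATTERNS = [
    # ISO-like timestamp, e.g. 2023-05-01 12:00:00 or 2023-05-01T12:00:00.123
    ("TIMESTAMP", re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")),
    # Dotted IPv4-like address with optional port (loose: does not check 0-255)
    ("IP_ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b")),
    # Identifier directly followed by "(" is treated as a function name
    ("FUNCTION", re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*(?=\()")),
    # Java-style qualified class name: lowercase package parts, capitalized class
    ("CLASS", re.compile(r"\b(?:[a-z][a-z0-9_]*\.)+[A-Z][A-Za-z0-9_]*\b")),
]


@dataclass
class Entity:
    label: str
    text: str
    start: int
    end: int


def extract_entities(line: str) -> list[Entity]:
    """Collect matches from all patterns and keep a non-overlapping subset."""
    candidates = [
        Entity(label, m.group(), m.start(), m.end())
        for label, pattern in PATTERNS
        for m in pattern.finditer(line)
    ]
    # Prefer earlier spans, then longer ones; skip spans overlapping an accepted one.
    candidates.sort(key=lambda e: (e.start, -(e.end - e.start)))
    accepted: list[Entity] = []
    last_end = -1
    for e in candidates:
        if e.start >= last_end:
            accepted.append(e)
            last_end = e.end
    return accepted


def to_structured(line: str) -> dict:
    """Replace each entity with a <LABEL> placeholder and list the extracted values."""
    entities = extract_entities(line)
    parts, pos = [], 0
    for e in entities:
        parts.append(line[pos:e.start])
        parts.append(f"<{e.label}>")
        pos = e.end
    parts.append(line[pos:])
    return {
        "template": "".join(parts),
        "entities": [(e.label, e.text) for e in entities],
    }


if __name__ == "__main__":
    raw = ("2023-05-01 12:00:00 ERROR Connection to 10.0.0.5:8080 "
           "refused in org.example.net.Client.connect()")
    print(to_structured(raw))
```

For the sample line, the template becomes `<TIMESTAMP> ERROR Connection to <IP_ADDRESS> refused in <CLASS>.<FUNCTION>()`. Hand-written patterns like these are brittle with respect to new log formats, which is why the thesis targets learned models. A trained NER model could, in principle, take the place of `extract_entities` while producing the same kind of structured record for the downstream anomaly and incident detection.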

The quality of the developed methods is to be evaluated in a production environment, using real use cases and their corresponding log messages.

Requirements: Knowledge of software development processes, distributed systems, CI/CD, Python, machine learning, and DevOps patterns. Advanced Python knowledge and experience with PyTorch/TensorFlow and Kotlin/Java are desirable.

Start: immediately

Contact: Prof. Dr. Odej Kao (odej.kao@tu-berlin.de)
