TU Berlin


MT: Design and Development of Methods for Language-agnostic Log Statement Mining from Software Repositories


Modern software development and operation are supported by CI/CD pipelines, which consist of building blocks such as Code - Build - Test - Release - Deploy - Verify - Monitor. Numerous automation tools exist for each of these steps: for example, Jenkins orchestrates the sub-steps of a CI/CD pipeline, SonarQube performs static code analysis, and Maven or Gradle manage the builds. All these components generate log messages with insightful information about the quality of the software code, weak points in the deployment, or runtime errors. Often there are more than 100,000 messages per version and per update. DevOps engineers examine them in search of errors, mostly manually and at great expense of time, using tools such as grep or awk. So-called AI4DevOps tools speed this up significantly: AI models are trained on sample data from open-source projects so that they search through the log messages from the CI/CD pipeline and present to the DevOps engineers those messages that most likely point to the most serious errors.

The announced master's thesis builds upon an existing AI model and aims at developing a concept and a prototypical implementation of language-agnostic log message mining from software repositories. Code hosting platforms (for example GitHub) contain numerous open-source projects that provide valuable input for the training of AI models. As manual search and labeling is far too time-consuming, the goal of this thesis is to automate the process: find the log statements in the source code of the contained projects (file and line number), extract the log messages, and store them in a structured output. The results will be used to train AI-enabled tools for anomaly and incident detection. Methods from natural language processing (NLP) as well as typical RegEx search tools serve as a starting point for the analysis of the source code. The quality of the developed methods should be evaluated in terms of precision and recall as well as compute efficiency.
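As an illustration of the RegEx starting point mentioned above, the following is a minimal sketch and not the method to be developed in the thesis. It assumes that log statements look like calls of the form logger.info("...") or logging.warning("..."); the pattern LOG_CALL, the record type LogStatement, and the functions mine_log_statements and precision_recall are hypothetical names chosen for this example. A real language-agnostic approach would have to cope with many more logging idioms.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import List, Set, Tuple

# Assumption: log statements are method calls on an object named log, logger
# or logging, with a string literal as the first argument. This is only a
# rough heuristic that happens to fit common Java, Kotlin and Python idioms.
LOG_CALL = re.compile(
    r"""\b(?:logging|logger|log)\s*\.\s*
        (?P<level>trace|debug|info|warning|warn|error|fatal|critical)\s*\(\s*
        (?P<quote>["'])(?P<message>(?:\\.|(?!(?P=quote)).)*)(?P=quote)""",
    re.IGNORECASE | re.VERBOSE,
)


@dataclass(frozen=True)
class LogStatement:
    """One mined log statement (hypothetical structured output format)."""
    file: str
    line: int
    level: str
    message: str


def mine_log_statements(source: str, file: str) -> List[LogStatement]:
    """Scan source text line by line and collect matching log calls."""
    found = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        for match in LOG_CALL.finditer(text):
            found.append(
                LogStatement(
                    file=file,
                    line=line_no,
                    level=match.group("level").lower(),
                    message=match.group("message"),
                )
            )
    return found


def precision_recall(predicted: Set, expected: Set) -> Tuple[float, float]:
    """Precision and recall of predicted items against a labeled reference set."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up sample input for demonstration purposes only.
JAVA_SAMPLE = "\n".join([
    "public class A {",
    "    void run() {",
    '        logger.info("Starting job {}", id);',
    "        // nothing to see here",
    '        LOG.error("Failed to connect", e);',
    "    }",
    "}",
])

if __name__ == "__main__":
    statements = mine_log_statements(JAVA_SAMPLE, "A.java")
    print(json.dumps([asdict(s) for s in statements], indent=2))
```

Working line by line keeps the line number trivially available but misses log calls whose message literal starts on a later line than the call; that is one of the gaps that more robust methods would need to close. The precision_recall helper compares sets, for example sets of (file, line) pairs from the miner against a manually labeled reference.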

Requirements: Knowledge of software development processes, distributed systems, CI/CD, Python, machine learning, and DevOps patterns. Advanced Python knowledge as well as experience with PyTorch/TensorFlow and Kotlin/Java are desirable.

Start: immediately

Contact: Prof. Dr. Odej Kao (odej.kao@tu-berlin.de)


Odej Kao
+49 30 314-25154 (secretariat)
Room TEL 1206/7