Root Cause Localization and Automatic Recovery for Microservices
A growing number of applications in domains such as the Internet of Things (IoT), cloud computing, and fog computing are adopting microservices architectures (MSA) to build large-scale systems that are more resilient, more robust, and better adapted to dynamic customer requirements.
To operate microservices reliably and with high uptime, it is essential to identify root causes and recover the affected services quickly once abnormal behaviour is detected.
However, it is difficult to achieve this in microservices systems due to the following challenges:
- complex dependencies
- numerous metrics
- frequent updates
- volatile infrastructure
In this project, we will study the following research questions:
- Root cause localization: How to locate the root cause of performance issues in microservices? (One proposed method: MicroRCA)
- Automatic recovery: Once the root cause is identified, what action should be taken to recover from the performance degradation with no or minimal SLA violations? (Ongoing)
- Extension to fog computing: How can the approaches developed for the cloud be applied to a geographically distributed, resource-constrained fog computing environment with unreliable networks?
MicroRCA root cause localization procedures