
Troubleshooting Grafana Alert: Service not running

A step-by-step guide to troubleshooting a Prometheus "target not found" alert. It covers investigating the failed service in Portainer, identifying a missing permission on the Azure key vault, and resolving the issue by creating an access policy.

By Peter Yates

In this guide, we'll troubleshoot and resolve an alert raised by Prometheus and Grafana that indicates a missing Prometheus target. We'll investigate the issue using Grafana, Portainer and the Azure portal.

Let's get started.

1
A Grafana alert arrives in Teams
2
Log in to Grafana and open the Up Status dashboard. Click "UpStatus"
3
The dashboard shows a problem with de131058intmaad
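
The same check can be sketched outside the dashboard. The example below is a minimal sketch, not the dashboard's actual query: it assumes the Prometheus HTTP API is reachable at a base URL you supply, and that each service is scraped under a `job` label matching its name. Both are assumptions, and the sample payload is illustrative only. It reports every expected job that has no series with `up == 1`, which covers a target that is down as well as one that is missing altogether.

```python
import json
import urllib.parse
import urllib.request


def fetch_up(base_url: str) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=10) as resp:
        return json.load(resp)


def unhealthy_jobs(payload: dict, expected_jobs: list[str]) -> list[str]:
    """Return expected jobs that have no series reporting up == 1.

    This catches both a target that is down (up == 0) and a target that
    is missing from the result entirely.
    """
    healthy = set()
    for series in payload.get("data", {}).get("result", []):
        # Each instant-vector sample is [timestamp, "value as string"].
        if float(series["value"][1]) == 1:
            healthy.add(series["metric"].get("job"))
    return sorted(set(expected_jobs) - healthy)


# Illustrative payload in the shape Prometheus returns; not real data.
# "example-healthy-service" is a hypothetical job name.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "example-healthy-service"},
             "value": [1700000000.0, "1"]},
        ],
    },
}

print(unhealthy_jobs(sample, ["example-healthy-service", "de131058intmaad"]))
# prints ['de131058intmaad']
```

Against a live server, `unhealthy_jobs(fetch_up(base_url), expected)` would run the same comparison, where `base_url` and `expected` are values you provide.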

Next, investigate the service in Portainer. Open Portainer in your browser.

4
Click "Stacks"
5
Click "Search for a stack ..." and type in the name of the service
6
Click "de131058intmaad"

In the services list, the service shows "replicated 0 / 1", meaning none of its replicas are running. Open a failed task and review its logs to find out why the service failed to start.
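
If you have shell access to a Swarm manager node, the replica count shown by Portainer can also be read from `docker service ls --format '{{.Name}} {{.Replicas}}'`. The sketch below assumes that output format and flags any service running fewer replicas than desired. The service names in the sample are hypothetical.

```python
def under_replicated(lines: list[str]) -> list[str]:
    """Return names of services whose running replicas are below the desired count.

    Each line is expected to look like "<service-name> <running>/<desired>".
    """
    failing = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, replicas = fields[0], fields[1]
        running, desired = (int(part) for part in replicas.split("/"))
        if running < desired:
            failing.append(name)
    return failing


# Illustrative output only; both service names are made up for this example.
sample_output = [
    "example-stack_web 1/1",
    "de131058intmaad_example 0/1",
]

print(under_replicated(sample_output))
# prints ['de131058intmaad_example']
```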

7
Click a failed task to open its logs
8
You can now view the log. It reports that the service can't access the key vault de131058intmesseuwkv

Next, take a look at the key vault. Log in to the Azure portal and search for the instance's resource group.

9
Click "de131058intmesseuwrg"

The key vault is listed in this resource group. Click the key vault, then choose "Access policies" to see who has access to it.

10
Click "de131058intmesseuwkv"
11
Click "Access policies"
12
View the list of access policies

There should typically be 14 entries. Here, the aggregated dealing service has no access policy, which is why the service fails to start. To resolve this, create an access policy for aggregated dealing and restart the services, or redeploy the environment using the Release pipeline.
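
The comparison against the expected list can also be scripted. The sketch below assumes you have exported the vault's policies as JSON, for example with `az keyvault show --name de131058intmesseuwkv --query properties.accessPolicies`, and that you keep your own mapping of service names to object IDs. The names and GUIDs shown are placeholders, not values from this environment.

```python
import json


def missing_policies(policies_json: str, expected: dict[str, str]) -> list[str]:
    """Return the expected principals that have no access policy on the vault.

    `policies_json` is a JSON array of access policies, each with an "objectId".
    `expected` maps a friendly name to the object ID it should appear under.
    """
    policies = json.loads(policies_json)
    present = {policy["objectId"].lower() for policy in policies}
    return sorted(name for name, object_id in expected.items()
                  if object_id.lower() not in present)


# Placeholder data: hypothetical names and object IDs for illustration only.
expected = {
    "example-service": "00000000-0000-0000-0000-000000000001",
    "aggregated-dealing": "00000000-0000-0000-0000-000000000002",
}
exported = json.dumps([
    {
        "objectId": "00000000-0000-0000-0000-000000000001",
        "tenantId": "00000000-0000-0000-0000-0000000000aa",
        "permissions": {"secrets": ["get", "list"]},
    },
])

print(missing_policies(exported, expected))
# prints ['aggregated-dealing']
```

If an entry is reported missing, `az keyvault set-policy` with `--name`, `--object-id` and the permission lists the service needs (for example `--secret-permissions`) is one way to create the policy from the command line. The exact permissions required are not stated in this guide, so check them against a working environment first.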
