8.3 Tasks: Troubleshoot Kubernetes Service Discovery
Task 8.3.1: Troubleshooting Kubernetes Service Discovery
We will now deploy an application with an error in the monitoring configuration.
Deploy Loki in the monitoring namespace.
Create a deployment training_loki-deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: loki
  name: example-loki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - image: quay.balgroupit.com/acend/loki
        imagePullPolicy: Always
        name: loki
        volumeMounts:
        - mountPath: /loki/chunks
          name: chunks
        - mountPath: /loki/boltdb-shipper-cache
          name: boltdb-shipper-cache
        - mountPath: /loki/boltdb-shipper-active
          name: shipper-active
        - mountPath: /loki/wal
          name: wal
        - mountPath: /loki/compactor
          name: compactor
      restartPolicy: Always
      volumes:
      - name: chunks
        emptyDir: {}
      - name: boltdb-shipper-cache
        emptyDir: {}
      - name: shipper-active
        emptyDir: {}
      - name: wal
        emptyDir: {}
      - name: compactor
        emptyDir: {}

Create a Service training_service-loki.yaml.
apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
spec:
  ports:
  - name: http
    port: 3100
    protocol: TCP
    targetPort: 3100
  selector:
    app: loki
  type: ClusterIP

Create the Loki ServiceMonitor training_servicemonitor-loki.yaml.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: loki
  name: loki
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
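The three manifests above can then be applied to the monitoring namespace. A minimal sketch, assuming the files were saved under the names given above and that your namespace follows the <team>-monitoring pattern used later in this lab:

```shell
# Apply the Deployment, the Service and the ServiceMonitor
oc -n <team>-monitoring apply -f training_loki-deployment.yaml
oc -n <team>-monitoring apply -f training_service-loki.yaml
oc -n <team>-monitoring apply -f training_servicemonitor-loki.yaml

# Wait until the Loki pod is up and ready
oc -n <team>-monitoring rollout status deployment/example-loki
```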
When you visit the Prometheus user interface, you will notice that the Prometheus server does not scrape metrics from Loki. Try to find out why.
Troubleshooting: Prometheus is not scraping metrics
When Prometheus is not able to scrape metrics, the cause is usually one of the following:
- The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration.
  - Check if the label of your ServiceMonitor matches the label defined in the serviceMonitorSelector field of the Prometheus custom resource
  - Check the Prometheus operator logs for errors (permission issues or invalid ServiceMonitors)
- The Endpoint appears in the Prometheus scrape config, but not under targets.
  - The namespaceSelector in the ServiceMonitor does not match the namespace of your app
  - The label selector does not match the Service of your app
  - The port name does not match the Service of your app
- The Endpoint appears as a Prometheus target, but no data gets scraped.
  - The application does not provide metrics under the correct path and port
  - Networking issues
  - Authentication required, but not configured
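Several of these checks can be run straight from the command line. A sketch, assuming the Prometheus custom resource and the operator run in the same <team>-monitoring namespace (the prometheus-operator deployment name is an assumption and may differ in your cluster):

```shell
# Which ServiceMonitor labels does the Prometheus custom resource select?
oc -n <team>-monitoring get prometheus -o jsonpath='{.items[*].spec.serviceMonitorSelector}'

# Do the labels on our ServiceMonitor match that selector?
oc -n <team>-monitoring get servicemonitor loki --show-labels

# Any errors in the Prometheus operator logs (permissions, invalid ServiceMonitors)?
oc -n <team>-monitoring logs deployment/prometheus-operator --tail=20
```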
Hints
The quickest way to troubleshoot this is to follow the instructions in the info box above. So let’s first find out which of the following statements applies to us:
- The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration.

Let’s check if Prometheus reads the configuration defined in the ServiceMonitor resource. To do so, navigate to the Prometheus configuration and search whether loki appears in the scrape configuration. You should find a job with the name serviceMonitor/loki/loki/0, so this should not be the issue in this case.

- The Endpoint appears in the Prometheus configuration, but not under targets.
Let’s check if the application is running:
oc -n <team>-monitoring get pod -l app=loki

The output should be similar to the following:

NAME                            READY   STATUS    RESTARTS   AGE
example-loki-7bb486b647-dj5r4   1/1     Running   0          112s

Let’s check if the application is exposing metrics:
PODNAME=$(oc -n <team>-monitoring get pod -l app=loki -o name)
oc -n <team>-monitoring exec $PODNAME -it -- wget -O - localhost:3100/metrics

...

The application exposes metrics, and Prometheus generated the configuration according to the defined ServiceMonitor. Let’s verify that the ServiceMonitor matches the Service.
oc -n <team>-monitoring get svc loki -o yaml

apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: loki
    argocd.argoproj.io/instance: ...
  name: loki
spec:
  ...
  ports:
  - name: http
    ...

We see that the Service has the port named http and the label app: loki set. Let’s check the ServiceMonitor:

oc -n <team>-monitoring get servicemonitor loki -o yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - interval: 30s
    ...
    port: http
  ...
  selector:
    matchLabels:
      prometheus-monitoring: "true"

We see that the ServiceMonitor expects the port named http and a label prometheus-monitoring: "true" to be set. So the culprit is the missing label. Let’s set the label on the Service by updating the service training_service-loki.yaml:

apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
    prometheus-monitoring: "true"
spec:
  ...

Verify that the target gets scraped in the Prometheus user interface.
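Instead of editing the YAML file, the same fix can also be applied imperatively. This is handy for a quick test, but the declarative file should still be updated so the change survives the next apply:

```shell
# Add the label the ServiceMonitor selector expects
oc -n <team>-monitoring label service loki prometheus-monitoring=true

# Confirm the label is now present on the Service
oc -n <team>-monitoring get service loki --show-labels
```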