8.3 Tasks: Troubleshoot Kubernetes Service Discovery

Task 8.3.1: Troubleshooting Kubernetes Service Discovery

We will now deploy an application with an error in its monitoring configuration.

Deploy Loki in the monitoring namespace.

Create a deployment training_loki-deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: loki
  name: example-loki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - image: quay.balgroupit.com/acend/loki
        imagePullPolicy: Always
        name: loki
        volumeMounts:
        - mountPath: /loki/chunks
          name: chunks
        - mountPath: /loki/boltdb-shipper-cache
          name: boltdb-shipper-cache
        - mountPath: /loki/boltdb-shipper-active
          name: shipper-active
        - mountPath: /loki/wal
          name: wal
        - mountPath: /loki/compactor
          name: compactor
      restartPolicy: Always
      volumes:
      - name: chunks
        emptyDir: {}
      - name: boltdb-shipper-cache
        emptyDir: {}
      - name: shipper-active
        emptyDir: {}
      - name: wal
        emptyDir: {}
      - name: compactor
        emptyDir: {}

Create a Service training_service-loki.yaml.

apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
spec:
  ports:
    - name: http
      port: 3100
      protocol: TCP
      targetPort: 3100
  selector:
    app: loki
  type: ClusterIP

Create the Loki ServiceMonitor training_servicemonitor-loki.yaml.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: loki
  name: loki
spec:
  endpoints:
    - interval: 30s
      port: http
      scheme: http
      path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
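
Apply the three manifests in the monitoring namespace. This assumes the files are in your current working directory and <team> is your team name, matching the commands used further below:

oc -n <team>-monitoring apply -f training_loki-deployment.yaml
oc -n <team>-monitoring apply -f training_service-loki.yaml
oc -n <team>-monitoring apply -f training_servicemonitor-loki.yaml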

  • When you visit the Prometheus user interface, you will notice that the Prometheus server does not scrape metrics from Loki. Try to find out why.
Hints

The quickest way to do this is to follow the instructions in the info box above. So let’s first find out which of the following statements apply to us:

  • The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration.
    • Let’s check if Prometheus reads the configuration defined in the ServiceMonitor resource. To do so, navigate to the Prometheus configuration page and check whether loki appears in the scrape configuration (scrape_configs). You should find a job named serviceMonitor/loki/loki/0, so this should not be the issue in this case.
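
      Instead of the web UI, you can also query the Prometheus HTTP API directly. The Service name prometheus-operated and port 9090 below are assumptions based on the Prometheus Operator defaults and may differ in your setup:

      oc -n <team>-monitoring port-forward svc/prometheus-operated 9090 &
      # /api/v1/status/config returns the fully rendered scrape configuration
      curl -s localhost:9090/api/v1/status/config | grep 'serviceMonitor/.*loki'
      # /api/v1/targets lists only the targets Prometheus actually discovered
      curl -s localhost:9090/api/v1/targets | grep loki

      The first curl should find the generated scrape job, while the second returns nothing for Loki, which is exactly the symptom we are investigating.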
  • The Endpoint appears in the Prometheus configuration but not under targets.
    • Let’s check if the application is running:

      oc -n <team>-monitoring get pod -l app=loki
      

      The output should be similar to the following:

      NAME                            READY   STATUS    RESTARTS   AGE
      example-loki-7bb486b647-dj5r4   1/1     Running   0          112s
      
    • Let’s check whether the application is exposing metrics:

      PODNAME=$(oc -n <team>-monitoring get pod -l app=loki -o name)
      oc -n <team>-monitoring exec -it $PODNAME -- wget -O - localhost:3100/metrics
      ...
      
    • The application exposes metrics, and Prometheus generated the configuration according to the defined ServiceMonitor. Let’s verify that the ServiceMonitor actually matches the Service.

      oc -n <team>-monitoring get svc loki -o yaml
      
      apiVersion: v1
      kind: Service
      metadata:
        ...
        labels:
          app: loki
          argocd.argoproj.io/instance: ...
        name: loki
      spec:
        ...
        ports:
        - name: http
          ...
      
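      As an additional sanity check, the Endpoints object shows whether the Service’s selector matches any pods at all; an empty address list would point to a selector problem:

      oc -n <team>-monitoring get endpoints loki

      Here the pod IP shows up, so the Service selects the pod correctly, and the problem must lie between the Service and the ServiceMonitor.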

      We see that the Service has the port named http and the label app: loki set. Let’s check the ServiceMonitor:

      oc -n <team>-monitoring get servicemonitor loki -o yaml
      
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      ...
      spec:
        endpoints:
        - interval: 30s
          ...
          port: http
          ...
        selector:
          matchLabels:
            prometheus-monitoring: "true"
      

      We see that the ServiceMonitor expects a port named http and the label prometheus-monitoring: "true" to be set. The Service has the matching port name but is missing that label, so the missing label is the culprit. Let’s set the label on the Service by updating training_service-loki.yaml.

      apiVersion: v1
      kind: Service
      metadata:
        name: loki
        labels:
          app: loki
          prometheus-monitoring: "true"
      spec:
      ...
      
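      Re-apply the updated manifest so the label lands on the live object. Alternatively, oc label can set it imperatively, but the manifest should be updated anyway so the change survives the next apply:

      # declarative: re-apply the updated Service manifest
      oc -n <team>-monitoring apply -f training_service-loki.yaml
      # imperative alternative: set the label directly on the live Service
      oc -n <team>-monitoring label service loki prometheus-monitoring=true --overwrite
      # verify the label is present
      oc -n <team>-monitoring get svc loki --show-labels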

      Verify in the Prometheus user interface that the target now gets scraped.