8.3 Tasks: Troubleshoot Kubernetes Service Discovery
Task 8.3.1: Troubleshooting Kubernetes Service Discovery
We will now deploy an application with an error in the monitoring configuration.
Deploy Loki in the monitoring namespace.
Create a deployment training_loki-deployment.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: loki
  name: example-loki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - image: quay.balgroupit.com/acend/loki
        imagePullPolicy: Always
        name: loki
        volumeMounts:
        - mountPath: /loki/chunks
          name: chunks
        - mountPath: /loki/boltdb-shipper-cache
          name: boltdb-shipper-cache
        - mountPath: /loki/boltdb-shipper-active
          name: shipper-active
        - mountPath: /loki/wal
          name: wal
        - mountPath: /loki/compactor
          name: compactor
      restartPolicy: Always
      volumes:
      - name: chunks
        emptyDir: {}
      - name: boltdb-shipper-cache
        emptyDir: {}
      - name: shipper-active
        emptyDir: {}
      - name: wal
        emptyDir: {}
      - name: compactor
        emptyDir: {}

Create a Service training_service-loki.yaml.
apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
spec:
  ports:
  - name: http
    port: 3100
    protocol: TCP
    targetPort: 3100
  selector:
    app: loki
  type: ClusterIP

Create the Loki ServiceMonitor training_servicemonitor-loki.yaml.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: loki
  name: loki
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
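The three manifests above can then be applied to the monitoring namespace. A minimal sketch, assuming the files were saved under the names given above and that your namespace follows the <team>-monitoring pattern used later in this lab:

```shell
# Apply the Deployment, the Service and the ServiceMonitor
oc -n <team>-monitoring apply -f training_loki-deployment.yaml
oc -n <team>-monitoring apply -f training_service-loki.yaml
oc -n <team>-monitoring apply -f training_servicemonitor-loki.yaml

# Wait until the Loki pod is up and ready
oc -n <team>-monitoring rollout status deployment/example-loki
```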
When you visit the Prometheus user interface, you will notice that the Prometheus server does not scrape metrics from Loki. Try to find out why.
Troubleshooting: Prometheus is not scraping metrics
When Prometheus is not able to scrape metrics, the cause is usually one of the following:
- The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration.
  - Check if the label of your ServiceMonitor matches the label defined in the serviceMonitorSelector field of the Prometheus custom resource
  - Check the Prometheus operator logs for errors (permission issues or invalid ServiceMonitors)
- The Endpoint appears in the Prometheus scrape config, but not under targets.
  - The namespaceSelector in the ServiceMonitor does not match the namespace of your app
  - The label selector does not match the Service of your app
  - The port name does not match the Service of your app
- The Endpoint appears as a Prometheus target, but no data gets scraped.
  - The application does not provide metrics under the correct path and port
  - Networking issues
  - Authentication required, but not configured
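Several of these checks can be run straight from the command line. A sketch, assuming the Prometheus custom resource and the operator run in the same <team>-monitoring namespace (the prometheus-operator deployment name is an assumption and may differ in your cluster):

```shell
# Which ServiceMonitor labels does the Prometheus custom resource select?
oc -n <team>-monitoring get prometheus -o jsonpath='{.items[*].spec.serviceMonitorSelector}'

# Do the labels on our ServiceMonitor match that selector?
oc -n <team>-monitoring get servicemonitor loki --show-labels

# Any errors in the Prometheus operator logs (permissions, invalid ServiceMonitors)?
oc -n <team>-monitoring logs deployment/prometheus-operator --tail=20
```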
Hints
The quickest way to troubleshoot this is to follow the instructions in the info box above. So let’s first find out which of the following statements applies to us:
- The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration.

Let’s check if Prometheus reads the configuration defined in the ServiceMonitor resource. To do so, navigate to the Prometheus configuration and search whether loki appears in the scrape configuration. You should find a job with the name serviceMonitor/loki/loki/0, so this should not be the issue in this case.

- The Endpoint appears in the Prometheus configuration, but not under targets.
Let’s check if the application is running:
oc -n <team>-monitoring get pod -l app=loki

The output should be similar to the following:

NAME                            READY   STATUS    RESTARTS   AGE
example-loki-7bb486b647-dj5r4   1/1     Running   0          112s

Let’s check if the application is exposing metrics:
PODNAME=$(oc -n <team>-monitoring get pod -l app=loki -o name)
oc -n <team>-monitoring exec $PODNAME -it -- wget -O - localhost:3100/metrics

...

The application exposes metrics, and Prometheus generated the configuration according to the defined ServiceMonitor. Let’s verify that the ServiceMonitor matches the Service.
oc -n <team>-monitoring get svc loki -o yaml

apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: loki
    argocd.argoproj.io/instance: ...
  name: loki
spec:
  ...
  ports:
  - name: http
    ...

We see that the Service has the port named http and the label app: loki set. Let’s check the ServiceMonitor:

oc -n <team>-monitoring get servicemonitor loki -o yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - interval: 30s
    ...
    port: http
  ...
  selector:
    matchLabels:
      prometheus-monitoring: "true"

We see that the ServiceMonitor expects the port named http and a label prometheus-monitoring: "true" to be set. So the culprit is the missing label. Let’s set the label on the Service by updating the service training_service-loki.yaml:

apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    app: loki
    prometheus-monitoring: "true"
spec:
  ...

Verify that the target gets scraped in the Prometheus user interface.
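Instead of editing the YAML file, the same fix can also be applied imperatively. This is handy for a quick test, but the declarative file should still be updated so the change survives the next apply:

```shell
# Add the label the ServiceMonitor selector expects
oc -n <team>-monitoring label service loki prometheus-monitoring=true

# Confirm the label is now present on the Service
oc -n <team>-monitoring get service loki --show-labels
```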