1. Setting up Prometheus
In this first section we are going to explore the Prometheus stack that has already been set up. Each trainee will have their own stack to work with.
How do metrics end up in Prometheus?
Since Prometheus is a pull-based monitoring system, the Prometheus server maintains a set of targets to scrape. This set can be configured using the scrape_configs option in the Prometheus configuration file. The scrape_configs section consists of a list of jobs defining the targets as well as additional parameters (path, port, authentication, etc.) required to scrape these targets. As we will be using the Prometheus Operator on Kubernetes, we will never actually touch this configuration file ourselves. Instead, we rely on the abstractions provided by the Operator, which we will look at more closely in the next section.
There are two basic types of targets that we can add to our Prometheus server:
Static targets
In this case, we define one or more targets statically. In order to make changes to the list, you need to change the configuration file. As the name implies, this way of defining targets is inflexible and not suited to monitor workloads inside of Kubernetes as these are highly dynamic.
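As a minimal sketch, a static target definition in the Prometheus configuration file could look like this (the job name and address are made up for illustration):

```yaml
scrape_configs:
  - job_name: 'example-static'
    # metrics_path and scheme default to /metrics and http
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'node-exporter.example.com:9100'
```

Adding or removing a target means editing this file and reloading Prometheus, which is exactly why this approach does not scale for dynamic workloads.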
We will use this type of configuration in task 2.1.
Dynamic configuration
Besides the static target configuration, Prometheus provides many ways to dynamically add/remove targets. There are builtin service discovery mechanisms for cloud providers such as AWS, GCP, Hetzner, and many more. In addition, there are more versatile discovery mechanisms available which allow you to implement Prometheus in your environment (e.g. DNS service discovery or file service discovery). Most importantly, the Prometheus Operator makes it very easy to let Prometheus discover targets dynamically using the Kubernetes API.
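To illustrate, a hand-written dynamic scrape configuration using the built-in Kubernetes service discovery could look like the following sketch (the job name and annotation are assumptions for illustration; with the Prometheus Operator, such configuration is generated for us):

```yaml
scrape_configs:
  - job_name: 'kubernetes-endpoints'
    kubernetes_sd_configs:
      # Discover all Endpoints objects via the Kubernetes API
      - role: endpoints
    relabel_configs:
      # Only keep targets whose Service carries this annotation
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
```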
Prometheus Operator
The Prometheus Operator is the preferred way of running Prometheus inside a Kubernetes Cluster. In the following labs you will get to know its custom resources in more detail:
- `Prometheus`: Manages the Prometheus instances
- `Alertmanager`: Manages the Alertmanager instances
- `ServiceMonitor`: Generates Kubernetes service discovery scrape configuration based on Kubernetes Service definitions
- `PrometheusRule`: Manages the Prometheus rules of your Prometheus instances
- `AlertmanagerConfig`: Adds additional receivers and routes to your existing Alertmanager configuration
- `PodMonitor`: Generates Kubernetes service discovery scrape configuration based on Kubernetes Pod definitions
- `Probe`: Manages Prometheus blackbox exporter targets
- `ThanosRuler`: Manages Thanos rulers
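To show how these resources fit together, a minimal `Prometheus` custom resource might look like the following sketch (field values are assumptions for illustration, not the configuration of your lab stack):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  replicas: 2
  # Select ServiceMonitors by label; the Operator turns matching
  # ServiceMonitors into scrape configuration for these instances
  serviceMonitorSelector:
    matchLabels:
      team: frontend
```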
Service Discovery
When configuring Prometheus to scrape metrics from containers deployed in a Kubernetes Cluster, it doesn't make sense to configure every single target (Pod) manually. That would be far too static for such a highly dynamic environment: Pods are scaled up and down, their names are generated randomly, and so on.
Instead, we tightly integrate Prometheus with Kubernetes and let Prometheus automatically discover the targets that need to be scraped via the Kubernetes API.
This tight integration between Prometheus and Kubernetes can be configured with the Kubernetes Service Discovery Config.
The way we instruct Prometheus to scrape metrics from an application running as a Pod is by creating a ServiceMonitor.
ServiceMonitors are Kubernetes custom resources, which look like this:
```yaml
# just an example
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: example-web-python
  name: example-web-python-monitor
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
```
How does it work?
The Prometheus Operator watches namespaces for ServiceMonitor custom resources. It then updates the Service Discovery configuration of the Prometheus server(s) accordingly.
The `selector` part in the `ServiceMonitor` defines which Kubernetes Services will be scraped. Here we select the correct Service by defining a selector on the label `prometheus-monitoring: 'true'`.
```yaml
# servicemonitor.yaml
...
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
...
```
The corresponding Service needs to have this label set:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-web-python
  labels:
    prometheus-monitoring: 'true'
...
```
The Prometheus Operator then determines all Endpoints (which are basically the IPs of the Pods) that belong to this Service using the Kubernetes API. These Endpoints are then dynamically added as targets to the Prometheus server(s).
The spec section in the ServiceMonitor resource allows further configuration on how to scrape the targets.
In our case Prometheus will:
- Scrape every 30 seconds
- Look for a port with the name `http` (this must match the name in the `Service` resource)
- Scrape metrics from the path `/metrics` using `http`
Best practices
Use the common k8s labels https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
If possible, reduce the number of different ServiceMonitors for an application and thereby reduce the overall complexity.
- Use the same `matchLabels` on the different `Services` of your application (e.g. frontend Service, backend Service, database Service)
- Also make sure the ports of the different `Services` have the same name
- Expose your metrics under the same path
Add your application as monitoring target at Baloise
Have a look at the Add Monitoring Targets outside of OpenShift documentation. There are two ways to add machines outside of OpenShift to your monitoring stack.
- Using `File Service Discovery` you have the following options (lab 2.1):
  - Add targets using TLS and the default credentials provided
  - Add targets without TLS and authentication
- You can use the approach with `ServiceMonitors`, which provides more flexibility for cases like:
  - custom targets with non-standard basic authentication
  - custom targets without TLS and with non-standard basic authentication
  - providing a CA to verify a custom certificate on the exporter side
  - defining a non-default `scrape_interval`
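The File Service Discovery approach reads its targets from a file on disk that Prometheus re-reads on change. Such a target file might look like this sketch (hostnames and labels are made up for illustration; the actual file location and credentials depend on your Baloise setup):

```yaml
# targets.yaml, referenced by a file_sd_configs entry in a scrape job
- targets:
    - 'vm1.example.com:9100'
    - 'vm2.example.com:9100'
  labels:
    env: production
```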