1. Setting up Prometheus
In this first section we are going to explore the Prometheus stack that has already been set up. Each trainee will have their own stack to work with.
How do metrics end up in Prometheus?
Since Prometheus is a pull-based monitoring system, the Prometheus server maintains a set of targets to scrape. This set can be configured using the scrape_configs option in the Prometheus configuration file. The scrape_configs section consists of a list of jobs defining the targets as well as additional parameters (path, port, authentication, etc.) required to scrape these targets. As we will be using the Prometheus Operator on Kubernetes, we will never actually touch this configuration file ourselves. Instead, we rely on the abstractions provided by the Operator, which we will look at more closely in the next section.
There are two basic types of targets that we can add to our Prometheus server:
Static targets
In this case, we define one or more targets statically. In order to make changes to the list, you need to change the configuration file. As the name implies, this way of defining targets is inflexible and not suited to monitor workloads inside of Kubernetes as these are highly dynamic.
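As a minimal sketch, a static target definition in the Prometheus configuration file could look like this (the job name and address are made up for illustration):

```yaml
scrape_configs:
  - job_name: 'example-static'
    # metrics_path and scheme default to /metrics and http
    metrics_path: /metrics
    static_configs:
      - targets:
          - 'node-exporter.example.com:9100'
```

Adding or removing a target means editing this file and reloading Prometheus, which is exactly why this approach does not scale for dynamic workloads.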
We will use this type of configuration in task 2.1.
Dynamic configuration
Besides the static target configuration, Prometheus provides many ways to dynamically add/remove targets. There are builtin service discovery mechanisms for cloud providers such as AWS, GCP, Hetzner, and many more. In addition, there are more versatile discovery mechanisms available which allow you to implement Prometheus in your environment (e.g. DNS service discovery or file service discovery). Most importantly, the Prometheus Operator makes it very easy to let Prometheus discover targets dynamically using the Kubernetes API.
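To illustrate, a hand-written dynamic scrape configuration using the built-in Kubernetes service discovery could look like the following sketch (the job name and annotation are assumptions for illustration; with the Prometheus Operator, such configuration is generated for us):

```yaml
scrape_configs:
  - job_name: 'kubernetes-endpoints'
    kubernetes_sd_configs:
      # Discover all Endpoints objects via the Kubernetes API
      - role: endpoints
    relabel_configs:
      # Only keep targets whose Service carries this annotation
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
```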
Prometheus Operator
The Prometheus Operator is the preferred way of running Prometheus inside a Kubernetes Cluster. In the following labs you will get to know its custom resources in more detail:
- `Prometheus`: Manages the Prometheus instances
- `Alertmanager`: Manages the Alertmanager instances
- `ServiceMonitor`: Generates Kubernetes service discovery scrape configuration based on Kubernetes Service definitions
- `PrometheusRule`: Manages the Prometheus rules of your Prometheus instances
- `AlertmanagerConfig`: Adds additional receivers and routes to your existing Alertmanager configuration
- `PodMonitor`: Generates Kubernetes service discovery scrape configuration based on Kubernetes Pod definitions
- `Probe`: Manages Prometheus blackbox exporter targets
- `ThanosRuler`: Manages Thanos rulers
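To show how these resources fit together, a minimal `Prometheus` custom resource might look like the following sketch (field values are assumptions for illustration, not the configuration of your lab stack):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  replicas: 2
  # Select ServiceMonitors by label; the Operator turns matching
  # ServiceMonitors into scrape configuration for these instances
  serviceMonitorSelector:
    matchLabels:
      team: frontend
```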
Service Discovery
When configuring Prometheus to scrape metrics from containers deployed in a Kubernetes Cluster, it doesn't make sense to configure every single target (Pod) manually. That would be far too static for such a highly dynamic environment: Pods are scaled up and down, their names are generated randomly, and so on.
Instead, we tightly integrate Prometheus with Kubernetes and let Prometheus automatically discover the targets that need to be scraped via the Kubernetes API.
This tight integration between Prometheus and Kubernetes can be configured with the Kubernetes Service Discovery Config.
The way we instruct Prometheus to scrape metrics from an application running as a Pod is by creating a ServiceMonitor.
ServiceMonitors are Kubernetes custom resources, which look like this:
```yaml
# just an example
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: example-web-python
  name: example-web-python-monitor
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
```
How does it work?
The Prometheus Operator watches namespaces for ServiceMonitor custom resources. It then updates the Service Discovery configuration of the Prometheus server(s) accordingly.
The `selector` part in the `ServiceMonitor` defines which Kubernetes Services will be scraped. Here we select the correct Service by defining a selector on the label `prometheus-monitoring: 'true'`.
```yaml
# servicemonitor.yaml
...
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
...
```
The corresponding Service needs to have this label set:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-web-python
  labels:
    prometheus-monitoring: 'true'
...
```
The Prometheus Operator then determines all Endpoints (which are basically the IPs of the Pods) that belong to this Service using the Kubernetes API. These Endpoints are then dynamically added as targets to the Prometheus server(s).
The spec section in the ServiceMonitor resource allows further configuration on how to scrape the targets.
In our case Prometheus will:
- Scrape every 30 seconds
- Look for a port with the name `http` (this must match the name in the `Service` resource)
- Scrape metrics from the path `/metrics` using `http`
Best practices
Use the common k8s labels https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
If possible, reduce the number of different ServiceMonitors for an application and thereby reduce the overall complexity.
- Use the same `matchLabels` on the different `Services` of your application (e.g. frontend Service, backend Service, database Service)
- Also make sure the ports of the different `Services` have the same name
- Expose your metrics under the same path
Add your application as monitoring target at Baloise
Have a look at the Add Monitoring Targets outside of OpenShift documentation. There are two ways to add machines outside of OpenShift to your monitoring stack.
- Using `File Service Discovery` you have the following options (lab 2.1):
  - Add targets using TLS and the default credentials provided
  - Add targets without TLS and authentication
- You can use the approach with `ServiceMonitors`, which provides more flexibility for cases like:
  - custom targets with non-standard basic authentication
  - custom targets without TLS and with non-standard basic authentication
  - providing a CA to verify a custom certificate on the exporter side
  - defining a non-default `scrape_interval`
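The File Service Discovery approach reads its targets from a file on disk that Prometheus re-reads on change. Such a target file might look like this sketch (hostnames and labels are made up for illustration; the actual file location and credentials depend on your Baloise setup):

```yaml
# targets.yaml, referenced by a file_sd_configs entry in a scrape job
- targets:
    - 'vm1.example.com:9100'
    - 'vm2.example.com:9100'
  labels:
    env: production
```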