7. Alerting with Alertmanager

Installation

Setup

At Baloise, Alertmanager is part of the managed monitoring stack and does not need to be installed. We will take a look at its default configuration later in this chapter.

Configuration in Alertmanager

Alertmanager’s configuration is done in a YAML configuration file. Two main sections configure how Alertmanager dispatches alerts: receivers and routing.
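A minimal sketch of how these two sections sit at the top level of the file, assuming a single catch-all receiver named `default` (the same name the default configuration further down uses). Because this receiver has no notification integration, alerts are accepted but nothing is actually sent:

```yaml
# Minimal Alertmanager configuration skeleton (illustrative only)
route:
  receiver: default   # root route: every alert ends up here unless a child route matches
receivers:
- name: default       # receiver without notification configs: alerts are not forwarded anywhere
```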

Receivers

With a receiver, one or more notifications can be defined. Different notification types are available, e.g. email, webhook, or messaging platforms like Slack or PagerDuty.
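As an illustration, the following hypothetical receiver bundles two notification types, so every alert routed to it results in both an email and a webhook call. The receiver name, the email address, and the URL are placeholders; the email settings not given here (sender, SMTP server) are assumed to come from the `global` section, as in the default configuration below.

```yaml
receivers:
- name: team-a                      # hypothetical receiver name
  email_configs:
  - to: team-a@example.com          # placeholder address
    send_resolved: true             # also notify when the alert is resolved
  webhook_configs:
  - url: http://alert-forwarder.example.com/hook   # placeholder endpoint
    send_resolved: true
```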

Routing

With routing blocks, a tree of routes and child routes can be defined. Each routing block has a matcher that can match one or several labels of an alert. Each block can specify one receiver; if it is left empty, the receiver of the parent route is used, which at the top of the tree is the default receiver.
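The following fragment sketches such a tree. The receivers `team-a-pager` and `team-a-db` and the labels `team` and `service` are hypothetical, and the receivers would still have to be defined in the `receivers` section. Routes are evaluated top to bottom: an alert descends into the first matching child route (unless `continue: true` is set), and if none of a route's children match, that route handles the alert itself.

```yaml
route:
  receiver: default                 # root route: fallback for everything not matched below
  group_by: [namespace, alertname]
  routes:
  - receiver: team-a-pager          # hypothetical receiver
    matchers:                       # all matchers must match (logical AND)
    - team="a"
    - severity="critical"
    routes:                         # child route, only evaluated for alerts matched above
    - receiver: team-a-db           # hypothetical receiver
      matchers:
      - service="database"
  - matchers:                       # no receiver set: inherits "default" from the parent route
    - team="b"
    group_wait: 1m                  # timing settings can be overridden per route
```

With this tree, a critical alert labelled `team="a"` and `service="database"` goes to `team-a-db`, a critical team A alert for any other service stays at `team-a-pager`, and an alert that matches no child route, such as a team A warning, falls back to `default`.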

amtool

As routing definitions can become very complex and hard to understand, amtool comes in handy because it helps to test the routing rules. It can also generate test alerts and has even more useful features. More about this in the labs.

Default Configuration

Alertmanager’s configuration is managed by the monitoring stack. Take a look at the default configuration in use at Baloise:

# baloise config
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: devops@example.com
  smtp_hello: localhost
  smtp_smarthost: smtp.example.com:25
  smtp_require_tls: false
route:
  receiver: default
  group_by:
  - namespace
  - alertname
  continue: false
  routes:
  - receiver: mail-critical
    match_re:
      severity: critical|warning
    continue: true
  - receiver: deadmanswitch
    match_re:
      alertname: DeadMansSwitch
    continue: false
    group_wait: 0s
    group_interval: 5s
    repeat_interval: 1m
  - receiver: teams-critical-prod
    matchers:
    - env="prod"
    - severity="critical"
    continue: false
  - receiver: teams-warning-prod
    matchers:
    - env="prod"
    - severity="warning"
    continue: false
  - receiver: teams-info-prod
    matchers:
    - env="prod"
    continue: false
  - receiver: teams-critical-nonprod
    matchers:
    - env!="prod"
    - severity="critical"
    continue: false
  - receiver: teams-warning-nonprod
    matchers:
    - env!="prod"
    - severity="warning"
    continue: false
  - receiver: teams-info-nonprod
    matchers:
    - env!="prod"
    - severity="info"
    continue: false
  - receiver: teams-warning-prod
    matchers:
    - env!="prod"
    continue: false
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 12h
inhibit_rules:
- source_match:
    severity: critical
  target_match_re:
    severity: warning|info
  equal:
  - namespace
  - alertname
- source_match:
    severity: warning
  target_match_re:
    severity: info
  equal:
  - namespace
  - alertname
receivers:
- name: default
- name: mail-critical
  email_configs:
  - send_resolved: false
    to: group.devops_system@example.com
    from: devops@example.com
    hello: localhost
    smarthost: smtp.example.com:25
    headers:
      From: devops@example.com
      Subject: '{{ template "email.default.subject" . }}'
      To: group.devops_system@example.com
    html: '{{ template "email.default.html" . }}'
    require_tls: false
- name: teams-critical-prod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8089/v2/critical
    max_alerts: 0
- name: teams-warning-prod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8089/v2/warning
    max_alerts: 0
- name: teams-info-prod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8089/v2/info
    max_alerts: 0
- name: teams-critical-nonprod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8090/v2/critical
    max_alerts: 0
- name: teams-warning-nonprod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8090/v2/warning
    max_alerts: 0
- name: teams-info-nonprod
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://localhost:8090/v2/info
    max_alerts: 0
- name: deadmanswitch
  webhook_configs:
  - send_resolved: false
    http_config:
      follow_redirects: true
    url: http://deadmanswitch:8080/ping/...
    max_alerts: 0
templates: []
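To see how the routing tree above behaves, the following annotated fragment traces a hypothetical alert (alert name and namespace are made up) through the child routes of the default configuration. Routes are checked in order; `continue: true` on `mail-critical` lets evaluation carry on after a match, whereas the routes with `continue: false` stop it. amtool's `config routes test` subcommand can be used to check the same thing by passing the alert labels as `key=value` arguments.

```yaml
# Hypothetical alert labels
labels:
  alertname: HighErrorRate
  namespace: my-app
  env: prod
  severity: critical
# Evaluation against the child routes of the default configuration:
# 1. mail-critical        severity matches "critical|warning"  -> notified, continue: true
# 2. deadmanswitch        alertname is not "DeadMansSwitch"    -> skipped
# 3. teams-critical-prod  env="prod" and severity="critical"   -> notified, continue: false, stop
# Result: the alert is sent by email (mail-critical) and to the critical prod
# webhook (teams-critical-prod). Grouping by namespace and alertname as well as
# group_wait 30s, group_interval 1m and repeat_interval 12h are inherited from
# the root route.
# While this critical alert is active, the first inhibit rule mutes any warning
# or info alert with the same namespace and alertname.
```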

More (advanced) options

For more insight into the configuration options, study the following resources:

Alerting rules in Prometheus

Prometheus alerting rules are configured very similarly to recording rules, which you will get to know later in this training. The main difference is that an alerting rule's expression contains a threshold (e.g., query_expression >= 5) and that an alert is sent to Alertmanager whenever the evaluated expression returns results, i.e. the threshold is crossed. An alerting rule can be based on a recording rule or on a plain PromQL expression.
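A sketch of both variants in the Prometheus rule-file format. The metric `http_requests_total`, the recording rule name, the group name, and the alert names are hypothetical; `up` is the metric Prometheus records for every scrape target. The `severity` label is what the default Alertmanager configuration above uses for routing. When PrometheusRule objects are used, the same `groups` list goes under their `spec`.

```yaml
groups:
- name: example-alerts              # hypothetical group name
  rules:
  # Recording rule: precomputes the ratio of 5xx responses per job
  - record: job:http_requests_errors:ratio5m
    expr: |
      sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
      /
      sum by (job) (rate(http_requests_total[5m]))
  # Alerting rule based on the recording rule above (threshold: 5 %)
  - alert: HighErrorRatio
    expr: job:http_requests_errors:ratio5m > 0.05
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "More than 5% of the requests of job {{ $labels.job }} fail"
  # Alerting rule based on a plain expression query
  - alert: TargetDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
```

The `for` clause keeps an alert in the pending state until the condition has held for the given duration; only then does it fire and get sent to Alertmanager.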

Templates for awesome rules

When creating PrometheusRules, chances are that other people have already faced the same problem. On this site you can find a collection of PrometheusRules for a large number of cloud native technologies.