Add alertmanager container for sending alerts and alert rule
aequitas committed Aug 29, 2024
1 parent 45f8433 commit 4aa062a
Showing 7 changed files with 150 additions and 4 deletions.
1 change: 1 addition & 0 deletions docker/batch-test.env
@@ -43,6 +43,7 @@ IPV4_IP_GRAFANA_INTERNAL=192.168.43.110
IPV4_IP_PROMETHEUS_INTERNAL=192.168.43.111
IPV4_IP_RESOLVER_INTERNAL_VALIDATING=192.168.43.112
IPV4_IP_RESOLVER_INTERNAL_PERMISSIVE=192.168.43.113
IPV4_IP_ALERTMANAGER_INTERNAL=192.168.43.114

IPV4_IP_MOCK_RESOLVER_PUBLIC=172.43.0.114
IPV6_IP_MOCK_RESOLVER_PUBLIC=fd00:43:1::114
17 changes: 17 additions & 0 deletions docker/defaults.env
@@ -100,6 +100,22 @@ INTERNET_NL_CHECK_SUPPORT_RPKI=True
# list of domainnames that can have retry timer be reset via API
INTERNETNL_CACHE_RESET_ALLOWLIST=

# settings for Alertmanager; enable it by adding 'alertmanager' to COMPOSE_PROFILES
# sender email address used for alerts
ALERTMANAGER_MAIL_FROM=

# SMTP configuration for sending emails
ALERTMANAGER_SMTP_HOST=
ALERTMANAGER_SMTP_USER=
ALERTMANAGER_SMTP_PASSWORD=
ALERTMANAGER_SMTP_PORT=587

# comma-separated list of email addresses to send alert emails to
ALERTMANAGER_MAIL_TO=

# subject line for alert emails (Alertmanager notification template syntax), see: https://prometheus.io/docs/alerting/latest/notifications/
ALERTMANAGER_SUBJECT=Alert on host '{{ .CommonAnnotations.host }}', caused by '{{ .CommonAnnotations.summary }}'

## Settings below _may_ be changed but are best _left_ as is

# Docker Compose project name to use in case of multiple instances running on the same host
@@ -209,6 +225,7 @@ IPV4_IP_GRAFANA_INTERNAL=192.168.42.110
IPV4_IP_PROMETHEUS_INTERNAL=192.168.42.111
IPV4_IP_RESOLVER_INTERNAL_VALIDATING=192.168.42.112
IPV4_IP_RESOLVER_INTERNAL_PERMISSIVE=192.168.42.113
IPV4_IP_ALERTMANAGER_INTERNAL=192.168.42.114

IPV4_IP_MOCK_RESOLVER_PUBLIC=172.42.0.114
IPV6_IP_MOCK_RESOLVER_PUBLIC=fd00:42:1::114
65 changes: 65 additions & 0 deletions docker/docker-compose.yml
@@ -37,6 +37,7 @@ services:
      - IPV4_IP_APP_INTERNAL
      - IPV4_IP_GRAFANA_INTERNAL
      - IPV4_IP_PROMETHEUS_INTERNAL
      - IPV4_IP_ALERTMANAGER_INTERNAL
      - ENABLE_BATCH
      - LETSENCRYPT_STAGING
      - LETSENCRYPT_EMAIL
@@ -790,6 +791,8 @@ services:
    configs:
      - source: prometheus_config
        target: /prometheus.yaml
      - source: prometheus_rules_config
        target: /prometheus-rules.yaml

    restart: unless-stopped
    logging:
@@ -806,6 +809,31 @@ services:
    volumes:
      - prometheus-data:/prometheus

  alertmanager:
    image: ${DOCKER_IMAGE_ALERTMANAGER:-prom/alertmanager:v0.27.0}

    command:
      - --config.file=/alertmanager.yaml
      - --web.external-url=https://$INTERNETNL_DOMAINNAME/alertmanager/
      - --cluster.listen-address=

    configs:
      - source: alertmanager_config
        target: /alertmanager.yaml

    restart: unless-stopped
    logging:
      driver: $LOGGING_DRIVER
      options:
        tag: '{{.Name}}'
    networks:
      internal:
        ipv4_address: $IPV4_IP_ALERTMANAGER_INTERNAL
      public-internet: {}

    profiles:
      - alertmanager

  postgresql-exporter:
    image: ${DOCKER_IMAGE_POSTGRESQL_EXPORTER:-prometheuscommunity/postgres-exporter:v0.12.0}

@@ -999,6 +1027,14 @@ configs:
      global:
        scrape_interval: 10s
        scrape_timeout: 5s
      rule_files:
        - /prometheus-rules.yaml
      alerting:
        alertmanagers:
          - path_prefix: /alertmanager
            static_configs:
              - targets:
                  - $IPV4_IP_ALERTMANAGER_INTERNAL:9093
      scrape_configs:
        - &scrape_config
          scheme: http
@@ -1031,6 +1067,35 @@ configs:
        - <<: *scrape_config
          job_name: nginx_logs_exporter
          static_configs: [{targets: ["nginx_logs_exporter:4040"]}]
  prometheus_rules_config:
    content: |
      groups:
        - name: End to end monitoring
          rules:
            - alert: HighTestRuntime
              expr: min(tests_test_runtime_seconds{test="site"})>=10 and max(tests_test_runtime_seconds{test="site"})>=30
              annotations:
                host: $INTERNETNL_DOMAINNAME
                summary: Tests/probes take longer to complete than expected
                dashboard: 'https://$INTERNETNL_DOMAINNAME/grafana/d/af7d1d82-c0f9-4d8d-bc03-542c4c4c75c0/periodic-tests'
  alertmanager_config:
    content: |
      global:
        smtp_from: $ALERTMANAGER_MAIL_FROM
        smtp_smarthost: $ALERTMANAGER_SMTP_HOST:$ALERTMANAGER_SMTP_PORT
        smtp_require_tls: true
        smtp_auth_username: $ALERTMANAGER_SMTP_USER
        smtp_auth_password: $ALERTMANAGER_SMTP_PASSWORD
      route:
        receiver: alerts
        routes:
          - receiver: alerts
      receivers:
        - name: alerts
          email_configs:
            - to: $ALERTMANAGER_MAIL_TO
              headers:
                subject: $ALERTMANAGER_SUBJECT
  restart_worker_cron:
    content: |
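The `alertmanager_config` content above relies on Docker Compose interpolation: every `$VAR` reference is replaced with a value from the environment files before Alertmanager reads the file. The Python sketch below approximates that step with the standard library's `string.Template` to preview the rendered `alertmanager.yaml`. It is not Compose's own implementation: it only handles plain `$VAR` references (no `${VAR:-default}` forms), and `render_alertmanager_config` and any sample values are hypothetical.

```python
from string import Template

# Excerpt of the alertmanager_config content from docker/docker-compose.yml,
# dedented. It only uses plain $VAR references, which string.Template supports.
ALERTMANAGER_CONFIG = """\
global:
  smtp_from: $ALERTMANAGER_MAIL_FROM
  smtp_smarthost: $ALERTMANAGER_SMTP_HOST:$ALERTMANAGER_SMTP_PORT
  smtp_require_tls: true
  smtp_auth_username: $ALERTMANAGER_SMTP_USER
  smtp_auth_password: $ALERTMANAGER_SMTP_PASSWORD
route:
  receiver: alerts
  routes:
    - receiver: alerts
receivers:
  - name: alerts
    email_configs:
      - to: $ALERTMANAGER_MAIL_TO
        headers:
          subject: $ALERTMANAGER_SUBJECT
"""


def render_alertmanager_config(env: dict) -> str:
    """Hypothetical preview helper: substitute $VAR placeholders from env.

    Compose warns and substitutes an empty string for unset variables;
    Template.substitute raises KeyError instead, so missing settings stand out.
    Substituted values are not scanned again, so the Go template braces in
    ALERTMANAGER_SUBJECT pass through untouched.
    """
    return Template(ALERTMANAGER_CONFIG).substitute(env)
```

With `ALERTMANAGER_SMTP_HOST=smtp.example.com` and the default `ALERTMANAGER_SMTP_PORT=587`, the rendered file contains `smtp_smarthost: smtp.example.com:587`.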
40 changes: 36 additions & 4 deletions docker/grafana/dashboards/home.json
@@ -58,7 +58,7 @@
"uid": "oUXCLhCMk"
},
"gridPos": {
-"h": 15,
+"h": 19,
"w": 12,
"x": 0,
"y": 1
@@ -94,7 +94,7 @@
"uid": "oUXCLhCMk"
},
"gridPos": {
-"h": 15,
+"h": 8,
"w": 12,
"x": 12,
"y": 1
@@ -106,7 +106,7 @@
"showLineNumbers": false,
"showMiniMap": false
},
-"content": "<ul>\n<li><a target=\"_blank\" href=\"/prometheus\">/prometheus</a>\n<li><a target=\"_blank\" href=\"/prometheus/targets\">/prometheus/targets</a>\n",
+"content": "<ul>\n<li><a target=\"_blank\" href=\"/prometheus\">/prometheus</a>\n<li><a target=\"_blank\" href=\"/prometheus/targets\">/prometheus/targets</a>\n<li><a target=\"_blank\" href=\"/prometheus/alerts\">/prometheus/alerts</a>\n<li><a target=\"_blank\" href=\"/prometheus/rules\">/prometheus/rules</a>\n<li><a target=\"_blank\" href=\"/alertmanager\">/alertmanager</a>\n",
"mode": "html"
},
"pluginVersion": "9.5.2",
@@ -121,6 +121,38 @@
],
"title": "Links",
"type": "text"
},
{
"datasource": {
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"gridPos": {
"h": 11,
"w": 12,
"x": 12,
"y": 9
},
"id": 10,
"options": {
"alertInstanceLabelFilter": "",
"alertName": "",
"dashboardAlerts": false,
"groupBy": [],
"groupMode": "default",
"maxItems": 20,
"sortOrder": 1,
"stateFilter": {
"error": true,
"firing": true,
"noData": false,
"normal": false,
"pending": true
},
"viewMode": "list"
},
"title": "Panel Title",
"type": "alertlist"
}
],
"refresh": "30s",
@@ -168,4 +200,4 @@
"uid": "NES71yrGz",
"version": 15,
"weekStart": ""
-}
\ No newline at end of file
+}
1 change: 1 addition & 0 deletions docker/test.env
@@ -40,6 +40,7 @@ IPV4_IP_GRAFANA_INTERNAL=192.168.43.110
IPV4_IP_PROMETHEUS_INTERNAL=192.168.43.111
IPV4_IP_RESOLVER_INTERNAL_VALIDATING=192.168.43.112
IPV4_IP_RESOLVER_INTERNAL_PERMISSIVE=192.168.43.113
IPV4_IP_ALERTMANAGER_INTERNAL=192.168.43.114

IPV4_IP_MOCK_RESOLVER_PUBLIC=172.43.0.114
IPV6_IP_MOCK_RESOLVER_PUBLIC=fd00:43:1::114
5 changes: 5 additions & 0 deletions docker/webserver/nginx_templates/app.conf.template
@@ -305,6 +305,11 @@ server {
auth_basic_user_file /etc/nginx/htpasswd/monitoring.htpasswd;
proxy_pass http://${IPV4_IP_PROMETHEUS_INTERNAL}:9090;
}
location /alertmanager {
auth_basic "Please enter your monitoring username and password";
auth_basic_user_file /etc/nginx/htpasswd/monitoring.htpasswd;
proxy_pass http://${IPV4_IP_ALERTMANAGER_INTERNAL}:9093;
}
}

# Temporary (1 year) exception for conn. subdomain to disable HSTS and redirect back to HTTP for
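Three settings in this commit have to agree on the `/alertmanager` path. Alertmanager derives its route prefix from the path of `--web.external-url` (unless `--web.route-prefix` is set), Prometheus prepends `path_prefix` when it pushes alerts, and the nginx `location /alertmanager` block uses `proxy_pass` without a URI part, which forwards the request path unchanged. The Python sketch below spells out that relationship, assuming Prometheus talks to Alertmanager API v2 (its default `api_version`). The function names and the example domain are hypothetical; the IP address is the `IPV4_IP_ALERTMANAGER_INTERNAL` default from `docker/defaults.env`.

```python
from urllib.parse import urlsplit


def route_prefix(external_url: str) -> str:
    """Route prefix Alertmanager falls back to when --web.route-prefix is unset:
    the path of --web.external-url, here without its trailing slash."""
    return urlsplit(external_url).path.rstrip("/")


def prometheus_push_url(target: str, path_prefix: str, scheme: str = "http") -> str:
    """URL Prometheus POSTs alerts to for one alertmanagers entry (API v2)."""
    return f"{scheme}://{target}{path_prefix.rstrip('/')}/api/v2/alerts"


def prefixes_consistent(external_url: str, prom_path_prefix: str, nginx_location: str) -> bool:
    """True if Prometheus and nginx both address the routes Alertmanager serves.

    nginx forwards the original path (proxy_pass without a URI), so the
    location has to match the route prefix, just like path_prefix does.
    """
    expected = route_prefix(external_url)
    return (
        prom_path_prefix.rstrip("/") == expected
        and nginx_location.rstrip("/") == expected
    )
```

For example, `prometheus_push_url("192.168.42.114:9093", "/alertmanager")` yields `http://192.168.42.114:9093/alertmanager/api/v2/alerts`, which only resolves because the external URL ends in `/alertmanager/`.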
25 changes: 25 additions & 0 deletions documentation/Docker-deployment.md
@@ -353,6 +353,31 @@ To verify the health status of the critical services use these commands:

The services `webserver`, `app`, `postgres` and `redis` are critical for the user-facing HTTP frontend; no page will show if these are not running. The services `worker`, `rabbitmq`, `routinator`, `unbound`, `resolver-permissive` and `resolver-validating` are additionally required for new tests to be performed. The `beat` service is required for updating the hall of fame; for batch deployments, however, it is a critical service, because it schedules the batch tests submitted via the API.

### Alerting emails/alertmanager

A Prometheus Alertmanager service is available but disabled by default. Enabling it lets you configure alert emails that are sent whenever the periodic tests take longer than expected to complete, which indicates an issue with the application.
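In this commit, "longer than expected" is defined by the `HighTestRuntime` rule in `docker/docker-compose.yml`: it fires when the fastest `site` test reported by `tests_test_runtime_seconds` took at least 10 seconds *and* the slowest took at least 30 seconds, so a single slow test is not enough. The Python sketch below mirrors that PromQL expression over a hypothetical list holding the current value of each matching series; `high_test_runtime` is an illustrative name, not part of internet.nl.

```python
from typing import Sequence


def high_test_runtime(site_runtimes: Sequence[float]) -> bool:
    """Hypothetical mirror of the HighTestRuntime rule expression:

        min(tests_test_runtime_seconds{test="site"})>=10
          and max(tests_test_runtime_seconds{test="site"})>=30
    """
    if not site_runtimes:
        # No matching series: min() and max() return nothing,
        # so the expression has no result and the alert cannot fire.
        return False
    # Each aggregation yields one label-less sample; a filtering comparison
    # (no `bool` modifier) drops it when false, and `and` keeps the left-hand
    # sample only if the right-hand side also still has one.
    return min(site_runtimes) >= 10 and max(site_runtimes) >= 30
```

Because the rule has no `for:` clause, the alert fires on the first evaluation where the condition holds; the commit sets no `evaluation_interval`, so Prometheus' default of one minute applies.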

To enable and configure Alertmanager, add the following lines to `docker/local.env` and adjust the values for your environment:

    COMPOSE_PROFILES=default,alertmanager
    ALERTMANAGER_MAIL_TO=user1@example.com,user2@example.com
    ALERTMANAGER_MAIL_FROM=noreply@example.com
    ALERTMANAGER_SMTP_HOST=smtp.example.com
    ALERTMANAGER_SMTP_USER=example
    ALERTMANAGER_SMTP_PASSWORD=example

If the configuration file already has a `COMPOSE_PROFILES` entry, add `alertmanager` to it instead.
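For scripted setups, the hypothetical Python helper below applies the same rule to the text of `docker/local.env`: it appends `alertmanager` to an existing `COMPOSE_PROFILES` entry, or adds `COMPOSE_PROFILES=default,alertmanager` when there is none. It assumes simple unquoted `KEY=value` lines.

```python
def enable_profile(env_text: str, profile: str = "alertmanager") -> str:
    """Hypothetical helper: return env file text with `profile` in COMPOSE_PROFILES."""
    lines = env_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("COMPOSE_PROFILES="):
            current = line.split("=", 1)[1]
            profiles = [p.strip() for p in current.split(",") if p.strip()]
            if profile not in profiles:  # keep the helper idempotent
                profiles.append(profile)
            lines[index] = "COMPOSE_PROFILES=" + ",".join(profiles)
            break
    else:
        # No entry yet: use the same value as the example in this section.
        lines.append(f"COMPOSE_PROFILES=default,{profile}")
    return "\n".join(lines) + "\n"
```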

The SMTP server is expected to support TLS; this requirement cannot be disabled. The default port is `587`, which can be changed using the `ALERTMANAGER_SMTP_PORT` variable.
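Because `smtp_require_tls: true` is hard-coded in the generated Alertmanager configuration, a server that does not offer STARTTLS on the configured port will not work. The sketch below uses Python's standard `smtplib` to check this from the host before enabling the profile. `check_smtp` is a hypothetical helper, and it assumes a STARTTLS port such as the default `587` rather than an implicit-TLS port like `465`.

```python
import smtplib
from typing import Optional


def check_smtp(host: str, port: int = 587, user: Optional[str] = None,
               password: Optional[str] = None, smtp_factory=smtplib.SMTP) -> bool:
    """Hypothetical pre-flight check mirroring smtp_require_tls: true.

    Returns False if the server does not advertise STARTTLS. Raises
    smtplib.SMTPAuthenticationError if the credentials are rejected.
    smtp_factory exists only so the check can be exercised without a server.
    """
    with smtp_factory(host, port, timeout=10) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            return False
        smtp.starttls()
        smtp.ehlo()  # identify again over the now-encrypted connection
        if user:
            smtp.login(user, password or "")
        return True
```

Calling it with the values from `docker/local.env`, for example `check_smtp("smtp.example.com", 587, "example", "example")`, should return `True` before you rely on alert emails.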

The email subject can be customized using the `ALERTMANAGER_SUBJECT` variable; see `docker/defaults.env` for details.
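`ALERTMANAGER_SUBJECT` is an Alertmanager notification template (Go `text/template` syntax), and `.CommonAnnotations` only holds the annotations shared by every alert in a notification. To preview the default subject with the annotations set by the `HighTestRuntime` rule, the rough Python sketch below substitutes only `{{ .CommonAnnotations.<name> }}` fields. It is not a Go template engine, and `preview_subject` and the example host are hypothetical.

```python
import re
from typing import Mapping

# Matches simple Go template actions such as {{ .CommonAnnotations.host }}.
_COMMON_ANNOTATION = re.compile(r"\{\{\s*\.CommonAnnotations\.(\w+)\s*\}\}")


def preview_subject(template: str, common_annotations: Mapping[str, str]) -> str:
    """Hypothetical preview: fill in CommonAnnotations lookups only.

    Annotations missing from common_annotations are rendered as an empty
    string by this sketch; any other template syntax is left untouched.
    """
    return _COMMON_ANNOTATION.sub(
        lambda match: common_annotations.get(match.group(1), ""), template
    )
```

With the default subject from `docker/defaults.env`, `host` set to `internet.example` and the rule's `summary`, this yields `Alert on host 'internet.example', caused by 'Tests/probes take longer to complete than expected'`.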

The current alert status can be seen at https://example.com/prometheus/alerts or https://example.com/alertmanager.
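The same information is available from Alertmanager's HTTP API v2 (`GET /api/v2/alerts`). It is reachable through the `/alertmanager` nginx location and therefore needs the monitoring Basic Auth credentials. Below is a minimal standard-library Python sketch; `fetch_alerts` and `summarize` are hypothetical helpers, and `https://example.com/alertmanager` stands in for your own domain.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List


def fetch_alerts(base_url: str, user: str, password: str) -> List[Dict[str, Any]]:
    """GET <base_url>/api/v2/alerts using HTTP Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(alerts: List[Dict[str, Any]]) -> List[str]:
    """One line per alert: state (active, suppressed, ...), alertname and summary."""
    lines = []
    for alert in alerts:
        state = alert.get("status", {}).get("state", "unknown")
        name = alert.get("labels", {}).get("alertname", "?")
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(f"{state}: {name}: {summary}")
    return lines
```

For example, `summarize(fetch_alerts("https://example.com/alertmanager", user, password))` lists currently known alerts with their state.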

If notification emails are not being sent even though the alert status shows red, check the Alertmanager logs for debugging:

    docker compose --project-name=internetnl-prod logs --follow alertmanager
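When following the logs it can help to show only warnings and errors. Alertmanager writes logfmt lines with a `level=` field by default; assuming that format, the hypothetical Python filter below keeps lines whose level is `warn` or `error`. Splitting on whitespace means the service-name prefix that `docker compose logs` adds does not get in the way.

```python
import sys
from typing import Iterable, Iterator


def problem_lines(lines: Iterable[str], levels=("warn", "error")) -> Iterator[str]:
    """Yield lines whose whitespace-separated fields include level=<one of levels>."""
    wanted = {f"level={level}" for level in levels}
    for line in lines:
        if any(field in wanted for field in line.split()):
            yield line.rstrip("\n")


def main() -> None:
    """Print only the problem lines from log output piped in on standard input."""
    for line in problem_lines(sys.stdin):
        print(line)
```

Saved as a (hypothetical) `alert_log_filter.py`, it could be used as `docker compose --project-name=internetnl-prod logs alertmanager | python3 -c 'import alert_log_filter; alert_log_filter.main()'`.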

## Restricting access

By default the installation is open to everyone. If you would like to restrict access, you can do so using either HTTP Basic Authentication or IP allow/deny lists.