Bug 1821268 - Thanos Ruler should send alerts to all Alertmanager pods
Summary: Thanos Ruler should send alerts to all Alertmanager pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-06 12:57 UTC by Simon Pasquier
Modified: 2020-07-13 17:25 UTC
CC: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:25:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github coreos prometheus-operator pull 3125 0 None closed pkg/alertmanager: fix definition of web service port 2020-09-17 09:38:13 UTC
Github openshift cluster-monitoring-operator pull 745 0 None closed Bug 1821268: fix Alertmanager address for Thanos Ruler 2020-09-17 09:38:13 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:25:53 UTC

Description Simon Pasquier 2020-04-06 12:57:39 UTC
Description of problem:
Thanos Ruler sends alerts to the Alertmanager service (alertmanager-main.openshift-monitoring.svc) instead of all Alertmanager pods.
This means that each Alertmanager can end up with an incomplete view of the alerts, depending on which pod the service routes each request to. Alertmanager's HA design expects clients to send every alert to every replica.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Always

Steps to Reproduce:
1. Enable user workload monitoring (a ConfigMap sketch follows these steps).
2. Create a user alert that always fires.
***
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test
  namespace: default
spec:
  groups:
  - name: Test rules
    rules:
    - alert: Drill
      expr: vector(1)
      labels:
        severity: warning
***

3. Query each Alertmanager for the list of active alerts:

for i in {0,1,2}; do echo "alertmanager-main-$i"; oc exec -n openshift-monitoring -t alertmanager-main-$i -c alertmanager -- curl -s http://localhost:9093/api/v2/alerts | jq -r '.[].labels.alertname'; done
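
For step 1, user workload monitoring in 4.5 is toggled through the cluster monitoring ConfigMap (tech preview at the time). A minimal sketch, assuming the tech-preview key name from the 4.5 documentation:

***
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Assumed 4.5 tech-preview toggle for user workload monitoring.
    techPreviewUserWorkload:
      enabled: true
***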


Actual results:

alertmanager-main-0
AlertmanagerReceiversNotConfigured
Watchdog
Drill
alertmanager-main-1
AlertmanagerReceiversNotConfigured
Watchdog
Drill
alertmanager-main-2
AlertmanagerReceiversNotConfigured
Watchdog

Expected results:
The "Drill" alert should be present for every Alertmanager.

Additional info:
Right now Thanos Ruler is configured with "alertmanager-main.openshift-monitoring.svc:9094". It needs to be "dnssrv+_web._tcp.alertmanager-operated.openshift-monitoring.svc" instead, so that Thanos Ruler resolves the SRV record of the headless service and sends alerts to every Alertmanager pod. But for this to work, a first fix is needed in prometheus-operator so that the web port of the alertmanager-operated service is defined correctly (see the linked pull requests).
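
For illustration only, the change amounts to swapping the URL scheme handed to Thanos Ruler. A rough sketch of the container argument before and after, assuming Thanos's dnssrv+ flag syntax; the real manifest is generated by cluster-monitoring-operator (see the linked pull requests):

***
# Hypothetical excerpt of the thanos-ruler container arguments,
# for illustration only.
args:
# Before: the ClusterIP service load-balances, so each alert reaches
# only one Alertmanager pod.
- --alertmanagers.url=http://alertmanager-main.openshift-monitoring.svc:9094
# After: dnssrv+ resolves the _web._tcp SRV record of the headless
# alertmanager-operated service, so alerts fan out to every replica.
# This relies on the prometheus-operator fix that defines the web
# service port correctly.
- --alertmanagers.url=dnssrv+http://_web._tcp.alertmanager-operated.openshift-monitoring.svc
***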

Comment 3 Junqi Zhao 2020-04-22 03:35:12 UTC
Tested with 4.5.0-0.nightly-2020-04-21-233210, following the steps in Comment 0. The "Drill" alert is present for every Alertmanager:
# for i in {0,1,2}; do echo "alertmanager-main-$i"; oc exec -n openshift-monitoring -t alertmanager-main-$i -c alertmanager -- curl -s http://localhost:9093/api/v2/alerts | jq -r '.[].labels.alertname'; echo -e "\n"; done
alertmanager-main-0
AlertmanagerReceiversNotConfigured
Drill
CustomResourceDetected
KubePodCrashLooping
KubeDeploymentReplicasMismatch
Watchdog


alertmanager-main-1
AlertmanagerReceiversNotConfigured
Drill
CustomResourceDetected
KubePodCrashLooping
KubeDeploymentReplicasMismatch
Watchdog


alertmanager-main-2
AlertmanagerReceiversNotConfigured
Drill
CustomResourceDetected
KubePodCrashLooping
KubeDeploymentReplicasMismatch
Watchdog

Comment 4 errata-xmlrpc 2020-07-13 17:25:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

