Bug 1990384 - 502 error on "Observe -> Alerting" UI after disabled local alertmanager
Summary: 502 error on "Observe -> Alerting" UI after disabled local alertmanager
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.11.0
Assignee: Gabriel Bernal
QA Contact: Junqi Zhao
Depends On:
Reported: 2021-08-05 10:13 UTC by Junqi Zhao
Modified: 2022-08-10 10:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2022-08-10 10:36:53 UTC
Target Upstream Version:

Attachments
Alerts tab (218.21 KB, image/png)
2021-08-05 10:13 UTC, Junqi Zhao

System ID Private Priority Status Summary Last Updated
Github openshift console pull 11397 0 None open Bug 1990384: Improve Alertmanager unavailable message 2022-04-26 10:10:28 UTC
Github openshift console pull 9620 0 None open WIP: Gracefully handle disabling alertmanager 2022-02-10 07:46:10 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:37:07 UTC

Description Junqi Zhao 2021-08-05 10:13:39 UTC
Created attachment 1811175
Alerts tab

Description of problem:
The PR below allows users to disable the local Alertmanager. Launch a cluster with the PR:
# launch openshift/cluster-monitoring-operator#1293,4.9.0-0.nightly-2021-08-04-131508

Disable the local Alertmanager:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
The alertmanager pods are removed:
# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-monitoring-operator-77c76f4b48-bwl7l   2/2     Running   4          104m
grafana-84bc6bb567-4cgcc                       2/2     Running   0          85m
kube-state-metrics-7764969dd6-p2sst            3/3     Running   0          100m
node-exporter-2lqmt                            2/2     Running   0          100m
node-exporter-6pdvj                            2/2     Running   0          95m
node-exporter-7cpq9                            2/2     Running   0          95m
node-exporter-pq9m7                            2/2     Running   0          95m
node-exporter-qghmg                            2/2     Running   0          100m
node-exporter-xvhg8                            2/2     Running   0          100m
openshift-state-metrics-6df4c6875f-dgbtx       3/3     Running   0          100m
prometheus-adapter-7c4cf4d79b-b8n8m            1/1     Running   0          95m
prometheus-adapter-7c4cf4d79b-wzm59            1/1     Running   0          95m
prometheus-k8s-0                               7/7     Running   0          27m
prometheus-k8s-1                               7/7     Running   0          27m
prometheus-operator-cc649bf8f-hpd7r            2/2     Running   1          101m
telemeter-client-594c7b7d8d-slzx2              3/3     Running   0          100m
thanos-querier-548dcc9677-f77f6                5/5     Running   0          28m
thanos-querier-548dcc9677-qj8v8                5/5     Running   0          28m
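
A listing like the one above can also be checked mechanically for leftover Alertmanager pods. The following is a minimal sketch and not part of the original report: the `alertmanager-` name prefix is an assumption about how the pods are named, and `alertmanager_pods` is a hypothetical helper.

```python
def alertmanager_pods(listing: str) -> list[str]:
    """Return pod names from `oc get pod` output that start with 'alertmanager-'.

    The prefix is an assumption, not something stated in this report.
    """
    names = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        if fields[0].startswith("alertmanager-"):
            names.append(fields[0])
    return names


# Excerpt of the listing above; an empty result means no Alertmanager pods remain.
sample = """\
NAME                                           READY   STATUS    RESTARTS   AGE
prometheus-k8s-0                               7/7     Running   0          27m
thanos-querier-548dcc9677-f77f6                5/5     Running   0          28m
"""
print(alertmanager_pods(sample))
```

The listing text could come from saving the output of the `oc -n openshift-monitoring get pod` command shown above.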

Log in to the admin console and open "Observe -> Alerting". The Alerts tab shows a 502 error because it requests silences from the api/alertmanager/api/v2/silences API.
I think it would be better not to call the Alertmanager API in this case. The error shown is:
Error loading silences from Alertmanager. Some of the alerts below may actually be silenced.
Error: Bad Gateway
The Silences tab shows the same error.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. see the description

Actual results:

Expected results:

Additional info:

Comment 4 Junqi Zhao 2021-09-14 09:42:33 UTC
Silences cannot be created from the admin console either; the attempt also fails with a 502 Bad Gateway error.

Comment 7 Mariusz Mazur 2021-10-08 13:01:48 UTC
The "Warning alert: Error loading silences from Alertmanager. Some of the alerts below may actually be silenced. Error: Forbidden" message in Observe | Alerting is present on a vanilla OSD 4.9-rc5 that I just upgraded two of my test clusters to.

Comment 8 Simon Pasquier 2021-10-11 12:27:19 UTC
@Mariusz I think this is a bit different. It's a consequence of fixing https://bugzilla.redhat.com/show_bug.cgi?id=1947005: you now have to grant the monitoring-alertmanager-edit permission to any user who needs to manage silences.
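
For illustration, granting that permission to one user could be expressed as a RoleBinding manifest. This is a sketch under assumptions, not a confirmed procedure: it assumes monitoring-alertmanager-edit is a namespaced Role in openshift-monitoring, and the binding name, the user name, and the `alertmanager_edit_binding` helper are hypothetical.

```python
import json


def alertmanager_edit_binding(user: str) -> dict:
    """Build a RoleBinding that grants monitoring-alertmanager-edit to `user`.

    Assumes the role is a namespaced Role in openshift-monitoring; the
    binding name below is made up for this example.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"monitoring-alertmanager-edit-{user}",
            "namespace": "openshift-monitoring",
        },
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "monitoring-alertmanager-edit",
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "User",
                "name": user,
            }
        ],
    }


# "example-user" is a placeholder user name.
print(json.dumps(alertmanager_edit_binding("example-user"), indent=2))
```

The printed JSON could be saved to a file and applied with `oc apply -f`. User names containing characters that are not valid in object names would need a different binding name.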

Comment 13 Junqi Zhao 2022-05-07 02:29:09 UTC
Tested with 4.11.0-0.nightly-2022-05-06-180112, following the steps in Comment 0. The error on the Silences page is now:
Error loading silences from Alertmanager. Alertmanager may be unavailable.
Bad Gateway
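
The wording difference between the two error messages quoted in this bug can be sketched as a mapping from the HTTP status of the silences request to a message. This is not the console's actual code (the console is not written in Python): the function name and the set of statuses treated as "unavailable" (502, 503, 504) are assumptions made for this example.

```python
from http import HTTPStatus

# Assumption: gateway-style failures indicate that Alertmanager is unreachable.
UNAVAILABLE_STATUSES = {502, 503, 504}


def silences_error_message(status: int) -> str:
    """Return a user-facing message for a failed silences request."""
    try:
        reason = HTTPStatus(status).phrase
    except ValueError:
        # Status code not known to the standard library.
        reason = f"HTTP {status}"
    prefix = "Error loading silences from Alertmanager."
    if status in UNAVAILABLE_STATUSES:
        hint = "Alertmanager may be unavailable."
    else:
        hint = "Some of the alerts below may actually be silenced."
    return f"{prefix} {hint} {reason}"


print(silences_error_message(502))
```

With this split, a 403 such as the one described in Comment 7 keeps the older wording, while gateway errors point at Alertmanager being unavailable.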

Comment 19 errata-xmlrpc 2022-08-10 10:36:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

