Bug 1990384

Summary: 502 error on "Observe -> Alerting" UI after disabled local alertmanager
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.9
Status: CLOSED ERRATA
Severity: low
Priority: low
Reporter: Junqi Zhao <juzhao>
Assignee: Gabriel Bernal <gbernal>
QA Contact: Junqi Zhao <juzhao>
CC: amuller, anpicker, mmazur, spasquie
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2022-08-10 10:36:53 UTC
Attachments:
Description: Alerts tab, Flags: none

Description Junqi Zhao 2021-08-05 10:13:39 UTC
Created attachment 1811175
Alerts tab

Description of problem:
https://github.com/openshift/cluster-monitoring-operator/pull/1293 allows users to disable the local Alertmanager.
Launch a cluster with the PR:
# launch openshift/cluster-monitoring-operator#1293,4.9.0-0.nightly-2021-08-04-131508

Disable the local Alertmanager:
**********************
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
**********************
The Alertmanager pods are removed:
# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-monitoring-operator-77c76f4b48-bwl7l   2/2     Running   4          104m
grafana-84bc6bb567-4cgcc                       2/2     Running   0          85m
kube-state-metrics-7764969dd6-p2sst            3/3     Running   0          100m
node-exporter-2lqmt                            2/2     Running   0          100m
node-exporter-6pdvj                            2/2     Running   0          95m
node-exporter-7cpq9                            2/2     Running   0          95m
node-exporter-pq9m7                            2/2     Running   0          95m
node-exporter-qghmg                            2/2     Running   0          100m
node-exporter-xvhg8                            2/2     Running   0          100m
openshift-state-metrics-6df4c6875f-dgbtx       3/3     Running   0          100m
prometheus-adapter-7c4cf4d79b-b8n8m            1/1     Running   0          95m
prometheus-adapter-7c4cf4d79b-wzm59            1/1     Running   0          95m
prometheus-k8s-0                               7/7     Running   0          27m
prometheus-k8s-1                               7/7     Running   0          27m
prometheus-operator-cc649bf8f-hpd7r            2/2     Running   1          101m
telemeter-client-594c7b7d8d-slzx2              3/3     Running   0          100m
thanos-querier-548dcc9677-f77f6                5/5     Running   0          28m
thanos-querier-548dcc9677-qj8v8                5/5     Running   0          28m


Log in to the admin console and open "Observe -> Alerting". The Alerts tab shows a 502 error because the console fetches silences from the api/alertmanager/api/v2/silences API.
I think it would be better not to call the Alertmanager API in this case.
**********************
Error loading silences from Alertmanager. Some of the alerts below may actually be silenced.
Error: Bad Gateway
**********************
The Silences tab shows the same error.
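
The failure mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the console's actual code: the function name `describe_silences_status` and all message wording are hypothetical. It maps the HTTP status returned for api/alertmanager/api/v2/silences to a message, treating gateway errors such as 502 as "Alertmanager unreachable" rather than as a generic load failure.

```python
# Hypothetical helper for illustration; not taken from the OpenShift console.
# Status codes that typically mean the proxy could not reach a backend.
GATEWAY_ERRORS = {502, 503, 504}


def describe_silences_status(status_code: int) -> str:
    """Return a user-facing message for a silences API response status."""
    if 200 <= status_code < 300:
        return "Silences loaded."
    if status_code in GATEWAY_ERRORS:
        # Likely cause in this report: alertmanagerMain.enabled is false,
        # so there is no Alertmanager behind the proxy.
        return "Alertmanager is unreachable; silences cannot be loaded."
    return f"Error loading silences from Alertmanager (HTTP {status_code})."
```

With this mapping, the 502 from a cluster without a local Alertmanager gets a message that points at the cause, while other failures keep the generic error text.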

How reproducible:
always

Steps to Reproduce:
1. see the description

Comment 4 Junqi Zhao 2021-09-14 09:42:33 UTC
Silences cannot be created from the admin console either; the attempt also fails with a 502 Bad Gateway error.

Comment 7 Mariusz Mazur 2021-10-08 13:01:48 UTC
The "Error loading silences from Alertmanager. Some of the alerts below may actually be silenced. Error: Forbidden" warning in Observe | Alerting is present on vanilla OSD 4.9-rc5, to which I just upgraded two of my test clusters.

Comment 8 Simon Pasquier 2021-10-11 12:27:19 UTC
@Mariusz, this is a bit different, I think. It's a consequence of fixing https://bugzilla.redhat.com/show_bug.cgi?id=1947005: you now have to grant the monitoring-alertmanager-edit permission to any user who needs to manage silences.
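
Granting that permission could look roughly like the Python sketch below, which builds a RoleBinding manifest. Several things here are assumptions rather than facts from this report: that monitoring-alertmanager-edit is a namespaced Role in openshift-monitoring, the binding name, the helper name `alertmanager_edit_binding`, and the placeholder user "alice".

```python
import json


def alertmanager_edit_binding(user: str, namespace: str = "openshift-monitoring") -> dict:
    """Build a RoleBinding manifest granting monitoring-alertmanager-edit to a user."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            # Hypothetical binding name chosen for this example.
            "name": f"monitoring-alertmanager-edit-{user}",
            "namespace": namespace,
        },
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            # Assumption: the permission is a namespaced Role, not a ClusterRole.
            "kind": "Role",
            "name": "monitoring-alertmanager-edit",
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": user},
        ],
    }


if __name__ == "__main__":
    # "alice" is a placeholder user name.
    print(json.dumps(alertmanager_edit_binding("alice"), indent=2))
```

The printed JSON is a regular Kubernetes manifest, so it could be reviewed and then applied with `oc apply -f -` by a cluster administrator.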

Comment 13 Junqi Zhao 2022-05-07 02:29:09 UTC
Tested with 4.11.0-0.nightly-2022-05-06-180112, following the steps in Comment 0. The error on the Silences page is now:
Error loading silences from Alertmanager. Alertmanager may be unavailable.
Bad Gateway

Comment 19 errata-xmlrpc 2022-08-10 10:36:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069