Bug 2052201

Summary: Unable to update prometheus and alertmanager statefulsets
Product: OpenShift Container Platform
Reporter: rvanderp
Component: Monitoring
Assignee: Sunil Thaha <sthaha>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: high
Version: 4.9
CC: amuller, anpicker, aos-bugs, erooth, spasquie, stwalter, wking
Keywords: Upgrades
Target Release: 4.9.z
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-05-18 13:20:29 UTC
Bug Depends On: 2030539    

Description rvanderp 2022-02-08 20:39:40 UTC
Description of problem:
During an upgrade from 4.8.29 to 4.9.19, the Prometheus operator failed to roll out updates to the Prometheus and Alertmanager StatefulSets. As a result, Prometheus became unavailable.

Version-Release number of selected component (if applicable):
4.9.19 IPI install on vSphere in VMC

How reproducible:
Unknown; this was encountered once on the vSphere build cluster.

Steps to Reproduce:
1. Upgrade from 4.8.29 to 4.9.19

Actual results:
Prometheus and the associated cluster metrics and performance dashboards became unavailable.

Expected results:
The Prometheus operator should update the StatefulSets without manual intervention.

Additional info:
The Prometheus operator log reported the failures below, and the StatefulSets were stuck at Prometheus (0/2 available) and Alertmanager (0/3 available). The issue was remediated by deleting the StatefulSets and letting the operator recreate them.
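The remediation described above can be scripted. This is a minimal sketch that assumes the `oc` CLI is already logged in to the affected cluster. The namespace and `prometheus-k8s` come from the log below. `alertmanager-main` is an assumption, based on the operator naming its Alertmanager StatefulSets `alertmanager-<name>` for the `main` resource in `key=openshift-monitoring/main`. The helper names are hypothetical. By default the script only prints the command and does not run it.

```python
import subprocess

# Values taken from this report: the namespace and prometheus-k8s appear in
# the operator log. "alertmanager-main" is an assumption based on the
# operator's "alertmanager-<name>" naming for key=openshift-monitoring/main.
NAMESPACE = "openshift-monitoring"
STATEFULSETS = ["prometheus-k8s", "alertmanager-main"]


def delete_statefulsets_cmd(namespace: str, names: list[str]) -> list[str]:
    """Build the argv for deleting the given StatefulSets with `oc`."""
    if not names:
        raise ValueError("at least one StatefulSet name is required")
    return ["oc", "delete", "statefulset", *names, "-n", namespace]


def remediate(dry_run: bool = True) -> list[str]:
    """Print the delete command and run it only when dry_run is False."""
    cmd = delete_statefulsets_cmd(NAMESPACE, STATEFULSETS)
    print(" ".join(cmd))
    if not dry_run:
        # After deletion, the operator is expected to recreate the
        # StatefulSets from the Prometheus and Alertmanager resources.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    remediate()  # dry run: prints the command without executing it
```

Making the dry run the default keeps the script safe to run for inspection. Only `remediate(dry_run=False)` actually deletes anything.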

~~~
level=info ts=2022-02-08T17:19:39.289456243Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.295713026Z caller=operator.go:742 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-02-08T17:19:39.330897841Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.42114441Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.426608664Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-02-08T17:19:39.590102973Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:24:44.340355154Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
~~~

Comment 2 Simon Pasquier 2022-02-09 09:45:27 UTC
From the operator logs, it's probably the same bug that has been reported in https://bugzilla.redhat.com/show_bug.cgi?id=2030539.
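Every "recreating" line in the operator log carries the same `reason`. According to that message, a StatefulSet update may only change `replicas`, `template`, `updateStrategy`, and `minReadySeconds`. When any other field changes, the update is forbidden, and the operator logs that it is recreating the StatefulSet instead.

This is a simplified sketch of that rule. It compares top-level spec keys only, while the real API-server validation is more detailed. The function name `forbidden_changes` and the sample specs are hypothetical.

```python
# Top-level StatefulSet spec fields that, per the error message in the logs,
# may be changed by an update; a change to any other field is Forbidden.
MUTABLE_FIELDS = frozenset({"replicas", "template", "updateStrategy", "minReadySeconds"})


def forbidden_changes(old_spec: dict, new_spec: dict) -> set[str]:
    """Return top-level spec fields that differ but may not be updated.

    Simplification: compares top-level keys only; a key that is missing
    is treated the same as a key set to None.
    """
    changed = {
        key
        for key in old_spec.keys() | new_spec.keys()
        if old_spec.get(key) != new_spec.get(key)
    }
    return changed - MUTABLE_FIELDS


if __name__ == "__main__":
    old = {"replicas": 2, "serviceName": "prometheus-operated"}
    new = {"replicas": 2, "serviceName": "renamed"}  # hypothetical change
    print(forbidden_changes(old, new))  # {'serviceName'}
```

A non-empty result corresponds to the Forbidden response seen in the log. An empty result means an in-place update would pass this check.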

Comment 10 Junqi Zhao 2022-05-12 06:50:54 UTC
Upgraded from 4.8.29 to 4.9.0-0.nightly-2022-05-11-100812 and did not reproduce the issue; monitoring works well.
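After an upgrade, you can check that the stuck state from this report (Prometheus 0/2, Alertmanager 0/3) is gone by comparing ready and desired replicas for each StatefulSet. This is a minimal sketch that assumes the input is the JSON printed by `oc get statefulset -n openshift-monitoring -o json`. The function names and the sample data are hypothetical.

```python
import json


def statefulset_readiness(list_json: str) -> dict[str, tuple[int, int]]:
    """Map each StatefulSet name to (readyReplicas, spec.replicas)."""
    doc = json.loads(list_json)
    result = {}
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        # spec.replicas defaults to 1 when unset.
        desired = item.get("spec", {}).get("replicas", 1)
        # status.readyReplicas is omitted from the JSON when it is 0.
        ready = item.get("status", {}).get("readyReplicas", 0)
        result[name] = (ready, desired)
    return result


def not_ready(list_json: str) -> list[str]:
    """Sorted names of StatefulSets with fewer ready replicas than desired."""
    return sorted(
        name
        for name, (ready, desired) in statefulset_readiness(list_json).items()
        if ready < desired
    )


if __name__ == "__main__":
    # Hypothetical sample shaped like the stuck state in this report.
    sample = json.dumps({"items": [
        {"metadata": {"name": "prometheus-k8s"}, "spec": {"replicas": 2}, "status": {}},
        {"metadata": {"name": "alertmanager-main"}, "spec": {"replicas": 3},
         "status": {"readyReplicas": 0}},
    ]})
    print(not_ready(sample))  # ['alertmanager-main', 'prometheus-k8s']
```

An empty list from `not_ready` means every StatefulSet in the input reports all of its replicas as ready.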

Comment 12 errata-xmlrpc 2022-05-18 13:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2206