Bug 2052201 - Unable to update prometheus and alertmanager statefulsets
Summary: Unable to update prometheus and alertmanager statefulsets
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Sunil Thaha
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 2030539
Blocks:
 
Reported: 2022-02-08 20:39 UTC by rvanderp
Modified: 2022-05-18 13:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-18 13:20:29 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift prometheus-operator pull 167 | 0 | None | open | Bug 2052201: Address race condition in recreate flow for statefulset | 2022-04-27 04:34:28 UTC
Red Hat Knowledge Base (Solution) | 6937581 | 0 | None | None | None | 2022-04-19 17:18:23 UTC
Red Hat Product Errata | RHBA-2022:2206 | 0 | None | None | None | 2022-05-18 13:20:57 UTC

Description rvanderp 2022-02-08 20:39:40 UTC
Description of problem:
During an upgrade from 4.8.29 to 4.9.19, the Prometheus operator failed to roll out updates to the Prometheus and Alertmanager statefulsets. As a result, Prometheus became unavailable.

Version-Release number of selected component (if applicable):
4.9.19 IPI install on vSphere in VMC

How reproducible:
Unknown; this was encountered once on the vSphere build cluster.

Steps to Reproduce:
1. Upgrade from 4.8.29 to 4.9.19

Actual results:
Prometheus became unavailable, along with the associated cluster metrics and performance dashboards.

Expected results:
The Prometheus operator should update the statefulsets without manual intervention.

Additional info:
The Prometheus operator log reported the failures below, and the statefulsets were stuck at Prometheus (0/2 available) and Alertmanager (0/3 available). The issue was remediated by deleting the statefulsets and letting the operator recreate them.

~~~
level=info ts=2022-02-08T17:19:39.289456243Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.295713026Z caller=operator.go:742 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-02-08T17:19:39.330897841Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.42114441Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.426608664Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-02-08T17:19:39.590102973Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:24:44.340355154Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
~~~
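
For anyone triaging a similar log, the following is a minimal Python sketch (not part of the operator, and not an official tool) that counts how often each operator component reports a "recreating ... StatefulSet" message per resource key. The helper names `parse_logfmt` and `count_recreations` are hypothetical, and the sample lines are taken from the log above with the `reason` field omitted for brevity. Repeated recreate messages for the same key within a short window match the symptom described in this report.

```python
import re
from collections import Counter

# Matches logfmt pairs such as key=value or key="quoted value".
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key/value pairs of one logfmt line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def count_recreations(lines):
    """Count 'recreating ...' messages per (component, key) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg", "").startswith("recreating"):
            counts[(fields.get("component"), fields.get("key"))] += 1
    return counts


# Sample lines from the report above; the reason="..." field is omitted.
SAMPLE = """\
level=info ts=2022-02-08T17:19:39.289456243Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible"
level=info ts=2022-02-08T17:19:39.295713026Z caller=operator.go:742 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-02-08T17:19:39.330897841Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible"
level=info ts=2022-02-08T17:19:39.42114441Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible"
""".splitlines()

if __name__ == "__main__":
    for (component, key), n in count_recreations(SAMPLE).items():
        print(f"{component} {key}: {n} recreate attempt(s)")
```

The sketch only reads the log; it does not query or modify the cluster.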

Comment 2 Simon Pasquier 2022-02-09 09:45:27 UTC
From the operator logs, it's probably the same bug that has been reported in https://bugzilla.redhat.com/show_bug.cgi?id=2030539.

Comment 10 Junqi Zhao 2022-05-12 06:50:54 UTC
Upgraded from 4.8.29 to 4.9.0-0.nightly-2022-05-11-100812; the issue did not reproduce, and monitoring works well.

Comment 12 errata-xmlrpc 2022-05-18 13:20:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2206

