2052201 – Unable to update prometheus and alertmanager statefulsets

Summary: Unable to update prometheus and alertmanager statefulsets

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	4.9.z
Assignee:	Sunil Thaha
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:	2030539
Blocks:

Reported:	2022-02-08 20:39 UTC by rvanderp
Modified:	2022-05-18 13:20 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-05-18 13:20:29 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift prometheus-operator pull 167	None	open	Bug 2052201: Address race condition in recreate flow for statefulset	2022-04-27 04:34:28 UTC
Red Hat Knowledge Base (Solution)	6937581	None	None	None	2022-04-19 17:18:23 UTC
Red Hat Product Errata	RHBA-2022:2206	None	None	None	2022-05-18 13:20:57 UTC

Description rvanderp 2022-02-08 20:39:40 UTC

Description of problem:
During an upgrade from 4.8.29 to 4.9.19, the Prometheus operator failed to roll out updates to the Prometheus and Alertmanager StatefulSets. As a result, Prometheus became unavailable.

Version-Release number of selected component (if applicable):
4.9.19 IPI install on vSphere in VMC

How reproducible:
Unknown; this was encountered once, on the vSphere build cluster.

Steps to Reproduce:
1. Upgrade from 4.8.29 to 4.9.19

Actual results:
Prometheus, along with the cluster metrics and performance dashboards that depend on it, became unavailable.

Expected results:
The Prometheus operator should update the StatefulSets without manual intervention.

Additional info:
The Prometheus operator log reported the failures below, and the StatefulSets were stuck at 0/2 available (Prometheus) and 0/3 available (Alertmanager). The issue was remediated by deleting the StatefulSets and letting the operator recreate them.

~~~
level=info ts=2022-02-08T17:19:39.289456243Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.295713026Z caller=operator.go:742 component=alertmanageroperator key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2022-02-08T17:19:39.330897841Z caller=operator.go:804 component=alertmanageroperator key=openshift-monitoring/main msg="recreating AlertManager StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.42114441Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:19:39.426608664Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2022-02-08T17:19:39.590102973Z caller=operator.go:1306 component=prometheusoperator key=openshift-monitoring/k8s statefulset=prometheus-k8s shard=0 msg="recreating StatefulSet because the update operation wasn't possible" reason="Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy' and 'minReadySeconds' are forbidden"
level=info ts=2022-02-08T17:24:44.340355154Z caller=operator.go:1221 component=prometheusoperator key=openshift-monitoring/k8s msg="sync prometheus"
~~~

Comment 2 Simon Pasquier 2022-02-09 09:45:27 UTC

From the operator logs, this is probably the same bug as the one reported in https://bugzilla.redhat.com/show_bug.cgi?id=2030539.
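The `reason` repeated in the operator logs of the description is the Kubernetes API server rejecting the StatefulSet update: as the message states, only `replicas`, `template`, `updateStrategy` and `minReadySeconds` may change on an existing StatefulSet spec, and the operator reacts to that rejection by recreating the StatefulSet, as the "recreating ... StatefulSet" log lines show. The Go sketch below is a hypothetical illustration of that rule: it predicts locally whether a spec change would take the in-place update path or the recreate path. `StatefulSetSpec` is a simplified stand-in rather than the real `apps/v1` type, `forbiddenChanges` and `nextAction` are invented names, and the real operator does not diff fields itself (it attempts the update and handles the error). The sketch also does not model the race condition in the recreate flow that the linked pull request addresses.

```go
package main

import (
	"fmt"
	"reflect"
)

// StatefulSetSpec is a hypothetical, simplified stand-in for the Kubernetes
// apps/v1 StatefulSetSpec. It is NOT the real API type.
type StatefulSetSpec struct {
	// Fields that the "Forbidden" message lists as updatable.
	Replicas        int32
	Template        string // placeholder for the pod template
	UpdateStrategy  string
	MinReadySeconds int32

	// A few fields outside that list; changing any of them is rejected.
	ServiceName          string
	Selector             map[string]string
	VolumeClaimTemplates []string
	PodManagementPolicy  string
}

// forbiddenChanges lists the fields outside the updatable set whose values
// differ between the current and desired specs.
func forbiddenChanges(current, desired StatefulSetSpec) []string {
	var changed []string
	if current.ServiceName != desired.ServiceName {
		changed = append(changed, "serviceName")
	}
	if !reflect.DeepEqual(current.Selector, desired.Selector) {
		changed = append(changed, "selector")
	}
	if !reflect.DeepEqual(current.VolumeClaimTemplates, desired.VolumeClaimTemplates) {
		changed = append(changed, "volumeClaimTemplates")
	}
	if current.PodManagementPolicy != desired.PodManagementPolicy {
		changed = append(changed, "podManagementPolicy")
	}
	return changed
}

// nextAction returns "update" when an in-place update would be accepted and
// "recreate" when the API server would reject it, which is the case the
// operator logs as "recreating StatefulSet because the update operation
// wasn't possible".
func nextAction(current, desired StatefulSetSpec) string {
	if len(forbiddenChanges(current, desired)) > 0 {
		return "recreate"
	}
	return "update"
}

func main() {
	// Hypothetical values chosen for illustration only.
	current := StatefulSetSpec{
		Replicas:             2,
		Template:             "image-a",
		UpdateStrategy:       "RollingUpdate",
		ServiceName:          "svc-a",
		Selector:             map[string]string{"app": "example"},
		VolumeClaimTemplates: []string{"data"},
		PodManagementPolicy:  "Parallel",
	}

	// Only the pod template changes: an in-place update is allowed.
	templateOnly := current
	templateOnly.Template = "image-b"
	fmt.Println(nextAction(current, templateOnly)) // update

	// A volume claim template is added: the update would be rejected.
	newClaims := current
	newClaims.VolumeClaimTemplates = []string{"data", "extra"}
	fmt.Println(nextAction(current, newClaims), forbiddenChanges(current, newClaims))
	// recreate [volumeClaimTemplates]
}
```

Running it prints `update` for the template-only change and `recreate [volumeClaimTemplates]` for the added claim template; the recreate case is the window in which the Prometheus and Alertmanager pods are briefly absent, which is why a failure in that flow leaves the StatefulSets at 0 available replicas.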

Comment 10 Junqi Zhao 2022-05-12 06:50:54 UTC

Upgraded from 4.8.29 to 4.9.0-0.nightly-2022-05-11-100812; the issue did not reproduce, and monitoring works well.

Comment 12 errata-xmlrpc 2022-05-18 13:20:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2206
