Bug 1961120 - CSI driver operators fail when upgrading a cluster
Summary: CSI driver operators fail when upgrading a cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: melbeher
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-17 10:00 UTC by Jan Safranek
Modified: 2021-07-27 23:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:08:44 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-storage-operator pull 167 | 0 | None | open | Bug 1961120: added permissions to service monitoring | 2021-05-18 11:32:12 UTC
Github | openshift local-storage-operator pull 237 | 0 | None | open | Bug 1961120: added permissions to service monitoring | 2021-05-18 11:35:49 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:08:57 UTC

Description Jan Safranek 2021-05-17 10:00:00 UTC
Description of problem:
Cluster upgrade from 4.7 to 4.8-ish version failed with:

Operator degraded (AWSEBSCSIDriverOperatorCR_AWSEBSDriverServiceMonitorController_SyncError): AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverServiceMonitorControllerDegraded: "servicemonitor.yaml" (string): servicemonitors.monitoring.coreos.com "aws-ebs-csi-driver-controller-monitor" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator" cannot update resource "servicemonitors" in API group "monitoring.coreos.com" in the namespace "openshift-cluster-csi-drivers"
AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverServiceMonitorControllerDegraded: 

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1393205025966133248

The reason is that the ServiceMonitor is being updated, not created, and the operator's Role does not grant permission to update it:

https://github.com/openshift/cluster-storage-operator/blob/195603230796c2a7189f6daf45cb58b1a4fb72a3/assets/csidriveroperators/aws-ebs/03_role.yaml#L34-L40
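The create-versus-update distinction can be illustrated with a minimal Python sketch of how an RBAC rule is matched. The verb lists below are assumptions for illustration only (they are not copied from the linked 03_role.yaml or from the eventual fix), and the helper `is_allowed` is hypothetical; it ignores resourceNames and subresources.

```python
# Simplified model of Kubernetes RBAC rule matching.
# A request is allowed if at least one rule lists (or wildcards) the
# verb, the API group, and the resource.

GROUP = "monitoring.coreos.com"
RESOURCE = "servicemonitors"


def _matches(allowed, wanted):
    """True if the rule field contains the wanted value or the '*' wildcard."""
    return "*" in allowed or wanted in allowed


def is_allowed(rules, verb, api_group, resource):
    """Hypothetical helper: check a request against a list of rule dicts."""
    return any(
        _matches(rule["verbs"], verb)
        and _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        for rule in rules
    )


# ASSUMED rule shape before the fix: enough to create a ServiceMonitor
# on a fresh install, but not to update an existing one.
rules_before = [
    {"apiGroups": [GROUP], "resources": [RESOURCE], "verbs": ["get", "create"]},
]

# ASSUMED rule shape after the fix, following the verbs requested in this bug.
rules_after = [
    {
        "apiGroups": [GROUP],
        "resources": [RESOURCE],
        "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
    },
]

if __name__ == "__main__":
    for verb in ("create", "update"):
        print(verb, "before:", is_allowed(rules_before, verb, GROUP, RESOURCE))
        print(verb, "after: ", is_allowed(rules_after, verb, GROUP, RESOURCE))
```

Under these assumed rules, a fresh install succeeds because only `create` is needed, while an upgrade hits the `update` path on an object that already exists and is rejected as forbidden.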

Please check all storage operators (CSI driver operators, CSI snapshot controller operator, vsphere-problem-detector) and ensure that they have permissions to update / patch / maybe delete ServiceMonitor.
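One way to run such a check against a live cluster is `oc auth can-i` with service-account impersonation. The Python sketch below only builds the command lines and does not execute them. The verb list follows the request above, and the only account listed is the one named in the error message; accounts for the other operators are not given in this report and would have to be added by hand. The helper names are hypothetical.

```python
# Build `oc auth can-i` commands that ask whether a service account may
# perform each verb on ServiceMonitor objects in its namespace.

RESOURCE = "servicemonitors.monitoring.coreos.com"
VERBS = ("create", "update", "patch", "delete")

# (namespace, service account) pairs to check. Only the pair from the
# error message in this bug is known; extend the list for other operators.
ACCOUNTS = [
    ("openshift-cluster-csi-drivers", "aws-ebs-csi-driver-operator"),
]


def can_i_command(verb, namespace, service_account):
    """Return the argv list for one impersonated permission check."""
    user = f"system:serviceaccount:{namespace}:{service_account}"
    return ["oc", "auth", "can-i", verb, RESOURCE, "-n", namespace, "--as", user]


def build_checks(accounts=ACCOUNTS, verbs=VERBS):
    """Return one command per (account, verb) combination."""
    return [can_i_command(v, ns, sa) for ns, sa in accounts for v in verbs]


if __name__ == "__main__":
    for cmd in build_checks():
        print(" ".join(cmd))
```

Each printed command answers `yes` or `no` when run by a user who is allowed to impersonate service accounts, so a `no` for `update` or `patch` points at a Role that still needs the extra verbs.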

Comment 1 melbeher 2021-05-17 16:23:06 UTC
CSI driver operators and vsphere-problem-detector have been fixed here: https://github.com/openshift/cluster-storage-operator/pull/167

Comment 2 melbeher 2021-05-17 16:30:37 UTC
Local Storage Operator has been fixed here: https://github.com/openshift/local-storage-operator/pull/237

Comment 4 Chao Yang 2021-05-24 08:05:51 UTC
Passed on AWS when upgrading from 4.7.0-0.nightly-2021-05-20-112118 to 4.8.0-0.nightly-2021-05-21-233425.
aws-ebs-csi-driver-operator and local-storage-operator passed.

Comment 5 Chao Yang 2021-05-25 01:44:26 UTC
Passed for vsphere-problem-detector and snapshot.

Comment 8 errata-xmlrpc 2021-07-27 23:08:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

