1948090 – Storage should not set Available=False APIServices_Error AWSEBSCSIDriverOperatorCRAvailable on update

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Fabio Bertinatto
QA Contact:	Wei Duan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-04-10 01:07 UTC by W. Trevor King
Modified:	2021-10-18 17:30 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-18 17:29:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift azure-disk-csi-driver-operator pull 27	None	None	None	2021-08-02 17:18:08 UTC
Github	openshift cluster-storage-operator pull 173	None	open	Bug 1948090: Remove unnecessary conditions when deploying CSI operator	2021-08-20 14:04:04 UTC
Github	openshift cluster-storage-operator pull 199	None	None	None	2021-08-03 20:22:14 UTC
Github	openshift csi-driver-manila-operator pull 112	None	None	None	2021-08-02 17:21:09 UTC
Github	openshift gcp-pd-csi-driver-operator pull 31	None	None	None	2021-08-03 20:57:20 UTC
Github	openshift openstack-cinder-csi-driver-operator pull 51	None	None	None	2021-08-02 17:19:18 UTC
Github	openshift openstack-cinder-csi-driver-operator pull 53	None	None	None	2021-08-03 20:49:40 UTC
Github	openshift ovirt-csi-driver-operator pull 66	None	None	None	2021-08-02 17:40:26 UTC
Github	openshift vmware-vsphere-csi-driver-operator pull 35	None	None	None	2021-08-02 17:18:40 UTC
Github	openshift vmware-vsphere-csi-driver-operator pull 38	None	None	None	2021-08-03 20:54:48 UTC
Red Hat Product Errata	RHSA-2021:3759	None	None	None	2021-10-18 17:30:13 UTC

Description W. Trevor King 2021-04-10 01:07:19 UTC

From CI runs like [1]:

  : [bz-Storage] clusteroperator/storage should not change condition/Available
    Run #0: Failed	0s
    1 unexpected clusteroperator state transitions during e2e test run 

    Apr 09 13:21:35.308 - 41s   E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable:
    AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service

Very popular:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condition/Available' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 17 runs, 100% failed, 88% of failures match = 88% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 19 runs, 100% failed, 95% of failures match = 95% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 21 runs, 100% failed, 76% of failures match = 76% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - 10 runs, 80% failed, 50% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 10 runs, 100% failed, 90% of failures match = 90% impact

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4831/pull-ci-openshift-installer-master-e2e-aws-upgrade/1380486185595441152
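The percentages in the search output above can be cross-checked mechanically. What follows is a minimal Python sketch, not part of the original report: the regular expression and the helper name parse_impact_line are assumptions inferred only from the line format quoted above. For the lines listed, the impact figure equals the failed percentage multiplied by the match percentage.

```python
import re

# Assumed format, inferred from the search output quoted above:
#   <job> (all) - <N> runs, <F>% failed, <M>% of failures match = <I>% impact
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, "
    r"(?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_impact_line(line):
    """Return the fields of one report line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    return {
        "job": fields["job"],
        "runs": int(fields["runs"]),
        "failed_pct": int(fields["failed"]),
        "match_pct": int(fields["match"]),
        "impact_pct": int(fields["impact"]),
    }


# One line copied from the output above.
sample = (
    "periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - "
    "10 runs, 80% failed, 50% of failures match = 40% impact"
)
row = parse_impact_line(sample)

# 80% of runs failed and 50% of those failures match this test,
# so the share of all runs affected is 0.80 * 0.50.
computed = row["failed_pct"] * row["match_pct"] / 100
print(row["job"], row["impact_pct"], computed)
```

Sorting the parsed rows by impact_pct would rank the jobs by how often this test is the one failing them.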

Comment 1 Fabio Bertinatto 2021-05-07 08:29:01 UTC

Apparently clusteroperator/storage changes the condition at 13:21:

Apr 09 13:21:35.308 - 41s   E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service

And the new operator starts at 13:22:

I0409 13:22:00.161393       1 builder.go:240] aws-ebs-csi-driver-operator version v0.0.0-unknown-695b8fc

This means that the *previous* storage operator is going Available=False.
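The ordering argument above can be checked by parsing the two timestamps. This is a minimal Python sketch, not code from any of the operators. It assumes that both logs use the same timezone and comparable clocks, that the year is 2021 (neither format carries one), and that month abbreviations parse under the default C/English locale. The helper names are hypothetical.

```python
from datetime import datetime

# Assumption: neither log format includes a year; 2021 is taken from the report date.
ASSUMED_YEAR = 2021


def parse_monitor_time(stamp):
    """Parse an e2e monitor timestamp such as 'Apr 09 13:21:35.308'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S.%f")


def parse_klog_time(stamp):
    """Parse a klog header prefix such as 'I0409 13:22:00.161393'.

    The first character is the severity letter; the rest is mmdd hh:mm:ss.uuuuuu.
    """
    return datetime.strptime(f"{ASSUMED_YEAR}{stamp[1:]}", "%Y%m%d %H:%M:%S.%f")


condition_changed = parse_monitor_time("Apr 09 13:21:35.308")
operator_started = parse_klog_time("I0409 13:22:00.161393")

# A positive gap means the condition flipped before the new operator logged its version.
gap = operator_started - condition_changed
print(f"condition changed {gap.total_seconds():.3f}s before the new operator started")
```

Under those assumptions the condition change precedes the new operator's first log line by roughly 25 seconds, which is consistent with the previous operator being the one that reported Available=False.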

Comment 3 Fabio Bertinatto 2021-06-04 16:16:33 UTC

What's missing:

1. Review and merge PR https://github.com/openshift/cluster-storage-operator/pull/173
2. Backport the following PR to other CSI operators: https://github.com/openshift/aws-ebs-csi-driver-operator/pull/122/files

Comment 4 Scott Dodson 2021-07-14 18:04:29 UTC

This really should have been a 4.8.0 blocker, but that intent was never conveyed to the assignees. I'm marking this as a blocker for 4.9.0 and request that we backport it to 4.8 as soon as is reasonable. We really need to get rid of the negative signal that operators generate during upgrades by going degraded during normal operations.

Comment 5 Fabio Bertinatto 2021-08-20 14:46:39 UTC

Moving manually to MODIFIED. oVirt is the only patch not merged yet, and it might be covered by another BZ.

Comment 7 Wei Duan 2021-09-03 01:08:36 UTC

Verified: the test passes in recent 4.9 CI runs.

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condition/Available' | grep 'failures match' | sort | grep 4.9 | wc -l
0
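One caveat about the check above: in `grep 4.9` the dot matches any character, so the filter is slightly looser than a literal version match. A stricter count could look like the following Python sketch. It is only an illustration, and the function name count_matching_jobs is hypothetical. The two sample lines are copied from the description earlier in this report.

```python
import re


def count_matching_jobs(report_lines, version):
    """Count 'failures match' lines that mention exactly the given version.

    re.escape keeps the '.' literal, and the lookarounds reject longer
    version strings such as '14.9' or '4.90'.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(version) + r"(?![\d.])")
    return sum(
        1
        for line in report_lines
        if "failures match" in line and pattern.search(line)
    )


report = [
    "periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - "
    "1 runs, 100% failed, 100% of failures match = 100% impact",
    "periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - "
    "21 runs, 100% failed, 76% of failures match = 76% impact",
]
print(count_matching_jobs(report, "4.9"))
```

A result of 0 over a fresh report would correspond to the verification above: no 4.9 upgrade job had failures matching this test in the queried window.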

Comment 10 errata-xmlrpc 2021-10-18 17:29:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
