Bug 1948090 - Storage should not set Available=False APIServices_Error AWSEBSCSIDriverOperatorCRAvailable on update
Summary: Storage should not set Available=False APIServices_Error AWSEBSCSIDriverOpera...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Fabio Bertinatto
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-10 01:07 UTC by W. Trevor King
Modified: 2021-10-18 17:30 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:29:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift azure-disk-csi-driver-operator pull 27 0 None None None 2021-08-02 17:18:08 UTC
Github openshift cluster-storage-operator pull 173 0 None open Bug 1948090: Remove unnecessary conditions when deploying CSI operator 2021-08-20 14:04:04 UTC
Github openshift cluster-storage-operator pull 199 0 None None None 2021-08-03 20:22:14 UTC
Github openshift csi-driver-manila-operator pull 112 0 None None None 2021-08-02 17:21:09 UTC
Github openshift gcp-pd-csi-driver-operator pull 31 0 None None None 2021-08-03 20:57:20 UTC
Github openshift openstack-cinder-csi-driver-operator pull 51 0 None None None 2021-08-02 17:19:18 UTC
Github openshift openstack-cinder-csi-driver-operator pull 53 0 None None None 2021-08-03 20:49:40 UTC
Github openshift ovirt-csi-driver-operator pull 66 0 None None None 2021-08-02 17:40:26 UTC
Github openshift vmware-vsphere-csi-driver-operator pull 35 0 None None None 2021-08-02 17:18:40 UTC
Github openshift vmware-vsphere-csi-driver-operator pull 38 0 None None None 2021-08-03 20:54:48 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:30:13 UTC

Description W. Trevor King 2021-04-10 01:07:19 UTC
From CI runs like [1]:

  : [bz-Storage] clusteroperator/storage should not change condition/Available
    Run #0: Failed	0s
    1 unexpected clusteroperator state transitions during e2e test run 

    Apr 09 13:21:35.308 - 41s   E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable:
    AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service

Very popular:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condition/Available' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 17 runs, 100% failed, 88% of failures match = 88% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 19 runs, 100% failed, 95% of failures match = 95% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 21 runs, 100% failed, 76% of failures match = 76% impact
periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - 10 runs, 80% failed, 50% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 10 runs, 100% failed, 90% of failures match = 90% impact
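The impact figures above appear to be the job failure rate multiplied by the share of failures that match the search. A minimal Python sketch for checking that arithmetic on any line of this output follows. The regular expression and the function name `parse_impact` are illustrative assumptions, not part of any CI tooling, and the exact rounding the search tool applies is not known from this report.

```python
import re

# Assumed line shape, taken from the output above:
#   <job> (all) - N runs, F% failed, M% of failures match = I% impact
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_impact(line):
    """Parse one 'failures match' line; return None if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    failed = int(m["failed"])
    match = int(m["match"])
    return {
        "job": m["job"],
        "runs": int(m["runs"]),
        "failed_pct": failed,
        "match_pct": match,
        "reported_impact_pct": int(m["impact"]),
        # Assumption: impact = failure rate * share of failures that match.
        "computed_impact_pct": failed * match / 100,
    }


if __name__ == "__main__":
    sample = (
        "periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - "
        "10 runs, 80% failed, 50% of failures match = 40% impact"
    )
    print(parse_impact(sample))
```

For the nightly 4.8 AWS upgrade job, 80% failed times 50% matching gives the reported 40% impact. The other lines have a 100% failure rate, so there the impact equals the match percentage.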

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4831/pull-ci-openshift-installer-master-e2e-aws-upgrade/1380486185595441152

Comment 1 Fabio Bertinatto 2021-05-07 08:29:01 UTC
Apparently clusteroperator/storage changes the condition at 13:21:

Apr 09 13:21:35.308 - 41s   E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service

And the new operator starts at 13:22:

I0409 13:22:00.161393       1 builder.go:240] aws-ebs-csi-driver-operator version v0.0.0-unknown-695b8fc

This means that the *previous* storage operator is going Available=False.
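The ordering argument can be checked by parsing the two timestamps quoted above. This is a minimal Python sketch; it assumes both logs use the same time zone and the year 2021 (taken from the comment dates), and the helper names are hypothetical.

```python
from datetime import datetime


def parse_monitor_time(stamp, year=2021):
    """Parse an e2e monitor timestamp such as 'Apr 09 13:21:35.308'.

    The year is not in the log line; 2021 is an assumption from the bug dates.
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def parse_klog_time(stamp, year=2021):
    """Parse a klog-style prefix such as 'I0409 13:22:00.161393'.

    The leading character is the severity letter and is skipped.
    """
    return datetime.strptime(f"{year}{stamp[1:]}", "%Y%m%d %H:%M:%S.%f")


# Both values are copied from the log lines quoted in this comment.
transition = parse_monitor_time("Apr 09 13:21:35.308")
new_operator_start = parse_klog_time("I0409 13:22:00.161393")

if __name__ == "__main__":
    gap = (new_operator_start - transition).total_seconds()
    print(f"Available=False began {gap:.3f}s before the new operator logged its version")
```

Under those assumptions, the Available=False transition begins roughly 25 seconds before the new operator logs its version line.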

Comment 3 Fabio Bertinatto 2021-06-04 16:16:33 UTC
What's missing:

1. Review and merge PR https://github.com/openshift/cluster-storage-operator/pull/173
2. Backport the following PR to other CSI operators: https://github.com/openshift/aws-ebs-csi-driver-operator/pull/122/files

Comment 4 Scott Dodson 2021-07-14 18:04:29 UTC
This really should've been a 4.8.0 blocker, but that intent was never conveyed to the assignees. I'm marking this as a blocker for 4.9.0 and would request that we backport this to 4.8 as soon as reasonable. We really need to get rid of the negative signal that we generate during upgrades by operators going degraded during normal operations.

Comment 5 Fabio Bertinatto 2021-08-20 14:46:39 UTC
Moving manually to MODIFIED. oVirt is the only patch not merged yet, and it might be covered in another BZ.

Comment 7 Wei Duan 2021-09-03 01:08:36 UTC
Verified: passes in recent 4.9 CI runs.

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condition/Available' | grep 'failures match' | sort | grep 4.9 | wc -l
0

Comment 10 errata-xmlrpc 2021-10-18 17:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

