From CI runs like [1]: : [bz-Storage] clusteroperator/storage should not change condition/Available Run #0: Failed 0s 1 unexpected clusteroperator state transitions during e2e test run Apr 09 13:21:35.308 - 41s E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service Very popular: $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condi tion/Available' | grep 'failures match' | sort periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 17 runs, 100% failed, 88% of failures match = 88% impact periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 19 runs, 100% failed, 95% of failures match = 95% impact periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 94% of failures match = 94% impact periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 21 runs, 100% failed, 76% of failures match = 76% impact periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - 10 runs, 80% failed, 50% of failures match = 40% impact periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 10 runs, 100% failed, 90% of failures match = 90% impact [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4831/pull-ci-openshift-installer-master-e2e-aws-upgrade/1380486185595441152
Apparently clusteroperator/storage changes the condition at 13:21: Apr 09 13:21:35.308 - 41s E clusteroperator/storage condition/Available status/False reason/AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment to deploy the CSI Controller Service And the new operator starts at 13:22: I0409 13:22:00.161393 1 builder.go:240] aws-ebs-csi-driver-operator version v0.0.0-unknown-695b8fc This means that the *previous* storage operator is going Available=False.
What's missing: 1. Review and merge PR https://github.com/openshift/cluster-storage-operator/pull/173 2. Backport the following PR to other CSI operators: https://github.com/openshift/aws-ebs-csi-driver-operator/pull/122/files
This really should've been a 4.8.0 blocker but that intent was never conferred to assignees. I'm marking this as a blocker for 4.9.0 and would request that we backport this to 4.8 as soon as reasonable. We really need to get rid of negative signal that we generate during upgrades by operators going degraded during normal operations.
Moving manually to MODIFIED. oVirt is the only patch not merged yet, and it might be covered in other BZ.
Verified pass in recent ci in 4.9. $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/storage+should+not+change+condition/Available' | grep 'failures match' | sort | grep 4.9 | wc -l 0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759