https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-compact-serial/1372797703225872384

Serial runs create and remove nodes gracefully, and operators must not go unavailable when that happens:

  clusteroperator/csi-snapshot-controller should not change condition/Available

  csi-snapshot-controller was Available=true, but became Available=false at 2021-03-19 07:11:43.004517398 +0000 UTC -- CSISnapshotWebhookControllerAvailable: Waiting for a validating webhook Deployment pod to start
  csi-snapshot-controller was Available=false, but became Available=true at 2021-03-19 07:11:43.016807852 +0000 UTC -- All is well
  csi-snapshot-controller was Available=true, but became Available=false at 2021-03-19 07:11:43.132808339 +0000 UTC -- CSISnapshotWebhookControllerAvailable: Waiting for a validating webhook Deployment pod to start
  csi-snapshot-controller was Available=false, but became Available=true at 2021-03-19 07:12:00.399030955 +0000 UTC -- All is well
  csi-snapshot-controller was Available=true, but became Available=false at 2021-03-19 07:12:00.509765359 +0000 UTC -- CSISnapshotWebhookControllerAvailable: Waiting for a validating webhook Deployment pod to start
  csi-snapshot-controller was Available=false, but became Available=true at 2021-03-19 07:12:00.541133107 +0000 UTC -- All is well
  csi-snapshot-controller was Available=true, but became Available=false at 2021-03-19 07:12:00.575805876 +0000 UTC -- CSISnapshotWebhookControllerAvailable: Waiting for a validating webhook Deployment pod to start
  csi-snapshot-controller was Available=false, but became Available=true at 2021-03-19 07:12:00.607094 +0000 UTC -- All is well
  csi-snapshot-controller was Available=true, but became Available=false at 2021-03-19 07:12:00.641166131 +0000 UTC -- CSISnapshotWebhookControllerAvailable: Waiting for a validating webhook Deployment pod to start
  csi-snapshot-controller was Available=false, but became Available=true at 2021-03-19 07:12:00.668652789 +0000 UTC -- All is well

Adding and removing nodes should not make the operator unavailable; if it does, the operator has not configured itself so that graceful movement of the webhook pod is zero-disruption, which is a bug. The operator should ensure that all of its components remain available during graceful machine shutdown (drain, etc.) by choosing the appropriate configuration for its dependencies.

Setting severity to high because this is normal platform behavior and the operator violates the constraint.
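As a rough sketch of the kind of configuration this is asking for (not the operator's actual manifests; the names, labels, and image below are hypothetical), the webhook Deployment can tolerate a single graceful node drain by running two replicas spread across nodes, surging during rollouts, and being covered by a PodDisruptionBudget:

  // Hypothetical sketch: a webhook Deployment and PodDisruptionBudget that
  // survive one graceful node drain. Names, labels, and image are placeholders.
  package main

  import (
  	"fmt"

  	appsv1 "k8s.io/api/apps/v1"
  	corev1 "k8s.io/api/core/v1"
  	policyv1 "k8s.io/api/policy/v1"
  	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  	"k8s.io/apimachinery/pkg/util/intstr"
  )

  func main() {
  	labels := map[string]string{"app": "csi-snapshot-webhook"} // hypothetical label
  	replicas := int32(2)
  	maxUnavailable := intstr.FromInt(0) // never drop below the desired replica count during rollouts
  	maxSurge := intstr.FromInt(1)       // bring the replacement pod up before the old one goes away
  	minAvailable := intstr.FromInt(1)   // drains must leave at least one webhook pod running

  	deployment := &appsv1.Deployment{
  		ObjectMeta: metav1.ObjectMeta{Name: "csi-snapshot-webhook"},
  		Spec: appsv1.DeploymentSpec{
  			Replicas: &replicas,
  			Selector: &metav1.LabelSelector{MatchLabels: labels},
  			Strategy: appsv1.DeploymentStrategy{
  				Type: appsv1.RollingUpdateDeploymentStrategyType,
  				RollingUpdate: &appsv1.RollingUpdateDeployment{
  					MaxUnavailable: &maxUnavailable,
  					MaxSurge:       &maxSurge,
  				},
  			},
  			Template: corev1.PodTemplateSpec{
  				ObjectMeta: metav1.ObjectMeta{Labels: labels},
  				Spec: corev1.PodSpec{
  					// Prefer spreading replicas across nodes so a single drain
  					// cannot take the whole webhook down at once.
  					Affinity: &corev1.Affinity{
  						PodAntiAffinity: &corev1.PodAntiAffinity{
  							PreferredDuringSchedulingIgnoredDuringExecution: []corev1.WeightedPodAffinityTerm{{
  								Weight: 100,
  								PodAffinityTerm: corev1.PodAffinityTerm{
  									TopologyKey:   "kubernetes.io/hostname",
  									LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
  								},
  							}},
  						},
  					},
  					Containers: []corev1.Container{{
  						Name:  "webhook",
  						Image: "example.invalid/csi-snapshot-webhook:latest", // placeholder image
  					}},
  				},
  			},
  		},
  	}

  	// The PodDisruptionBudget makes a drain wait until another replica is ready
  	// elsewhere, instead of evicting the last available webhook pod.
  	pdb := &policyv1.PodDisruptionBudget{
  		ObjectMeta: metav1.ObjectMeta{Name: "csi-snapshot-webhook-pdb"},
  		Spec: policyv1.PodDisruptionBudgetSpec{
  			MinAvailable: &minAvailable,
  			Selector:     &metav1.LabelSelector{MatchLabels: labels},
  		},
  	}

  	fmt.Printf("deployment %q: replicas=%d, maxUnavailable=%s\n",
  		deployment.Name, *deployment.Spec.Replicas, deployment.Spec.Strategy.RollingUpdate.MaxUnavailable.String())
  	fmt.Printf("pdb %q: minAvailable=%s\n", pdb.Name, pdb.Spec.MinAvailable.String())
  }

Note that on a single-node topology there is no second node to hold the extra replica, which is why the single-node case gets split out into its own bug later in this thread.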
Pretty much all the update jobs too:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&name=^periodic.*upgrade&type=junit&search=clusteroperator/csi-snapshot-controller+should+not+change+condition/Available' | grep 'failures match' | sort
  periodic-ci-openshift-release-master-ci-4.8-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-upgrade (all) - 16 runs, 100% failed, 69% of failures match = 69% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-ovn-upgrade (all) - 19 runs, 100% failed, 89% of failures match = 89% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 17 runs, 100% failed, 88% of failures match = 88% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-ovn-upgrade (all) - 4 runs, 100% failed, 75% of failures match = 75% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-azure-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-ovirt-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
  periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 21 runs, 100% failed, 71% of failures match = 71% impact
  periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade (all) - 10 runs, 80% failed, 50% of failures match = 40% impact
  periodic-ci-openshift-release-master-nightly-4.8-e2e-metal-ipi-upgrade (all) - 10 runs, 50% failed, 60% of failures match = 30% impact
  periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-aws-upgrade (all) - 10 runs, 100% failed, 70% of failures match = 70% impact
  periodic-ci-openshift-release-master-nightly-4.8-upgrade-from-stable-4.7-e2e-metal-ipi-upgrade (all) - 10 runs, 100% failed, 90% of failures match = 90% impact

The test-case is new in 4.8, which is at least part of why earlier versions don't show up in that query.
Moving back to assigned because I still see some failures.
I just verified the last failing job run: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade/1400795131786825728

That job is not using a bundle with the updated operator; perhaps it will take a while until the changes propagate. Moving back to QA.
Did not see the failure on 4.8 non-single-node CI. Marking as verified per the discussion, and I agree with @wking that we may need to take some action for the single-node case.
(In reply to Wei Duan from comment #10)
> Did not see the failure on 4.8 non-single-node CI. Marking as verified per
> the discussion, and I agree with @wking that we may need to take some action
> for the single-node case.

Created bug 1973686 to address that.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438