The test is failing across multiple GCP jobs:
https://search.ci.openshift.org/?search=Multi-AZ+Cluster+Volumes+should+only+be+allowed+to+provision+PDs+in+zones+where+nodes+exist&maxAge=12h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

```
Jun 23 09:45:26.513: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:28.551: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:30.584: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:32.619: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:34.651: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:36.689: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:38.793: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:40.837: INFO: PersistentVolumeClaim pvc-1 found but phase is Pending instead of Bound.
Jun 23 09:45:42.838: INFO: deleting claim "e2e-multi-az-4704"/"pvc-4"
Jun 23 09:45:42.895: INFO: deleting claim "e2e-multi-az-4704"/"pvc-3"
Jun 23 09:45:42.948: INFO: deleting claim "e2e-multi-az-4704"/"pvc-2"
Jun 23 09:45:42.992: INFO: deleting claim "e2e-multi-az-4704"/"pvc-1"
Jun 23 09:45:43.037: INFO: Deleting compute resource: compute-f60cb1e9-b839-4293-af52-bbbb44595846
[AfterEach] [sig-storage] Multi-AZ Cluster Volumes
  k8s.io/kubernetes.1/test/e2e/framework/framework.go:186
STEP: Collecting events from namespace "e2e-multi-az-4704".
STEP: Found 5 events.
Jun 23 09:46:11.235: INFO: At 2021-06-23 09:40:30 +0000 UTC - event for e2e-multi-az-4704: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges
Jun 23 09:46:11.235: INFO: At 2021-06-23 09:40:41 +0000 UTC - event for pvc-1: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Jun 23 09:46:11.235: INFO: At 2021-06-23 09:40:41 +0000 UTC - event for pvc-2: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Jun 23 09:46:11.235: INFO: At 2021-06-23 09:40:41 +0000 UTC - event for pvc-3: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Jun 23 09:46:11.235: INFO: At 2021-06-23 09:40:41 +0000 UTC - event for pvc-4: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Jun 23 09:46:11.272: INFO: POD NODE PHASE GRACE CONDITIONS
Jun 23 09:46:11.272: INFO:
Jun 23 09:46:11.387: INFO: skipping dumping cluster info - cluster too large
STEP: Destroying namespace "e2e-multi-az-4704" for this suite.
fail [k8s.io/kubernetes.1/test/e2e/storage/ubernetes_lite_volumes.go:163]: Unexpected error:
    <*errors.errorString | 0xc001cef930>: {
        s: "PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s",
    }
    PersistentVolumeClaims [pvc-1] not all in phase Bound within 5m0s
occurred
```

Jobs affected:
- https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-e2e-gcp-rt
- https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-gcp-fips
- https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#release-openshift-ocp-installer-e2e-gcp-ovn-4.8
- ...
Reported previously against 4.2: https://bugzilla.redhat.com/show_bug.cgi?id=1738691

From https://bugzilla.redhat.com/show_bug.cgi?id=1738691#c1:

```
The failure from the description happened because the default StorageClass in the cluster has the volumeBindingMode option set to use WaitForFirstConsumer. However, this test has another problem: it needs to create an extra compute instance in a different zone [1], and we can't do that at the moment. I'll create a PR to disable it.
```

The test started failing on Jun 21.
The test was reintroduced by https://github.com/openshift/origin/pull/26054/commits/c5fbd2f74d7959e93db5de6f5f640e2a5cf76735, which merged on May 9. Yet the test only started to run 2 days ago. Not sure why.
The test was renamed in 1.21: https://github.com/kubernetes/kubernetes/commit/006dc7477f15e42ae70adc02421a5bacd068ba05

It therefore no longer matches the skip pattern previously used: https://github.com/openshift/kubernetes/blob/master/openshift-hack/e2e/annotate/rules.go#L148

So I guess the skip pattern needs to be fixed there and then imported into origin to fix this?
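
To illustrate how a rename can silently defeat a skip rule, here is a minimal sketch in Python (the real rules live in Go in rules.go). The old test name and both patterns are assumptions made up for illustration and are not copied from rules.go; only the new test name is taken from the failing jobs above.

```python
import re

# Hypothetical skip rule modeled on an assumed pre-1.21 test name.
# This is NOT the actual entry from rules.go.
SKIP_PATTERNS = [
    r"\[sig-scheduling\] Multi-AZ Cluster Volumes",
]

# A pattern that does not pin the sig prefix (also hypothetical).
ROBUST_PATTERNS = [
    r"Multi-AZ Cluster Volumes",
]

# Assumed name before the rename (illustrative only).
OLD_NAME = (
    "[sig-scheduling] Multi-AZ Cluster Volumes "
    "should only be allowed to provision PDs in zones where nodes exist"
)
# Name as it appears in the failing jobs.
NEW_NAME = (
    "[sig-storage] Multi-AZ Cluster Volumes "
    "should only be allowed to provision PDs in zones where nodes exist"
)


def is_skipped(test_name, patterns=SKIP_PATTERNS):
    """Return True if any skip pattern matches somewhere in the test name."""
    return any(re.search(p, test_name) for p in patterns)


if __name__ == "__main__":
    for name in (OLD_NAME, NEW_NAME):
        print(is_skipped(name), is_skipped(name, ROBUST_PATTERNS), name)
```

Under these assumptions, the prefix-pinned pattern stops matching once the prefix changes, so the test runs again, while the looser pattern keeps matching both names. Whether a looser pattern is acceptable depends on what else it might match in the suite.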
Please also backport it to 4.8. Example of an affected job: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-gcp-rt
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759