I see some tests in test run https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5/1242 failing with error:

May 21 12:18:33.682: INFO: Warning: Making PVC: VolumeMode specified as invalid empty string, treating as nil
May 21 12:18:33.738: INFO: Waiting up to 5m0s for PersistentVolumeClaims [azure-disklgqmm] to have phase Bound
May 21 12:18:33.789: INFO: PersistentVolumeClaim azure-disklgqmm found but phase is Pending instead of Bound.
May 21 12:18:35.830: INFO: PersistentVolumeClaim azure-disklgqmm found but phase is Pending instead of Bound.
May 21 12:18:37.872: INFO: PersistentVolumeClaim azure-disklgqmm found but phase is Pending instead of Bound.
May 21 12:18:39.916: INFO: PersistentVolumeClaim azure-disklgqmm found and phase=Bound (6.177903713s)
STEP: starting azure-injector
STEP: Deleting pod azure-injector in namespace e2e-volume-4484
May 21 12:23:40.240: INFO: Waiting for pod azure-injector to disappear
May 21 12:23:40.279: INFO: Pod azure-injector no longer exists
May 21 12:23:40.279: FAIL: Failed to create injector pod: timed out waiting for the condition

Full Stack Trace
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/volume.InjectContent(0xc00145fb80, 0xc00166e080, 0xf, 0x59d0dd4, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/volume/fixtures.go:518 +0x973
github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/storage/testsuites.(*volumesTestSuite).DefineTests.func3()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/storage/testsuites/volumes.go:183 +0x405
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001b66570, 0xc00171a710, 0x1, 0x1, 0x0, 0x23d3400)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:59 +0x41f
main.newRunTestCommand.func1.1()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:239 +0x4e
github.com/openshift/origin/test/extended/util.WithCleanup(0xc001df1bd8)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:166 +0x58
main.newRunTestCommand.func1(0xc001600a00, 0xc00171a710, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:239 +0x1be
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc001600a00, 0xc00171a6d0, 0x1, 0x1, 0xc001600a00, 0xc00171a6d0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:826 +0x460
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc001600280, 0x0, 0x6687640, 0xa1180f8)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:864
main.main.func1(0xc001600280, 0x0, 0x0)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:61 +0x9c
main.main()
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:62 +0x36e

STEP: cleaning the environment after azure
STEP: Deleting pvc
May 21 12:23:40.280: INFO: Deleting PersistentVolumeClaim "azure-disklgqmm"
May 21 12:23:40.334: INFO: Waiting up to 5m0s for PersistentVolume pvc-c90588b4-4d55-4089-acf8-2767446e236e to get deleted
May 21 12:23:40.374: INFO: PersistentVolume pvc-c90588b4-4d55-4089-acf8-2767446e236e found and phase=Released (39.773857ms)
May 21 12:23:45.416: INFO: PersistentVolume pvc-c90588b4-4d55-4089-acf8-2767446e236e found and phase=Released (5.081553213s)
May 21 12:23:50.464: INFO: PersistentVolume pvc-c90588b4-4d55-4089-acf8-2767446e236e was removed
STEP: Deleting sc
May 21 12:23:50.528: INFO: In-tree plugin kubernetes.io/azure-disk is not migrated, not validating any metrics
[AfterEach] [Testpattern: Dynamic PV (ext3)] volumes
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
STEP: Collecting events from namespace "e2e-volume-4484".
STEP: Found 3 events.
May 21 12:23:50.580: INFO: At 2020-05-21 12:18:39 +0000 UTC - event for azure-disklgqmm: {persistentvolume-controller } ProvisioningSucceeded: Successfully provisioned volume pvc-c90588b4-4d55-4089-acf8-2767446e236e using kubernetes.io/azure-disk
May 21 12:23:50.580: INFO: At 2020-05-21 12:18:40 +0000 UTC - event for azure-injector: {default-scheduler } FailedScheduling: 0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
May 21 12:23:50.580: INFO: At 2020-05-21 12:23:40 +0000 UTC - event for azure-injector: {default-scheduler } FailedScheduling: skip schedule deleting pod: e2e-volume-4484/azure-injector
May 21 12:23:50.622: INFO: POD NODE PHASE GRACE CONDITIONS
May 21 12:23:50.622: INFO:
May 21 12:23:50.742: INFO: skipping dumping cluster info - cluster too large
May 21 12:23:50.742: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-volume-4484" for this suite.
May 21 12:23:50.869: INFO: Running AfterSuite actions on all nodes
May 21 12:23:50.869: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/framework/volume/fixtures.go:518]: May 21 12:23:40.279: Failed to create injector pod: timed out waiting for the condition
I think these tests are failing because the volume is being provisioned in a zone where there is no worker node. The cluster has 3 master nodes but only 2 worker nodes, so at least one zone contains no worker the test pod could schedule onto. We had a similar problem on AWS for a bit, and it caused flakes.
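To illustrate the direction of the fix, here is a minimal Go sketch, under the assumption that zones are read from node labels: only treat a zone as usable for provisioning if it contains at least one schedulable, non-master node. This is not the actual patch; the helper names and the in-cluster setup are illustrative only.

// Minimal sketch (not the actual upstream patch): collect only the zones that
// contain at least one schedulable, non-master node, so a test could avoid
// provisioning a volume in a zone with no workers.
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// hasMasterTaint reports whether a node carries the master NoSchedule taint
// that the injector pod in the log above did not tolerate.
func hasMasterTaint(taints []v1.Taint) bool {
	for _, t := range taints {
		if t.Key == "node-role.kubernetes.io/master" && t.Effect == v1.TaintEffectNoSchedule {
			return true
		}
	}
	return false
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	nodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	zones := map[string]bool{}
	for _, n := range nodes.Items {
		if n.Spec.Unschedulable || hasMasterTaint(n.Spec.Taints) {
			continue
		}
		// Assumption: 4.5-era clusters label nodes with the legacy zone key;
		// newer releases use "topology.kubernetes.io/zone".
		if z, ok := n.Labels["failure-domain.beta.kubernetes.io/zone"]; ok {
			zones[z] = true
		}
	}
	fmt.Println("zones with schedulable workers:", zones)
}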
The internal Azure tests have passed with this change. I've submitted an upstream PR [1] to include this in k8s.

[1] https://github.com/kubernetes/kubernetes/pull/91642
Still seeing some cases fail with a volume node affinity conflict; need to check whether there is another issue.

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5/1284319209928527872

STEP: Found 3 events.
Jul 18 03:54:08.144: INFO: At 2020-07-18 03:48:55 +0000 UTC - event for azure-diskph482: {persistentvolume-controller } ProvisioningSucceeded: Successfully provisioned volume pvc-9521d714-f364-48a1-baed-95a40db5b33f using kubernetes.io/azure-disk
Jul 18 03:54:08.144: INFO: At 2020-07-18 03:48:57 +0000 UTC - event for exec-volume-test-dynamicpv-xwb5: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.5/1285226062380273664

STEP: Found 4 events.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Jul 20 16:10:38.369: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for exec-volume-test-dynamicpv-7fcc: {default-scheduler } FailedScheduling: skip schedule deleting pod: e2e-volume-9629/exec-volume-test-dynamicpv-7fcc
Jul 20 16:10:38.369: INFO: At 2020-07-20 16:05:24 +0000 UTC - event for azure-diskdm6s9: {persistentvolume-controller } ProvisioningSucceeded: Successfully provisioned volume pvc-21d39c8e-0eeb-4251-9789-dfe97a0ce40e using kubernetes.io/azure-disk
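To make the conflict concrete when triaging runs like these, here is a hypothetical Go diagnostic sketch (not part of the test suite) that prints the zone each dynamically provisioned PV is pinned to, for comparison against the worker-node zones from the sketch above. The legacy zone label key is again an assumption about 4.5-era clusters.

// Hypothetical diagnostic: list PVs and print the zone(s) their node
// affinity requires; a PV pinned to a zone with no workers reproduces the
// "volume node affinity conflict" seen in the scheduler events above.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	pvs, err := cs.CoreV1().PersistentVolumes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pv := range pvs.Items {
		if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
			continue
		}
		for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
			for _, req := range term.MatchExpressions {
				// Assumption: in-tree azure-disk PVs of this era carry the
				// legacy zone label in their node affinity terms.
				if req.Key == "failure-domain.beta.kubernetes.io/zone" {
					fmt.Printf("PV %s is pinned to zone(s) %v\n", pv.Name, req.Values)
				}
			}
		}
	}
}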
Both of the links provided are for 4.5; however, this change has not been backported to 4.5 yet - it only exists in 4.6 at this time. If we don't see any failures in 4.6, then we can proceed with the backport. Is it possible to check and see whether these failures exist in 4.6, which contains the change?
Hi Huffman, my mistake - I did not see these failures in 4.6. Status changed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196