The subpath test pod did not start within 5 minutes. A number of errors were reported in pod events, none of which indicates why the pod never came out of Pending.

4.4 release job failed: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/1936

We need to understand why this happened (it could be a crio issue, a volume issue, or an apiserver issue). I consider this a release blocker unless we can triage down to why we failed; lower-level bugs generally surface as "couldn't do this thing within a normal amount of time". The test itself is not flaky as far as I can see.

Mar 7 14:10:06.569: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for gluster-server: {default-scheduler } Scheduled: Successfully assigned e2e-provisioning-1408/gluster-server to ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal
Mar 7 14:10:06.569: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-subpath-test-gluster-preprovisionedpv-6g7j: {default-scheduler } Scheduled: Successfully assigned e2e-provisioning-1408/pod-subpath-test-gluster-preprovisionedpv-6g7j to ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:03:40 +0000 UTC - event for gluster-server: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} Pulling: Pulling image "gcr.io/kubernetes-e2e-test-images/volume/gluster:1.0"
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:25 +0000 UTC - event for gluster-server: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} Pulled: Successfully pulled image "gcr.io/kubernetes-e2e-test-images/volume/gluster:1.0"
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:25 +0000 UTC - event for gluster-server: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} Created: Created container gluster-server
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:26 +0000 UTC - event for gluster-server: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} Started: Started container gluster-server
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:27 +0000 UTC - event for pvc-pmzmz: {persistentvolume-controller } ProvisioningFailed: storageclass.storage.k8s.io "e2e-provisioning-1408" not found
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:48 +0000 UTC - event for pod-subpath-test-gluster-preprovisionedpv-6g7j: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} FailedMount: Unable to attach or mount volumes: unmounted volumes=[test-volume liveness-probe-volume default-token-6lsm6], unattached volumes=[test-volume liveness-probe-volume default-token-6lsm6]: error processing PVC e2e-provisioning-1408/pvc-pmzmz: failed to fetch PVC from API server: persistentvolumeclaims "pvc-pmzmz" is forbidden: User "system:node:ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal" cannot get resource "persistentvolumeclaims" in API group "" in the namespace "e2e-provisioning-1408": no relationship found between node "ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal" and this object
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:04:49 +0000 UTC - event for pod-subpath-test-gluster-preprovisionedpv-6g7j: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} FailedMount: MountVolume.NewMounter initialization failed for volume "gluster-vr8wh" : endpoints "gluster-server" not found
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:07:02 +0000 UTC - event for pod-subpath-test-gluster-preprovisionedpv-6g7j: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} FailedMount: Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[liveness-probe-volume default-token-6lsm6 test-volume]: timed out waiting for the condition
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:09:19 +0000 UTC - event for pod-subpath-test-gluster-preprovisionedpv-6g7j: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} FailedMount: Unable to attach or mount volumes: unmounted volumes=[test-volume], unattached volumes=[test-volume liveness-probe-volume default-token-6lsm6]: timed out waiting for the condition
Mar 7 14:10:06.569: INFO: At 2020-03-07 14:09:56 +0000 UTC - event for gluster-server: {kubelet ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal} Killing: Stopping container gluster-server
Mar 7 14:10:06.574: INFO: POD NODE PHASE GRACE CONDITIONS
Mar 7 14:10:06.574: INFO:
Mar 7 14:10:06.598: INFO: skipping dumping cluster info - cluster too large
Mar 7 14:10:06.598: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-provisioning-1408" for this suite.
Mar 7 14:10:06.637: INFO: Running AfterSuite actions on all nodes
Mar 7 14:10:06.637: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/framework/util.go:894]: Unexpected error:
    <*errors.errorString | 0xc0050e6860>: {
        s: "expected pod \"pod-subpath-test-gluster-preprovisionedpv-6g7j\" success: Gave up after waiting 5m0s for pod \"pod-subpath-test-gluster-preprovisionedpv-6g7j\" to be \"success or failure\"",
    }
    expected pod "pod-subpath-test-gluster-preprovisionedpv-6g7j" success: Gave up after waiting 5m0s for pod "pod-subpath-test-gluster-preprovisionedpv-6g7j" to be "success or failure"
occurred
The logs were not attached to the CI build for some reason, or the artifact server is down... I'll check again tomorrow.
Looks like a test issue:

Mar 07 14:04:48.879871 ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal hyperkube[1341]: I0307 14:04:48.876956 1341 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"e2e-provisioning-1408", Name:"pod-subpath-test-gluster-preprovisionedpv-6g7j", UID:"e470c7a6-e860-4c6a-949b-5f028fc7c620", APIVersion:"v1", ResourceVersion:"34487", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to attach or mount volumes: unmounted volumes=[test-volume liveness-probe-volume default-token-6lsm6], unattached volumes=[test-volume liveness-probe-volume default-token-6lsm6]: error processing PVC e2e-provisioning-1408/pvc-pmzmz: failed to fetch PVC from API server: persistentvolumeclaims "pvc-pmzmz" is forbidden: User "system:node:ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal" cannot get resource "persistentvolumeclaims" in API group "" in the namespace "e2e-provisioning-1408": no relationship found between node "ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal" and this object
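For anyone triaging a similar run: the "no relationship found between node ... and this object" part of that message comes from node authorization, which (as I understand it) only lets a kubelet read a PVC once a pod referencing that PVC is bound to the node. One way to confirm whether the node identity is allowed to read the PVC at a given moment is a SubjectAccessReview. The sketch below is not from the test or from any gathered artifacts; it is a minimal client-go illustration, the kubeconfig path is a placeholder, and the namespace, PVC name, and node user are copied from the events above.

// Minimal triage sketch (assumptions noted above): ask the API server whether
// the node's identity may get the PVC that the kubelet failed to fetch.
package main

import (
	"context"
	"fmt"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			// Node identity from the failing run.
			User:   "system:node:ci-op-8fxjt-w-d-npxj8.c.openshift-gce-devel-ci.internal",
			Groups: []string{"system:nodes", "system:authenticated"},
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: "e2e-provisioning-1408",
				Verb:      "get",
				Resource:  "persistentvolumeclaims",
				Name:      "pvc-pmzmz",
			},
		},
	}
	resp, err := client.AuthorizationV1().SubjectAccessReviews().Create(context.TODO(), sar, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("allowed=%v reason=%q\n", resp.Status.Allowed, resp.Status.Reason)
}

If this reports allowed=false around the time of the FailedMount event but allowed=true later, that would suggest a timing problem between pod binding and the authorization graph rather than a volume-plugin problem.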
I think this still might be a problem. For Azure environments there are other bugs that prevent the pod from coming up, but for GCP or AWS this indicates a bug somewhere in the system (or in e2e). I will investigate this in more detail upstream.

A more recent run on GCP: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/1273584118654832640
I think the root cause of the bugs associated with the gluster tests is that we are creating Endpoints without corresponding Service entries. This can cause the endpoints to become unavailable. I opened a PR to fix this: https://github.com/kubernetes/kubernetes/pull/93535
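For context, a minimal sketch of the direction of that fix (an illustration only, not the actual diff in kubernetes/kubernetes#93535; the namespace is taken from the events above, while the service name and port are assumptions): give the manually created gluster-server Endpoints object a Service of the same name so the endpoints are not orphaned.

// Sketch: create a selector-less, headless Service named after the existing
// gluster-server Endpoints object, so the endpoints are backed by a Service.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ns := "e2e-provisioning-1408" // test namespace from the events above

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "gluster-server", Namespace: ns},
		Spec: corev1.ServiceSpec{
			// Headless Service with no selector: the Endpoints object of the
			// same name (pointing at the gluster server pod IP) supplies the
			// backends, mirroring how the e2e fixture wires up the volume.
			ClusterIP: corev1.ClusterIPNone,
			Ports: []corev1.ServicePort{
				{Name: "gluster", Port: 24007}, // 24007 assumed (common glusterd port)
			},
		},
	}
	if _, err := client.CoreV1().Services(ns).Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}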
Since the upstream PR has merged (https://github.com/kubernetes/kubernetes/pull/93535), I expect that when OCP 4.6 is rebased against the released version of Kubernetes, we will automatically pick up the fix.