Created attachment 1760434 [details] Node logs Description of problem: On OCP 4.7.0, during a Serverless e2e test run, we sometimes get a pod stuck in Pending state with the containers in ContainerCreating , blocking a namespace to be deleted (stuck in Terminating) Originally thought to be caused by https://bugzilla.redhat.com/show_bug.cgi?id=1915085 , which is however closed now, as supposed to be fixed in 4.7.0 . $ cat ip-10-0-167-95.us-west-1.compute.internal.logs | grep service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Mar 03 15:04:14.328073 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.328043 1437 kubelet.go:1920] SyncLoop (ADD, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" Mar 03 15:04:14.463572 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.463544 1437 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373") Mar 03 15:04:14.563994 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.563951 1437 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373") Mar 03 15:04:14.567056 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.567014 1437 operation_generator.go:672] MountVolume.SetUp succeeded for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373") Mar 03 15:04:14.641476 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.641293 1437 kuberuntime_manager.go:439] No sandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" can be found. Need to start a new one Mar 03 15:04:14.642095 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:14.641917219Z" level=info msg="Running pod sandbox: serving-tests/service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5/POD" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox Mar 03 15:04:14.667713 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:14.667664756Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}" Mar 03 15:04:14.995839 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.995754 1437 kubelet.go:1936] SyncLoop (DELETE, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" Mar 03 15:04:16.383778 ip-10-0-167-95 crio[1401]: 2021-03-03T15:04:16Z [verbose] Add: serving-tests:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5:b03b8332-dcf9-4d58-8350-7423a43ce373:(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c"}],"ips":[{"version":"4","interface":0,"address":"10.130.2.43/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.130.2.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}} Mar 03 15:04:16.383778 ip-10-0-167-95 crio[1401]: I0303 15:04:16.369578 82074 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-tests", Name:"service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5", UID:"b03b8332-dcf9-4d58-8350-7423a43ce373", APIVersion:"v1", ResourceVersion:"93756", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.130.2.43/23] Mar 03 15:04:16.384320 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.384273419Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}" Mar 03 15:04:16.387883 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:16.387846 1437 kubelet.go:1936] SyncLoop (DELETE, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" Mar 03 15:04:16.482671 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.482647546Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}" Mar 03 15:04:16.606903 ip-10-0-167-95 crio[1401]: 2021-03-03T15:04:16Z [verbose] Del: serving-tests:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5:b03b8332-dcf9-4d58-8350-7423a43ce373:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"} Mar 03 15:04:16.654743 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.654712689Z" level=info msg="RunSandbox: releasing container name: k8s_POD_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests_b03b8332-dcf9-4d58-8350-7423a43ce373_0" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox Mar 03 15:04:16.654743 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.654744610Z" level=info msg="RunSandbox: releasing pod sandbox name: k8s_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests_b03b8332-dcf9-4d58-8350-7423a43ce373_0" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox Mar 03 15:04:16.654982 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.654951 1437 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF Mar 03 15:04:16.654982 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.654964 1437 kuberuntime_manager.go:755] createPodSandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF Mar 03 15:04:16.655023 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.655004 1437 pod_workers.go:191] Error syncing pod b03b8332-dcf9-4d58-8350-7423a43ce373 ("service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)"), skipping: failed to "CreatePodSandbox" for "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" with CreatePodSandboxError: "CreatePodSandbox for pod \"service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)\" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF" Mar 03 15:05:16.422773 ip-10-0-167-95 hyperkube[1437]: E0303 15:05:16.422736 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 15:07:16.422932 ip-10-0-167-95 hyperkube[1437]: E0303 15:07:16.422901 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 15:11:16.423060 ip-10-0-167-95 hyperkube[1437]: E0303 15:11:16.423028 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 15:19:16.423190 ip-10-0-167-95 hyperkube[1437]: E0303 15:19:16.423155 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 15:35:16.423355 ip-10-0-167-95 hyperkube[1437]: E0303 15:35:16.423320 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 15:55:16.423527 ip-10-0-167-95 hyperkube[1437]: E0303 15:55:16.423490 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Mar 03 16:15:16.423706 ip-10-0-167-95 hyperkube[1437]: E0303 16:15:16.423671 1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory Version-Release number of selected component (if applicable): 4.7.0 How reproducible: Sometimes Steps to Reproduce: No specific reproducer yet, this occurs sometimes on our serverless knative serving e2e run. Actual results: Pod is sometimes stuck in Pending Expected results: Deleted pod should be removed eventually after deletionGracePeriodSeconds when deleted. Additional info:
Created attachment 1760435 [details] pod yaml