Bug 1934691 - Pod stuck in Pending/ContainerCreating blocking namespace termination
Summary: Pod stuck in Pending/ContainerCreating blocking namespace termination
Keywords:
Status: CLOSED DUPLICATE of bug 1915085
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.8.0
Assignee: Harshal Patil
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-03 17:02 UTC by Marek Schmidt
Modified: 2021-03-11 09:09 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-11 09:09:40 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Node logs (7.59 MB, text/plain)
2021-03-03 17:02 UTC, Marek Schmidt
no flags Details
pod yaml (16.71 KB, text/plain)
2021-03-03 17:03 UTC, Marek Schmidt
no flags Details

Description Marek Schmidt 2021-03-03 17:02:46 UTC
Created attachment 1760434 [details]
Node logs

Description of problem:

On OCP 4.7.0, during a Serverless e2e test run, we sometimes get a pod stuck in Pending state with the containers in ContainerCreating , blocking a namespace to be deleted (stuck in Terminating)

Originally thought to be caused by https://bugzilla.redhat.com/show_bug.cgi?id=1915085 , which is however closed now, as supposed to be fixed in 4.7.0 .


$ cat ip-10-0-167-95.us-west-1.compute.internal.logs | grep service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5
Mar 03 15:04:14.328073 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.328043    1437 kubelet.go:1920] SyncLoop (ADD, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)"
Mar 03 15:04:14.463572 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.463544    1437 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373")
Mar 03 15:04:14.563994 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.563951    1437 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373")
Mar 03 15:04:14.567056 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.567014    1437 operation_generator.go:672] MountVolume.SetUp succeeded for volume "default-token-b2x2s" (UniqueName: "kubernetes.io/secret/b03b8332-dcf9-4d58-8350-7423a43ce373-default-token-b2x2s") pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5" (UID: "b03b8332-dcf9-4d58-8350-7423a43ce373")
Mar 03 15:04:14.641476 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.641293    1437 kuberuntime_manager.go:439] No sandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" can be found. Need to start a new one
Mar 03 15:04:14.642095 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:14.641917219Z" level=info msg="Running pod sandbox: serving-tests/service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5/POD" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Mar 03 15:04:14.667713 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:14.667664756Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}"
Mar 03 15:04:14.995839 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:14.995754    1437 kubelet.go:1936] SyncLoop (DELETE, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)"
Mar 03 15:04:16.383778 ip-10-0-167-95 crio[1401]: 2021-03-03T15:04:16Z [verbose] Add: serving-tests:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5:b03b8332-dcf9-4d58-8350-7423a43ce373:(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c"}],"ips":[{"version":"4","interface":0,"address":"10.130.2.43/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.130.2.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Mar 03 15:04:16.383778 ip-10-0-167-95 crio[1401]: I0303 15:04:16.369578   82074 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-tests", Name:"service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5", UID:"b03b8332-dcf9-4d58-8350-7423a43ce373", APIVersion:"v1", ResourceVersion:"93756", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [10.130.2.43/23]
Mar 03 15:04:16.384320 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.384273419Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}"
Mar 03 15:04:16.387883 ip-10-0-167-95 hyperkube[1437]: I0303 15:04:16.387846    1437 kubelet.go:1936] SyncLoop (DELETE, "api"): "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)"
Mar 03 15:04:16.482671 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.482647546Z" level=info msg="Got pod network &{Name:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5 Namespace:serving-tests ID:4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79 NetNS:/var/run/netns/d8873654-0018-40ef-a4ca-096063fd0f2c Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}]}"
Mar 03 15:04:16.606903 ip-10-0-167-95 crio[1401]: 2021-03-03T15:04:16Z [verbose] Del: serving-tests:service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5:b03b8332-dcf9-4d58-8350-7423a43ce373:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Mar 03 15:04:16.654743 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.654712689Z" level=info msg="RunSandbox: releasing container name: k8s_POD_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests_b03b8332-dcf9-4d58-8350-7423a43ce373_0" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Mar 03 15:04:16.654743 ip-10-0-167-95 crio[1401]: time="2021-03-03 15:04:16.654744610Z" level=info msg="RunSandbox: releasing pod sandbox name: k8s_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests_b03b8332-dcf9-4d58-8350-7423a43ce373_0" id=1020378a-3031-432b-bea4-30d0eaa3f989 name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Mar 03 15:04:16.654982 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.654951    1437 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Mar 03 15:04:16.654982 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.654964    1437 kuberuntime_manager.go:755] createPodSandbox for pod "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF
Mar 03 15:04:16.655023 ip-10-0-167-95 hyperkube[1437]: E0303 15:04:16.655004    1437 pod_workers.go:191] Error syncing pod b03b8332-dcf9-4d58-8350-7423a43ce373 ("service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)"), skipping: failed to "CreatePodSandbox" for "service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)" with CreatePodSandboxError: "CreatePodSandbox for pod \"service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_serving-tests(b03b8332-dcf9-4d58-8350-7423a43ce373)\" failed: rpc error: code = Unknown desc = error reading container (probably exited) json message: EOF"
Mar 03 15:05:16.422773 ip-10-0-167-95 hyperkube[1437]: E0303 15:05:16.422736    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 15:07:16.422932 ip-10-0-167-95 hyperkube[1437]: E0303 15:07:16.422901    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 15:11:16.423060 ip-10-0-167-95 hyperkube[1437]: E0303 15:11:16.423028    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 15:19:16.423190 ip-10-0-167-95 hyperkube[1437]: E0303 15:19:16.423155    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 15:35:16.423355 ip-10-0-167-95 hyperkube[1437]: E0303 15:35:16.423320    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 15:55:16.423527 ip-10-0-167-95 hyperkube[1437]: E0303 15:55:16.423490    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory
Mar 03 16:15:16.423706 ip-10-0-167-95 hyperkube[1437]: E0303 16:15:16.423671    1437 fsHandler.go:114] failed to collect filesystem stats - rootDiskErr: could not stat "/var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff" to get inode usage: stat /var/lib/containers/storage/overlay/5fa51fa92776998559d75e4a268a15980aaf3578fd9bb9a7fb74ec056b76e809/diff: no such file or directory, extraDiskErr: could not stat "/var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log" to get inode usage: stat /var/log/pods/serving-tests_service-validation-w-rrlteefc-00001-deployment-78847ff55b-gvll5_b03b8332-dcf9-4d58-8350-7423a43ce373/4ee9601d9463856c611ad7f9053590d7299b24f7d328abc4281e540f4dc66e79.log: no such file or directory


Version-Release number of selected component (if applicable):
4.7.0

How reproducible:
Sometimes

Steps to Reproduce:
No specific reproducer yet, this occurs sometimes on our serverless knative serving e2e run.

Actual results:
Pod is sometimes stuck in Pending

Expected results:
Deleted pod should be removed eventually after deletionGracePeriodSeconds when deleted.

Additional info:

Comment 1 Marek Schmidt 2021-03-03 17:03:12 UTC
Created attachment 1760435 [details]
pod yaml


Note You need to log in before you can comment on or make changes to this bug.