Bug 1822269
| Summary: | [4.3] pod/project stuck at terminating status: The container could not be located when the pod was terminated (Exit Code: 137) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Phillips <rphillips> | ||||
| Component: | Node | Assignee: | Ryan Phillips <rphillips> | ||||
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | urgent | ||||||
| Priority: | urgent | CC: | aaleman, aos-bugs, hongkliu, jiazha, jokerman, juzhao, kgarriso, maszulik, mfojtik, schoudha, skuznets, umohnani, wking, yinzhou | ||||
| Version: | 4.3.z | Keywords: | Reopened | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.3.z | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1822268 | Environment: | |||||
| Last Closed: | 2020-05-11 21:20:39 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1822268 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Description
Ryan Phillips
2020-04-08 15:58:20 UTC
This 4.3 bug depends on a 4.5 bug? https://bugzilla.redhat.com/show_bug.cgi?id=1819906 Wouldn't that bug have to be backported to 4.3 first, with this bug then depending on that backport?
The pod is stuck in Terminating status. The error is:
Apr 29 09:50:53 qe-jia-nfsjd-w-a-l-0 hyperkube[1319]: I0429 09:50:53.924652 1319 kubelet_pods.go:934] Pod "node-exporter-mzlvs_openshift-monitoring(0e69a9e8-89c9-11ea-a50f-42010a000004)" is terminated, but some volumes have not been cleaned up
# oc -n openshift-monitoring get pod -o wide | grep node-exporter-mzlvs | grep Terminating
node-exporter-mzlvs 0/2 Terminating 0 6h59m 10.0.32.5 qe-jia-nfsjd-w-a-l-0 <none> <none>
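The kubelet log line above identifies the pod as `name_namespace(uid)`, which is where the namespace and pod name passed to `oc` come from. A minimal sketch of pulling those fields out of such a line follows. The regular expression is an assumption inferred from the single log sample in this report, and `parse_pod_ref` is a hypothetical helper name, not part of any OpenShift tooling.

```python
import re

# Assumed pod reference layout, taken from the one log line quoted in this
# report: Pod "<pod-name>_<namespace>(<uid>)"
POD_REF = re.compile(
    r'Pod "(?P<name>[^_"]+)_(?P<namespace>[^("]+)\((?P<uid>[0-9a-f-]+)\)"'
)


def parse_pod_ref(log_line):
    """Return (name, namespace, uid) from a kubelet log line, or None."""
    match = POD_REF.search(log_line)
    if match is None:
        return None
    return match.group("name"), match.group("namespace"), match.group("uid")


# The log message from this report, with the journald prefix dropped.
line = (
    'I0429 09:50:53.924652 1319 kubelet_pods.go:934] Pod '
    '"node-exporter-mzlvs_openshift-monitoring'
    '(0e69a9e8-89c9-11ea-a50f-42010a000004)" '
    'is terminated, but some volumes have not been cleaned up'
)
print(parse_pod_ref(line))
```

The returned name and namespace are the same values used in the `oc -n openshift-monitoring ... node-exporter-mzlvs` commands in this report.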
# oc -n openshift-monitoring describe pod node-exporter-mzlvs
...
Containers:
node-exporter:
Container ID: cri-o://8816d2321a354d07da3ac09b9003f4cdf28b5e890075cd41175aa9abae8c22f8
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9ca176cdb8e9925ac20d2935be5470f75b6ca21a23976b527300b8fdefdbee62
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9ca176cdb8e9925ac20d2935be5470f75b6ca21a23976b527300b8fdefdbee62
Port: <none>
Host Port: <none>
Args:
--web.listen-address=127.0.0.1:9100
--path.procfs=/host/proc
--path.sysfs=/host/sys
--path.rootfs=/host/root
--collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
--collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
--no-collector.wifi
--collector.cpu.info
--collector.textfile.directory=/var/node_exporter/textfile
State: Terminated
Reason: Error
..
Exit Code: 143
Started: Tue, 28 Apr 2020 23:25:35 -0400
Finished: Wed, 29 Apr 2020 05:49:40 -0400
Ready: False
For more information, see the attached file.
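For readers comparing the exit code in the describe output (143) with the one in the bug summary (137): by the common convention that a process killed by signal N is reported with exit code 128 + N, these would correspond to SIGTERM and SIGKILL respectively. The sketch below decodes them with Python's standard `signal` module. `signal_for_exit_code` is a hypothetical helper name, and the signal names are those of the platform the script runs on (Linux assumed).

```python
import signal


def signal_for_exit_code(code):
    """Return the signal name for an exit code above 128, else None.

    Assumes the usual "128 + signal number" convention for processes
    terminated by a signal.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


for exit_code in (143, 137):
    print(exit_code, signal_for_exit_code(exit_code))
```

Under that convention, 143 points to a termination request (SIGTERM) and 137 to a forced kill (SIGKILL).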
Created attachment 1682838 [details]
pod in Terminating status
Exit Code: 143
# oc debug node/qe-jia-nfsjd-w-a-l-0
sh-4.2# chroot /host
sh-4.2# crictl ps -a | grep node-exporter
no result
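The check above (the API still reports a container ID, but `crictl ps -a` no longer lists it) can be sketched as a comparison of two JSON documents. This is an illustration only. It assumes the pod object comes from `oc get pod -o json` and the runtime listing from `crictl ps -a -o json`, and that the latter has a top-level `containers` list whose entries carry an `id` field. `missing_containers` is a hypothetical helper name.

```python
def missing_containers(pod, runtime_listing):
    """Names of containers in the pod status that the runtime does not list.

    pod:             parsed output of `oc get pod <name> -o json` (assumed)
    runtime_listing: parsed output of `crictl ps -a -o json` (assumed shape)
    """
    known_ids = {c["id"] for c in runtime_listing.get("containers", [])}
    missing = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        container_id = status.get("containerID", "")
        # Drop the runtime scheme prefix, e.g. "cri-o://".
        bare_id = container_id.split("://", 1)[-1]
        if bare_id and bare_id not in known_ids:
            missing.append(status["name"])
    return missing


# Sample data built from the values quoted in this report.
pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "node-exporter",
                "containerID": "cri-o://8816d2321a354d07da3ac09b9003f4cdf"
                               "28b5e890075cd41175aa9abae8c22f8",
            }
        ]
    }
}
runtime_listing = {"containers": []}  # crictl returned no result

print(missing_containers(pod, runtime_listing))
```

With the empty listing seen on the node, the sketch reports `node-exporter` as known to the API but absent from the runtime, which matches the "container could not be located" symptom in the summary.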
Also related to this bug series are the two post-fix mitigation bugs: bug 1829664 and bug 1829999. Once this gets fixed in 4.3, we will probably pull all edges from 4.2 -> earlier 4.3 to keep folks from getting stuck nodes. Folks who update in the meantime and happen to get stuck nodes will be caught and walked through mitigation via the bug 1829999 backstop. https://bugzilla.redhat.com/show_bug.cgi?id=1820507#c13 has the impact-statement request (on the masterward-tip of this bug series).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2006
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add keyword again.
[1]: https://github.com/openshift/enhancements/pull/475