From openshift_cluster-openshift-controller-manager-operator. Seen 16 occurrences over the last 24 hours. The pods which fail are not consistent. Example:

```
ns/openshift-monitoring pod/prometheus-adapter-5f7c5567b-7nhx4
Failed create pod sandbox: rpc error: code = Unknown desc = failed to get network status for pod sandbox k8s_prometheus-adapter-5f7c5567b-7nhx4_openshift-monitoring_2eef8612-614f-11e9-a571-12652b58265c_1(f82a62b13bce9a2fa46fc91893f595a4553853fb71d29f8af62f923b72dcb8fc): Unexpected command output nsenter: cannot open /proc/18561/ns/net: No such file or directory
 with error: exit status 1
```

Related:

- https://bugzilla.redhat.com/show_bug.cgi?id=1434950#c15
- https://github.com/kubernetes/kubernetes/pull/72105
Giuseppe points out that when this happens, we leak the network namespace: CRI-O fails to clean it up because the kubelet reaps the container process early.
Why the Containers assignment? Isn't the issue the kubelet reaping processes early, without giving CRI-O time to tear down? See kubernetes#72105, linked from the description.
This can also present as [1] (so I can find this issue from that direction too ;):

```
Warning  Failed  11m (x447 over 128m)  kubelet, ip-10-0-139-192.ec2.internal  Error: container create failed: container_linux.go:329: creating new parent process caused "container_linux.go:1762: running lstat on namespace path \"/proc/3905/ns/ipc\" caused \"lstat /proc/3905/ns/ipc: no such file or directory\""
```

[1]: https://github.com/cri-o/cri-o/issues/1927#issuecomment-474678516
I can't find "running lstat on namespace path" or "Unexpected command output nsenter" in search.svc.ci.openshift.org for the past 14d. Maybe this fixed itself?
Is this still happening on a 4.2 cluster? CRI-O was updated to 1.14 about two weeks ago (beginning of July).
This should be fixed in current versions of CRI-O: 1.13 (OCP 4.1) and 1.14 (OCP 4.2). See https://github.com/cri-o/cri-o/pull/2143.