From openshift_cluster-openshift-controller-manager-operator. Seen 16 occurrences over the last 24 hours. The pods which fail are not consistent. Example:

```
ns/openshift-monitoring pod/prometheus-adapter-5f7c5567b-7nhx4
Failed create pod sandbox: rpc error: code = Unknown desc = failed to get network status for pod sandbox k8s_prometheus-adapter-5f7c5567b-7nhx4_openshift-monitoring_2eef8612-614f-11e9-a571-12652b58265c_1(f82a62b13bce9a2fa46fc91893f595a4553853fb71d29f8af62f923b72dcb8fc): Unexpected command output nsenter: cannot open /proc/18561/ns/net: No such file or directory
 with error: exit status 1
```

Related:

- https://bugzilla.redhat.com/show_bug.cgi?id=1434950#c15
- https://github.com/kubernetes/kubernetes/pull/72105
Giuseppe points out that when this happens, we leak the network namespace: CRI-O fails to clean it up because the kubelet reaps the container process early.
Why the Containers assignment? Isn't the issue the kubelet reaping processes early, without giving CRI-O time to tear down? See kubernetes#72105, linked from the description.
This can also present as [1] (so I can find this issue from that direction too ;):

```
Warning  Failed  11m (x447 over 128m)  kubelet, ip-10-0-139-192.ec2.internal  Error: container create failed: container_linux.go:329: creating new parent process caused "container_linux.go:1762: running lstat on namespace path \"/proc/3905/ns/ipc\" caused \"lstat /proc/3905/ns/ipc: no such file or directory\""
```

[1]: https://github.com/cri-o/cri-o/issues/1927#issuecomment-474678516
I can't find "running lstat on namespace path" or "Unexpected command output nsenter" in search.svc.ci.openshift.org for the past 14d. Maybe this fixed itself?
Is this still happening on a 4.2 cluster? CRI-O was updated to 1.14 about two weeks ago (beginning of July).
This should be fixed in current versions of CRI-O: 1.13 (OCP 4.1) and 1.14 (OCP 4.2). See https://github.com/cri-o/cri-o/pull/2143.