Bug 1879205 - Multus: Unable to get Process Stats: couldn't open cpu cgroup procs file
Summary: Multus: Unable to get Process Stats: couldn't open cpu cgroup procs file
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Francesco Giudici
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks: 1875950
 
Reported: 2020-09-15 16:30 UTC by Douglas Smith
Modified: 2024-06-13 23:05 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-12 15:04:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs showing a container stuck in "Unable to get Process Stats" (13.24 MB, application/gzip)
2020-10-23 23:03 UTC, W. Trevor King

Description Douglas Smith 2020-09-15 16:30:17 UTC
Pods running the Multus components (e.g. pods in the openshift-multus namespace), especially the "multus" daemonset, trigger an error from hyperkube that reads: "Unable to get Process Stats: couldn't open cpu cgroup procs file".

This doesn't appear to be caused by the multus-admission-controller.

Example error follows:

-----------------
[root@ci-ln-csl69i2-f76d1-9zlx8-worker-b-5vf4g /]# journalctl | grep -i "cgroup"
Sep 11 21:33:01 ci-ln-csl69i2-f76d1-9zlx8-worker-b-5vf4g hyperkube[1528]: I0911 21:33:01.227435    1528 handler.go:181] Unable to get Process Stats: couldn't open cpu cgroup procs file /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6cffd1ff_d5ec_4db1_afd9_1af7b7662564.slice/crio-30f8e8f3a37a272774eacc7d01da848dd50eaefc5dabb423bc84a902930628a1.scope/cgroup.procs : open /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6cffd1ff_d5ec_4db1_afd9_1af7b7662564.slice/crio-30f8e8f3a37a272774eacc7d01da848dd50eaefc5dabb423bc84a902930628a1.scope/cgroup.procs: no such file or directory
-----------------

To associate these errors with a pod, first query the cluster for the CRI-O container IDs, e.g.:

oc get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.initContainerStatuses[*].containerID}{.status.containerStatuses[*].containerID}{"\n"}{end}' -A | grep -i multus

Then grep the journal for this error and for the IDs returned by the command above, e.g.:

journalctl | grep cgroup.proc | grep -P "($id|$another_id|$etc)"
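
For convenience, the two steps can be chained in one small shell snippet (a sketch, not part of the original report; it assumes oc access to the cluster and that journalctl is run on the affected node):

-----------------
# Sketch: collect the Multus CRI-O container IDs, then grep the node journal for them.
ids=$(oc get pod -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.initContainerStatuses[*].containerID}{.status.containerStatuses[*].containerID}{"\n"}{end}' \
  | grep -i multus \
  | grep -oP 'cri-o://\K[0-9a-f]+' \
  | paste -sd'|' -)

journalctl | grep cgroup.proc | grep -P "($ids)"
-----------------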


This was discovered by Cameron Meadors while investigating BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1875950

Comment 3 W. Trevor King 2020-10-23 23:03:29 UTC
Created attachment 1723858 [details]
Logs showing a container stuck in "Unable to get Process Stats"

Logs from bug 1891143, where this also turned up.

Comment 4 W. Trevor King 2020-10-23 23:04:36 UTC
^ those logs are from 4.6.0-rc.4, with some comments on them in [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1891143#c4

Comment 5 W. Trevor King 2020-10-23 23:05:35 UTC
Moving to the node team, since this seems like a kubelet/CRI-O fumble, and not something particular to the managed pods.

Comment 6 Francesco Giudici 2020-12-04 16:59:25 UTC
Have not looked into this yet. Moving to the next sprint.

Comment 22 Francesco Giudici 2021-11-12 15:04:49 UTC
The error reported in this bug is thrown by cAdvisor when it tries to read a cgroup path that no longer exists.
How cAdvisor gets out of sync is not trivial to determine.
Apart from the error message, there is no effect on the cluster: the cgroup path is simply not read, because it is not there.
The error hints that something else went wrong on the cluster, but it is neither the root cause nor directly related to it.
As a final note, we have no reports of this error on recent OCP releases (4.7 or above), so I don't think this will happen anymore.
Closing for now. Please reopen if needed.
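
For reference, one can confirm directly on a node that the path from such a journal message is indeed gone (a sketch, not part of the original comment; the slice and scope names are placeholders to be filled in from your own journal line):

-----------------
# Run on the affected node, e.g. after "oc debug node/<node-name>" and "chroot /host".
# Replace the placeholders with the pod slice and crio scope from the journal message.
path=/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/<pod-slice>/<crio-scope>/cgroup.procs
if [ -e "$path" ]; then cat "$path"; else echo "gone: $path"; fi
-----------------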

