Bug 1623261

Summary: [3.11] Logs contain error 'Failed to get system container stats for "/system.slice/atomic-openshift-node.service"'
Product: OpenShift Container Platform
Reporter: Luke Stanton <lstanton>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: medium
Priority: unspecified
Version: 3.10.0
CC: adeshpan, alexander.kozhemyakin, aos-bugs, dapark, dmoessne, hgomes, jokerman, jrosenta, judd, klaas, leonardoscampos, mmccomas, mpark, palonsor, rkrawitz, sjenning, tmoreira, wsun
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clones: 1643142
Last Closed: 2019-04-11 05:38:22 UTC
Type: Bug
Bug Blocks: 1643142

Description Luke Stanton 2018-08-28 20:42:32 UTC
Description of problem:

Node logs contain the following error:
--------------
atomic-openshift-node[2601]: E0828 09:13:38.702385    2601 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
--------------

How reproducible:
Seems consistent

Steps to Reproduce:
1. Deploy an OpenShift 3.10 cluster

Actual results:
The error is logged continually. There don't seem to be any severe side effects, but the error adds noise to the log output.

Expected results:
The error should not occur under normal circumstances.
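
To gauge which system containers the error is reported for, the log lines can be reduced to a list of distinct cgroup names. This is only a minimal sketch: the function name `failing_cgroups` is hypothetical, and it assumes the message format shown in the description above, reading log lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read node log lines on stdin and print each distinct
# cgroup named in a "Failed to get system container stats" error.
failing_cgroups() {
    # Keep only the leading part of the message, up to the quoted cgroup path.
    grep -o 'Failed to get system container stats for "[^"]*"' \
        | sed 's/^[^"]*"\([^"]*\)"$/\1/' \
        | sort -u
}
```

On a node, it could be fed from the journal, for example `journalctl -u atomic-openshift-node --no-pager | failing_cgroups`, assuming the node service logs to journald under that unit name.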

Comment 1 Mrunal Patel 2018-08-29 17:38:35 UTC
Seth, could you take a look? This is probably coming from cadvisor.

Comment 2 Judd Maltin 2018-08-30 17:36:52 UTC
Same here:

# rpm -qa |grep openshift
atomic-openshift-clients-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-docker-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-hyperkube-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-node-3.10.14-1.git.0.ba8ae6d.el7.x86_64

Comment 3 Luke Stanton 2018-09-05 16:46:01 UTC
3.10.34 shows a similar error for the docker service:

-----------------
Sep  5 09:31:22 vmlxopencd01 atomic-openshift-node: E0905 09:31:22.086411   18485 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
Sep  5 09:31:22 vmlxopencd01 atomic-openshift-node: E0905 09:31:22.086841   18485 summary.go:102] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
-----------------

Comment 4 Luke Stanton 2018-09-05 16:59:08 UTC
Is there a workaround to silence these error messages? They cause a lot of noise in the logs.

Comment 5 Tiago M. Vieira 2018-09-21 17:30:00 UTC
Getting the same issue with my deployment of 3.10.34:

Sep 21 13:28:00 tmor-master.x.x.x atomic-openshift-node[83809]: E0921 13:28:00.786348   83809 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
Sep 21 13:28:10 tmor-master.x.x.x atomic-openshift-node[83809]: E0921 13:28:10.826562   83809 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"

Comment 6 Alexander Kozhemyakin 2018-09-25 07:58:50 UTC
I have the same issue. Is there any workaround or fix?

Comment 13 Seth Jennings 2018-10-01 18:59:36 UTC
Origin PR:
https://github.com/openshift/origin/pull/21138

Comment 14 Klaas Demter 2018-10-23 07:00:49 UTC
This issue is also present in 3.11. Are there any plans to backport the fix?

Comment 15 Seth Jennings 2018-10-25 20:36:21 UTC
Cloned https://bugzilla.redhat.com/show_bug.cgi?id=1643142 to track 3.11 backport

Comment 20 Seth Jennings 2019-03-11 17:59:31 UTC
*** Bug 1643142 has been marked as a duplicate of this bug. ***

Comment 23 Weinan Liu 2019-03-21 07:18:01 UTC
Verified as fixed:

[root@ip-172-18-6-238 ~]# cat /etc/systemd/system.conf.d/origin-accounting.conf 
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
[root@ip-172-18-6-238 ~]# oc version
oc v3.11.98
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-6-238.ec2.internal:8443
openshift v3.11.69
kubernetes v1.11.0+d4cacc0
[root@ip-172-18-6-238 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Comment 25 errata-xmlrpc 2019-04-11 05:38:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636

Comment 27 leonardoscampos@brq.com 2019-11-18 19:26:02 UTC
Folks, I just found a solution to this problem in my environment.

I edited /etc/systemd/system.conf, uncommented the DefaultBlockIOAccounting line, and set it to yes.

After rebooting the system, the problem was solved.

Environment:
user@local:# oc version
oc v3.11.153
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO


user@local:~# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 (Maipo)


Hope this information helps you guys.
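
Comments 23 and 27 both point at systemd accounting settings: the verified host carries a drop-in with three `Default*Accounting=yes` lines, and the workaround enables `DefaultBlockIOAccounting` in /etc/systemd/system.conf. A minimal sketch for checking whether a given config file sets all three options follows. The function name `accounting_enabled` is hypothetical, it only matches the literal value `yes` (systemd accepts other boolean spellings too), and it inspects a file rather than the running systemd state.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given config file explicitly sets
# all three accounting options from Comment 23 to "yes".
accounting_enabled() {
    conf="$1"
    for key in DefaultCPUAccounting DefaultMemoryAccounting DefaultBlockIOAccounting; do
        # Commented-out lines (e.g. "#DefaultBlockIOAccounting=no") do not match.
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*yes[[:space:]]*$" "$conf" || return 1
    done
    return 0
}
```

For example, `accounting_enabled /etc/systemd/system.conf.d/origin-accounting.conf && echo "accounting enabled"` would check the drop-in path shown in Comment 23.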