1623261 – [3.11] Logs contain error 'Failed to get system container stats for "/system.slice/atomic-openshift-node.service"'


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Seth Jennings
QA Contact:	Weinan Liu
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1643142
Depends On:
Blocks:	1643142

Reported:	2018-08-28 20:42 UTC by Luke Stanton
Modified:	2023-10-06 17:53 UTC
CC List:	18 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Clones:	1643142
Environment:
Last Closed:	2019-04-11 05:38:22 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift origin pull 21138	'None'	closed	contrib: systemd: fix systemd accounting	2021-02-20 01:10:13 UTC
Red Hat Knowledge Base (Solution)	3825591	None	None	None	2019-01-23 09:10:15 UTC
Red Hat Product Errata	RHBA-2019:0636	None	None	None	2019-04-11 05:38:28 UTC

Description Luke Stanton 2018-08-28 20:42:32 UTC

Description of problem:

Node logs contain the following error:
--------------
atomic-openshift-node[2601]: E0828 09:13:38.702385    2601 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
--------------

How reproducible:
Seems consistent

Steps to Reproduce:
1. Deploy an OpenShift 3.10 cluster

Actual results:
The error is logged continually. There don't seem to be any severe side effects, but the error adds noise to the log output.

Expected results:
The error should not occur under normal circumstances.

Comment 1 Mrunal Patel 2018-08-29 17:38:35 UTC

Seth, could you take a look? This is probably coming from cadvisor.

Comment 2 Judd Maltin 2018-08-30 17:36:52 UTC

Same here:

# rpm -qa |grep openshift
atomic-openshift-clients-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-docker-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-excluder-3.10.14-1.git.0.ba8ae6d.el7.noarch
atomic-openshift-hyperkube-3.10.14-1.git.0.ba8ae6d.el7.x86_64
atomic-openshift-node-3.10.14-1.git.0.ba8ae6d.el7.x86_64

Comment 3 Luke Stanton 2018-09-05 16:46:01 UTC

3.10.34 shows a similar error for the docker service:

-----------------
Sep  5 09:31:22 vmlxopencd01 atomic-openshift-node: E0905 09:31:22.086411   18485 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
Sep  5 09:31:22 vmlxopencd01 atomic-openshift-node: E0905 09:31:22.086841   18485 summary.go:102] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
-----------------

Comment 4 Luke Stanton 2018-09-05 16:59:08 UTC

Is there a workaround to silence these error messages? They cause a lot of noise in the logs.

Comment 5 Tiago M. Vieira 2018-09-21 17:30:00 UTC

Getting the same issue with my deployment of 3.10.34:

Sep 21 13:28:00 tmor-master.x.x.x atomic-openshift-node[83809]: E0921 13:28:00.786348   83809 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"
Sep 21 13:28:10 tmor-master.x.x.x atomic-openshift-node[83809]: E0921 13:28:10.826562   83809 summary.go:102] Failed to get system container stats for "/system.slice/atomic-openshift-node.service": failed to get cgroup stats for "/system.slice/atomic-openshift-node.service": failed to get container info for "/system.slice/atomic-openshift-node.service": unknown container "/system.slice/atomic-openshift-node.service"

Comment 6 Alexander Kozhemyakin 2018-09-25 07:58:50 UTC

I have the same issue. Is there any workaround or fix?

Comment 13 Seth Jennings 2018-10-01 18:59:36 UTC

Origin PR:
https://github.com/openshift/origin/pull/21138

Comment 14 Klaas Demter 2018-10-23 07:00:49 UTC

This issue is also present in 3.11. Are there any plans to backport the fix?

Comment 15 Seth Jennings 2018-10-25 20:36:21 UTC

Cloned https://bugzilla.redhat.com/show_bug.cgi?id=1643142 to track 3.11 backport

Comment 20 Seth Jennings 2019-03-11 17:59:31 UTC

*** Bug 1643142 has been marked as a duplicate of this bug. ***

Comment 23 Weinan Liu 2019-03-21 07:18:01 UTC

Verified to be fixed:

[root@ip-172-18-6-238 ~]# cat /etc/systemd/system.conf.d/origin-accounting.conf 
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
[root@ip-172-18-6-238 ~]# oc version
oc v3.11.98
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-6-238.ec2.internal:8443
openshift v3.11.69
kubernetes v1.11.0+d4cacc0
[root@ip-172-18-6-238 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)
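
The same check can be repeated on another node. The following is a minimal sketch, not part of the fix itself: it only tests whether a systemd manager config file sets the three accounting defaults shown above to "yes". The function name accounting_enabled is illustrative, and the default path is the one from the output above.

```shell
#!/bin/sh
# Sketch only: report whether a systemd manager config file enables the three
# accounting defaults shown in the verification output.
# The function name is illustrative; the default path is taken from that output.
accounting_enabled() {
    conf_file="${1:-/etc/systemd/system.conf.d/origin-accounting.conf}"
    [ -r "$conf_file" ] || return 1
    for key in DefaultCPUAccounting DefaultMemoryAccounting DefaultBlockIOAccounting; do
        # Match an uncommented "key=yes" line, allowing surrounding whitespace.
        # systemd also accepts other boolean spellings; this sketch checks only "yes".
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*yes[[:space:]]*$" "$conf_file" || return 1
    done
    return 0
}

# Example:
#   accounting_enabled && echo "accounting defaults are set"
```

This inspects the file only. Whether a running unit actually has accounting turned on can be queried separately, for example with "systemctl show atomic-openshift-node.service -p MemoryAccounting".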

Comment 25 errata-xmlrpc 2019-04-11 05:38:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636

Comment 27 leonardoscampos@brq.com 2019-11-18 19:26:02 UTC

Folks, I just found a solution to this problem in my environment.

I edited /etc/systemd/system.conf, uncommented the DefaultBlockIOAccounting line, and set it to yes.

After rebooting the system, the problem was solved.

Environment:
user@local:# oc version
oc v3.11.153
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO


user@local:~# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 (Maipo)


Hope this information helps.
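
As an alternative to editing /etc/systemd/system.conf in place, the same settings can be supplied as a drop-in file. The sketch below writes the file shown in comment 23; the file name and keys are copied from that comment, while the function name write_accounting_dropin and its directory argument are illustrative. It assumes the host's systemd reads /etc/systemd/system.conf.d, as the verified node's does, and it leaves the reboot described above to the operator.

```shell
#!/bin/sh
# Sketch only: write a systemd manager drop-in that turns on default
# resource accounting. The function name and the directory argument are
# illustrative; the file name and keys are copied from comment 23.
write_accounting_dropin() {
    conf_dir="${1:-/etc/systemd/system.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/origin-accounting.conf" <<'EOF'
[Manager]
DefaultCPUAccounting=yes
DefaultMemoryAccounting=yes
DefaultBlockIOAccounting=yes
EOF
}

# Example (as root), followed by a reboot as in the comment above:
#   write_accounting_dropin
```

Passing a scratch directory as the first argument makes it possible to inspect the generated file before installing it under /etc.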
