2013321 – TuneD: high CPU utilization of the TuneD daemon.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node Tuning Operator
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Jiří Mencák
QA Contact:	Simon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2013653

Reported:	2021-10-12 15:12 UTC by Jiří Mencák
Modified:	2022-03-10 16:19 UTC
CC List:	2 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-10 16:18:42 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-node-tuning-operator pull 278	0	None	open	Bug 2013321: TuneD: workaround for high CPU utilization of [scheduler] plug-in.	2021-10-12 15:13:29 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:19:09 UTC

Description Jiří Mencák 2021-10-12 15:12:26 UTC

Description of problem:
The fix for rhbz#1979352 introduced the [scheduler] plug-in as a standard part of the OpenShift TuneD profiles. Unfortunately, the [scheduler] plug-in can be very CPU intensive, especially on the OpenShift platform; that issue is tracked by rhbz#1921738. As a result, the TuneD process can consume around 1% of one core.

Version-Release number of selected component (if applicable):
4.8->4.10

How reproducible:
Always.

Steps to Reproduce:
1. Install OCP
2. Watch the tuned process CPU utilization, either via top -p <pid> or by sampling the utime and stime fields of /proc/<pid>/stat
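
For the /proc route in step 2, the following is a minimal sketch rather than part of the original report. It assumes the CPU time is read from /proc/<pid>/stat (utime and stime, in clock ticks, as documented in proc(5)) and that bash and awk are available on the node. The function names cpu_ticks, cpu_percent and sample_cpu are hypothetical helpers introduced only for illustration.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative, not taken from the bug report.

# Print utime + stime (in clock ticks) from one /proc/<pid>/stat line.
cpu_ticks() {
    # Strip everything up to the last ") " because the comm field may contain spaces.
    local rest=${1##*) }
    # After stripping, field 1 is the state, so utime and stime are fields 12 and 13.
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Print CPU utilization as a percentage of one core.
# $1 = ticks before, $2 = ticks after, $3 = interval in seconds, $4 = ticks per second
cpu_percent() {
    awk -v a="$1" -v b="$2" -v s="$3" -v hz="$4" \
        'BEGIN { printf "%.1f\n", (b - a) * 100 / (hz * s) }'
}

# Sample a live process twice and report its utilization over the interval.
# $1 = pid, $2 = interval in seconds (defaults to 10)
sample_cpu() {
    local pid=$1 interval=${2:-10} before after
    before=$(cpu_ticks "$(cat "/proc/$pid/stat")")
    sleep "$interval"
    after=$(cpu_ticks "$(cat "/proc/$pid/stat")")
    cpu_percent "$before" "$after" "$interval" "$(getconf CLK_TCK)"
}
```

On a node debug shell this could be used as sample_cpu <pid> 60. A longer interval smooths out the bursts that top shows between refreshes.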

Actual results:
~1% of CPU

Expected results:
~0% of CPU

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/278

Comment 2 Simon 2021-10-13 19:59:59 UTC

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-rc.8   True        False         3h37m   Cluster version is 4.9.0-rc.8

$ oc get nodes
NAME                                                       STATUS   ROLES    AGE     VERSION
skordas1013-qjh6c-master-0.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-master-1.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-master-2.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-a-h8cgr.c.openshift-qe.internal   Ready    worker   3h44m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-b-flv8z.c.openshift-qe.internal   Ready    worker   3h44m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-c-67tvp.c.openshift-qe.internal   Ready    worker   3h45m   v1.22.0-rc.0+894a78b

# Debug master node
sh-4.4# top -p 20213
  PID   USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
20213   root      20   0  408308  34824  17028 S   1.0   0.2   2:28.88 tuned

# Debug worker node
sh-4.4# top -p 4744
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4744 root      20   0  407796  33320  16840 S   0.3   0.2   1:16.83 tuned

# ^^ %CPU > 0

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-13-152930   True        False         34m     Cluster version is 4.10.0-0.nightly-2021-10-13-152930

$ oc get nodes
NAME                                                        STATUS   ROLES    AGE   VERSION
skordas1013a-cbhs8-master-0.c.openshift-qe.internal         Ready    master   57m   v1.22.1+9312243
skordas1013a-cbhs8-master-1.c.openshift-qe.internal         Ready    master   57m   v1.22.1+9312243
skordas1013a-cbhs8-master-2.c.openshift-qe.internal         Ready    master   58m   v1.22.1+9312243
skordas1013a-cbhs8-worker-a-m9jkc.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243
skordas1013a-cbhs8-worker-b-gxztz.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243
skordas1013a-cbhs8-worker-c-jb8gz.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243

# Debug master node
sh-4.4# top -p 14146
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14146 root      20   0 1496068  46124  26120 S   0.0   0.3   0:00.44 openshift-tuned

# Debug worker node
sh-4.4# top -p 2456
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2456 root      20   0 1496068  44880  25616 S   0.0   0.3   0:00.44 openshift-tuned

# %CPU = 0

Comment 5 errata-xmlrpc 2022-03-10 16:18:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
