Bug 2013321

Summary: TuneD: high CPU utilization of the TuneD daemon.
Product: OpenShift Container Platform
Component: Node Tuning Operator
Version: 4.10
Status: VERIFIED
Severity: high
Priority: high
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Jiří Mencák <jmencak>
Assignee: Jiří Mencák <jmencak>
QA Contact: Simon <skordas>
CC: aos-bugs, dagray
Bug Blocks: 2013653

Description Jiří Mencák 2021-10-12 15:12:26 UTC
Description of problem:
The fix for rhbz#1979352 introduced the [scheduler] plug-in as a standard part of the OpenShift TuneD profiles. Unfortunately, the [scheduler] plug-in can be very CPU intensive, especially on the OpenShift platform; that issue is tracked in rhbz#1921738. As a result, the TuneD process can use around 1% of one core.

Version-Release number of selected component (if applicable):
4.8->4.10

How reproducible:
Always.

Steps to Reproduce:
1. Install OCP
2. Watch the CPU utilization of the tuned process, either via top -p <pid> or by sampling the CPU times in /proc/<pid>/stat
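
The /proc-based check in step 2 can be scripted. What follows is a minimal sketch rather than the reporter's procedure: it assumes a Linux /proc filesystem, samples utime and stime from /proc/<pid>/stat twice, and reports the average CPU use over the interval as a percentage of one core. The function names cpu_ticks and cpu_percent are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assumes the Linux /proc/<pid>/stat field layout.

# cpu_ticks: read one /proc/<pid>/stat line on stdin and print
# utime + stime (fields 14 and 15), in clock ticks.
cpu_ticks() {
    # Drop everything up to the last ") " so that a command name containing
    # spaces or parentheses cannot shift the field positions. After that,
    # utime and stime are the 12th and 13th remaining fields.
    sed 's/^.*) //' | awk '{ print $12 + $13 }'
}

# cpu_percent PID [SECONDS]: average CPU use of PID over SECONDS
# (default 10), as a percentage of one core.
cpu_percent() {
    pid=$1
    interval=${2:-10}
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks < "/proc/$pid/stat") || return 1
    sleep "$interval"
    after=$(cpu_ticks < "/proc/$pid/stat") || return 1
    awk -v a="$before" -v b="$after" -v hz="$hz" -v s="$interval" \
        'BEGIN { printf "%.2f\n", 100 * (b - a) / (hz * s) }'
}
```

For example, cpu_percent "$(pgrep -o -x tuned)" 30 would average over 30 seconds; looking the PID up with pgrep is an assumption about how the process is found on the node.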

Actual results:
~1% of CPU

Expected results:
~0% of CPU

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/278

Comment 2 Simon 2021-10-13 19:59:59 UTC
$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-rc.8   True        False         3h37m   Cluster version is 4.9.0-rc.8

$ oc get nodes
NAME                                                       STATUS   ROLES    AGE     VERSION
skordas1013-qjh6c-master-0.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-master-1.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-master-2.c.openshift-qe.internal         Ready    master   3h56m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-a-h8cgr.c.openshift-qe.internal   Ready    worker   3h44m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-b-flv8z.c.openshift-qe.internal   Ready    worker   3h44m   v1.22.0-rc.0+894a78b
skordas1013-qjh6c-worker-c-67tvp.c.openshift-qe.internal   Ready    worker   3h45m   v1.22.0-rc.0+894a78b

# Debug master node
sh-4.4# top -p 20213
  PID   USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
20213   root      20   0  408308  34824  17028 S   1.0   0.2   2:28.88 tuned

# Debug worker node
sh-4.4# top -p 4744
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4744 root      20   0  407796  33320  16840 S   0.3   0.2   1:16.83 tuned

# ^^ %CPU > 0

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-13-152930   True        False         34m     Cluster version is 4.10.0-0.nightly-2021-10-13-152930

$ oc get nodes
NAME                                                        STATUS   ROLES    AGE   VERSION
skordas1013a-cbhs8-master-0.c.openshift-qe.internal         Ready    master   57m   v1.22.1+9312243
skordas1013a-cbhs8-master-1.c.openshift-qe.internal         Ready    master   57m   v1.22.1+9312243
skordas1013a-cbhs8-master-2.c.openshift-qe.internal         Ready    master   58m   v1.22.1+9312243
skordas1013a-cbhs8-worker-a-m9jkc.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243
skordas1013a-cbhs8-worker-b-gxztz.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243
skordas1013a-cbhs8-worker-c-jb8gz.c.openshift-qe.internal   Ready    worker   43m   v1.22.1+9312243

# Debug master node
sh-4.4# top -p 14146
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14146 root      20   0 1496068  46124  26120 S   0.0   0.3   0:00.44 openshift-tuned

# Debug worker node
sh-4.4# top -p 2456
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2456 root      20   0 1496068  44880  25616 S   0.0   0.3   0:00.44 openshift-tuned

# %CPU = 0
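
The %CPU readings above can also be extracted mechanically. As a minimal sketch (top_cpu is a hypothetical helper, and it assumes the procps column layout shown in the top output above), the following prints the %CPU value of each process row:

```shell
#!/bin/sh
# top_cpu: read `top` output on stdin and print the %CPU value of
# each process row. Hypothetical helper; assumes a header row that
# starts with PID and contains a %CPU column.
top_cpu() {
    awk '
        # Header row: remember which column holds %CPU.
        $1 == "PID" {
            for (i = 1; i <= NF; i++) if ($i == "%CPU") col = i
            next
        }
        # Process rows come after the header and start with a numeric PID.
        col && $1 ~ /^[0-9]+$/ { print $col }
    '
}
```

A possible invocation is top -b -n 2 -d 5 -p <pid> | top_cpu | tail -n 1, which keeps the second sample because the first batch-mode sample may not reflect the current interval.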