Bug 1775450

Summary: node tuning operator lacks "scheduler" plug-in
Product: OpenShift Container Platform
Reporter: Andrew Theurer <atheurer>
Component: Node Tuning Operator
Assignee: Jiří Mencák <jmencak>
Status: CLOSED ERRATA
QA Contact: Simon <skordas>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: fsimonce, mtosatti, sejug, yquinn, zkosic
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrew Theurer 2019-11-22 01:41:37 UTC
Description of problem:

Some tuned profiles like cpu-partitioning and realtime require the use of the scheduler plug-in to do things like migrate processes.  We need this to have equivalent capability on OCP with RHCOS.  An example can be found here under [scheduler]: https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

Comment 3 Marcelo Tosatti 2019-12-06 17:23:31 UTC
(In reply to Andrew Theurer from comment #0)
> Description of problem:
> 
> Some tuned profiles like cpu-partitioning and realtime require the use of
> the scheduler plug-in to do things like migrate processes.  We need this to
> have equivalent capability on OCP with RHCOS.  An example can be found here
> under [scheduler]:
> https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

Andrew, from what I understand, you are requesting that this functionality
be duplicated in the Node Tuning Operator.

Please do not replicate this functionality anywhere; use Tuned instead.

<lclaudio> marcelo: there are other configurations taking place. Some I saw on the rhcos kernel command line, to deal with cstates, pstates. But tuned (the realtime and nfv) profiles do more. rtentsk, cpu_partials, ...

Tuned is:

1) Centralized in a single location.

2) Covers other realtime products (many issues are common to
realtime-virtual-host and plain realtime, so once a problem is
discovered, its fix is enabled in Tuned and everyone is happy).

3) ps_blacklist=.*pmd.*;.*PMD.*;^DPDK;.*qemu-kvm.*
This is clearly not acceptable.

The sequence IMO should be:

1) Isolate host CPUs with Tuned.
2) Launch container with proper settings (cpumask, or move it into cpuset).

Comment 5 Jiří Mencák 2020-02-06 13:45:24 UTC
Andrew, could you be more specific about what is not supported with respect to the [scheduler] plugin in NTO?

Comment 7 Jiří Mencák 2020-04-21 16:56:20 UTC
The [scheduler] plugin has actually always worked in NTO. The real BZ should probably have been about passing variables to parent profiles, which is fixed as of 4.4.
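
For context, a child profile that passes a variable to a parent profile might look like the sketch below. It is not taken from this bug: the profile name realtime-variables, the node label, and the choice of realtime as the parent are assumptions made for illustration, as is the premise that the parent profile reads ${isolated_cores}. The snippet only writes the manifest to a file; applying it with oc create -f is left out.

```shell
# Write a hypothetical Tuned manifest in which the child profile sets a
# variable that the included parent profile is assumed to read.
cat > realtime-variables.yaml <<'EOF'
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: realtime-variables            # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Pass isolated_cores to a parent profile
      include=openshift-node,realtime
      [variables]
      # assumed to be consumed by the parent as ${isolated_cores}
      isolated_cores=1
    name: realtime-variables
  recommend:
  - match:
    - label: tuned.openshift.io/realtime-variables   # hypothetical label
    priority: 20
    profile: realtime-variables
EOF
```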

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-20-224655   True        False         10h     Cluster version is 4.4.0-0.nightly-2020-04-20-224655

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-133-210.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1
ip-10-0-143-108.eu-west-1.compute.internal   Ready    worker   5h13m   v1.17.1
ip-10-0-150-5.eu-west-1.compute.internal     Ready    master   5h23m   v1.17.1
ip-10-0-159-45.eu-west-1.compute.internal    Ready    worker   5h14m   v1.17.1
ip-10-0-163-172.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1

$ node=ip-10-0-143-108.eu-west-1.compute.internal

$ oc label node $node tuned.openshift.io/scheduler-isolated-cores=
node/ip-10-0-143-108.eu-west-1.compute.internal labeled

$ oc get pods -o wide|grep $node
tuned-tsqkf                                     1/1     Running   0          5h21m   10.0.143.108   ip-10-0-143-108.eu-west-1.compute.internal   <none>           <none>

$ oc rsh tuned-tsqkf
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:              4
CPU MHz:               3100.025
BogoMIPS:              5000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

sh-4.2# cat /proc/irq/default_smp_affinity
3

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   3
Cpus_allowed_list:      0-1

sh-4.2# taskset -p 2 `pgrep crio`
pid 1248's current affinity mask: 3
pid 1248's new affinity mask: 2

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
    name: scheduler-isolated-cores

  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# cat /proc/irq/default_smp_affinity
1

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   1
Cpus_allowed_list:      0

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   0

Tuned moved the crio process from processor 1 to processor 0: its affinity mask changed from 2 to 1, and the default IRQ affinity from 3 to 1.
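
The masks in these transcripts are hexadecimal bitmaps in which bit N stands for CPU N, so 3 covers CPUs 0 and 1 while 1 covers CPU 0 only. A small helper along these lines can decode them. It is a sketch: mask_to_cpus is a made-up name (not part of Tuned or util-linux), and it assumes the mask fits in the shell's integer width and contains no comma separators.

```shell
# Decode a hexadecimal CPU affinity mask, as printed by taskset or in the
# Cpus_allowed field of /proc/<pid>/status, into a list of CPU numbers.
# Note: the variables are global; this is a sketch, not a library function.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Bit N set means CPU N is allowed.
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out,}$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

mask_to_cpus 3   # prints 0,1  (mask before the profile was applied)
mask_to_cpus 1   # prints 0    (mask after isolated_cores=1 took effect)
```
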
Let's try the [scheduler] blacklist functionality:

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
3

sh-4.2# taskset -p 2 `pgrep crio`
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
      ps_blacklist=crio.*
    name: scheduler-isolated-cores

  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   2
Cpus_allowed_list:      1

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

Because the crio process was blacklisted, Tuned left its affinity untouched (mask 2) and the process remained on processor 1.
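
Judging by the example in comment 3, ps_blacklist appears to take a semicolon-separated list of regular expressions. The check can be approximated as shown below. This is a sketch, not Tuned's code: is_blacklisted is a hypothetical helper, grep -E stands in for whatever regex engine Tuned uses, and whether Tuned matches the full command line or only the process name is not established in this bug.

```shell
# Return success if the given process string matches any regex in a
# semicolon-separated blacklist value. Approximation only: grep -E
# (POSIX extended regex, unanchored search) is used for matching.
is_blacklisted() {
    cmdline=$1
    blacklist=$2
    old_ifs=$IFS
    IFS=';'
    set -f                 # do not glob-expand patterns such as .*pmd.*
    for pattern in $blacklist; do
        if printf '%s\n' "$cmdline" | grep -Eq -- "$pattern"; then
            IFS=$old_ifs
            set +f
            return 0
        fi
    done
    IFS=$old_ifs
    set +f
    return 1
}

if is_blacklisted "/usr/bin/crio --enable-metrics" "crio.*"; then
    echo "crio would be skipped"
fi
```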

Comment 8 Simon 2020-06-10 12:54:16 UTC
$ oc rsh tuned-l8nv5
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:              0
CPU MHz:               2300.000
BogoMIPS:              4600.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   f
Cpus_allowed_list:      0-3
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: f
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   2

$ oc create -f isolated.yaml

sh-4.2# cat /proc/irq/default_smp_affinity
d
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   d
Cpus_allowed_list:      0,2-3
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   3

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: 2
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   3

$ oc create -f blacklist.yaml

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   2
Cpus_allowed_list:      1
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   1


Cluster version: 4.5.0-0.nightly-2020-06-08-031520
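
The masks in this run match what one would expect from removing CPU 1 from a 4-CPU node, assuming isolated.yaml holds the comment 7 manifest with isolated_cores=1 (the file itself is not shown here). A hedged sketch of that arithmetic follows; remaining_mask is a hypothetical helper that takes individual CPU numbers rather than ranges such as 4-7.

```shell
# Compute the hexadecimal affinity mask that remains when the given CPUs
# are removed from a node with <ncpus> CPUs.
# Usage: remaining_mask <ncpus> <isolated cpu> [<isolated cpu> ...]
remaining_mask() {
    ncpus=$1
    shift
    mask=$(( (1 << ncpus) - 1 ))        # start with every CPU allowed
    for cpu in "$@"; do
        mask=$(( mask & ~(1 << cpu) ))  # clear the bit of each isolated CPU
    done
    printf '%x\n' "$mask"
}

remaining_mask 4 1   # prints d, as seen in this comment
remaining_mask 2 1   # prints 1, as seen in comment 7
```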

Comment 11 errata-xmlrpc 2020-07-13 17:12:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 12 Red Hat Bugzilla 2023-09-14 05:47:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days