Bug 1775450 - node tuning operator lacks "scheduler" plug-in
Summary: node tuning operator lacks "scheduler" plug-in
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-22 01:41 UTC by Andrew Theurer
Modified: 2023-09-14 05:47 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:14 UTC
Target Upstream Version:
Embargoed:



Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2020:2409  0        None      None    None     2020-07-13 17:12:31 UTC

Description Andrew Theurer 2019-11-22 01:41:37 UTC
Description of problem:

Some tuned profiles like cpu-partitioning and realtime require the use of the scheduler plug-in to do things like migrate processes.  We need this to have equivalent capability on OCP with RHCOS.  An example can be found here under [scheduler]: https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

Comment 3 Marcelo Tosatti 2019-12-06 17:23:31 UTC
(In reply to Andrew Theurer from comment #0)
> Description of problem:
> 
> Some tuned profiles like cpu-partitioning and realtime require the use of
> the scheduler plug-in to do things like migrate processes.  We need this to
> have equivalent capability on OCP with RHCOS.  An example can be found here
> under [scheduler]:
> https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

Andrew, from what I understand, you are requesting that this functionality
be duplicated in the Node Tuning Operator.

Please do not replicate this functionality anywhere and use Tuned
instead.

<lclaudio> marcelo: there are other configurations taking place. Some I saw on the RHCOS kernel command line, to deal with C-states and P-states. But the tuned profiles (realtime and nfv) do more: rtentsk, cpu_partials, ...

Tuned is:

1) Centralized in a single location.

2) Covers other realtime products (many issues are common to
realtime-virtual-host and plain realtime, so once a problem is discovered,
its fix goes into Tuned and everyone benefits).

3) ps_blacklist=.*pmd.*;.*PMD.*;^DPDK;.*qemu-kvm.*
This is clearly not acceptable.

The sequence IMO should be:

1) Isolate host CPUs with Tuned.
2) Launch container with proper settings (cpumask, or move it into cpuset).
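Step 2 pins the workload through a cpumask or a cpuset. As a minimal sketch, assuming POSIX sh and CPU numbers below 63, the hypothetical helper `list_to_mask` below converts a CPU list in the range syntax that Tuned's `isolated_cores` accepts (e.g. `2,4-7`) into the hexadecimal mask form used by `taskset -p` and the `Cpus_allowed` field of `/proc/<pid>/status`. It is illustrative only and is not part of Tuned or the Node Tuning Operator.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tuned/NTO): convert a CPU list such as
# "2,4-7" into a hexadecimal affinity mask (bit N set <=> CPU N included).
# Assumes CPU numbers below 63 so the mask fits in shell arithmetic, and no
# spaces inside the list.
list_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                      # split the list on commas
    for item in $1; do
        case $item in
            *-*) lo=${item%-*}; hi=${item#*-} ;;  # a range such as 4-7
            *)   lo=$item; hi=$item ;;            # a single CPU
        esac
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            mask=$(( mask | (1 << cpu) ))        # set the bit for this CPU
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

list_to_mask 1       # -> 2
list_to_mask 2,4-7   # -> f4
```

For example, `taskset -p "$(list_to_mask 1)" <pid>` is equivalent to the `taskset -p 2 <pid>` used later in this bug; `taskset -c 1 <command>` accepts the list form directly, so the conversion is only needed where a mask is expected.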

Comment 5 Jiří Mencák 2020-02-06 13:45:24 UTC
Andrew, could you be more specific about what is not supported with regard to the [scheduler] plugin in NTO?

Comment 7 Jiří Mencák 2020-04-21 16:56:20 UTC
The [scheduler] plugin has actually always worked in NTO. This BZ should probably have been about passing variables to parent profiles, which is fixed as of 4.4.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-20-224655   True        False         10h     Cluster version is 4.4.0-0.nightly-2020-04-20-224655

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-133-210.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1
ip-10-0-143-108.eu-west-1.compute.internal   Ready    worker   5h13m   v1.17.1
ip-10-0-150-5.eu-west-1.compute.internal     Ready    master   5h23m   v1.17.1
ip-10-0-159-45.eu-west-1.compute.internal    Ready    worker   5h14m   v1.17.1
ip-10-0-163-172.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1

$ node=ip-10-0-143-108.eu-west-1.compute.internal

$ oc label node $node tuned.openshift.io/scheduler-isolated-cores=
node/ip-10-0-143-108.eu-west-1.compute.internal labeled

$ oc get pods -o wide|grep $node
tuned-tsqkf                                     1/1     Running   0          5h21m   10.0.143.108   ip-10-0-143-108.eu-west-1.compute.internal   <none>           <none>

$ oc rsh tuned-tsqkf
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:              4
CPU MHz:               3100.025
BogoMIPS:              5000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

sh-4.2# cat /proc/irq/default_smp_affinity
3

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   3
Cpus_allowed_list:      0-1

sh-4.2# taskset -p 2 `pgrep crio`
pid 1248's current affinity mask: 3
pid 1248's new affinity mask: 2

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
    name: scheduler-isolated-cores

  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# cat /proc/irq/default_smp_affinity
1

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   1
Cpus_allowed_list:      0

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   0
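The values above are hexadecimal affinity masks in which bit N stands for CPU N: `3` covers CPUs 0 and 1, `1` covers CPU 0 only, and the `2` set earlier with `taskset -p` covers CPU 1 only. A minimal POSIX sh sketch of a hypothetical decoding helper, assuming a single short mask word with no commas (as on these small nodes; larger machines print comma-separated 32-bit groups), that turns such a mask into the range form shown in `Cpus_allowed_list`:

```shell
#!/bin/sh
# Hypothetical helper: decode a hexadecimal affinity mask, as printed in
# /proc/irq/default_smp_affinity or the Cpus_allowed field, into a CPU list
# in the range syntax of Cpus_allowed_list (e.g. "0,2-3").
# Assumes a single mask word without commas and fewer than 63 CPUs.
mask_to_list() {
    mask=$(( 0x$1 ))
    cpu=0        # CPU number of the bit currently examined
    start=-1     # first CPU of the current run of set bits, -1 if none
    out=
    while [ "$mask" -ne 0 ] || [ "$start" -ge 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            if [ "$start" -lt 0 ]; then start=$cpu; fi
        elif [ "$start" -ge 0 ]; then
            # The run ended at the previous CPU; append "a" or "a-b".
            end=$(( cpu - 1 ))
            if [ "$start" -eq "$end" ]; then run=$start; else run=$start-$end; fi
            out=${out:+$out,}$run
            start=-1
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
}

mask_to_list 3   # both CPUs of this 2-CPU node: 0-1
mask_to_list 1   # after isolated_cores=1: 0
mask_to_list 2   # the mask set with taskset -p 2: 1
```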

The crio process was moved by tuned to processor 0.
Let's try the [scheduler] blacklist functionality:

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
3

sh-4.2# taskset -p 2 `pgrep crio`
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
      ps_blacklist=crio.*
    name: scheduler-isolated-cores

  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   2
Cpus_allowed_list:      1

sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1248   1

Tuned did not change the affinity of the crio process because it was blacklisted, so the process remained on processor 1.
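`ps_blacklist` takes a `;`-separated list of regular expressions, as in both the cpu-partitioning value quoted in Comment 3 and `crio.*` above. The sketch below approximates the check in POSIX sh under an explicit assumption: the patterns are treated as alternatives of one extended regular expression, searched for anywhere in the process command line. Tuned itself is written in Python, so the exact string it matches against and regex corner cases may differ; `is_blacklisted` is a hypothetical name.

```shell
#!/bin/sh
# Rough approximation of ps_blacklist matching (assumption, not Tuned's code):
# join the ';'-separated patterns with '|' into one POSIX extended regex and
# search for it anywhere in the given command line.
is_blacklisted() {
    regex=$(printf '%s' "$1" | tr ';' '|')
    printf '%s\n' "$2" | grep -Eq -- "$regex"
}

if is_blacklisted 'crio.*' '/usr/bin/crio'; then
    echo "crio: blacklisted, affinity left alone"
fi
if ! is_blacklisted 'crio.*' '/usr/bin/kubelet'; then
    echo "kubelet: not blacklisted"
fi
```

Because the search is unanchored, `crio.*` matches any command line that contains `crio` anywhere, such as `/usr/bin/crio`; entries like `^DPDK` from the cpu-partitioning list only match at the start.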

Comment 8 Simon 2020-06-10 12:54:16 UTC
$ oc rsh tuned-l8nv5
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:              0
CPU MHz:               2300.000
BogoMIPS:              4600.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   f
Cpus_allowed_list:      0-3
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: f
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   2

$ oc create -f isolated.yaml

sh-4.2# cat /proc/irq/default_smp_affinity
d
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   d
Cpus_allowed_list:      0,2-3
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   3

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: 2
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   3

$ oc create -f blacklist.yaml

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:   2
Cpus_allowed_list:      1
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                         PID PSR
/usr/bin/crio --enable-metr    1492   1


Cluster version: 4.5.0-0.nightly-2020-06-08-031520
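The masks in both runs follow the same arithmetic: the housekeeping mask is the all-CPU mask with the isolated CPUs cleared. On the 2-CPU node in Comment 7, `isolated_cores=1` leaves `3 & ~2 = 1`, which matches the `default_smp_affinity` observed there. The 4-CPU values here (`d`, i.e. CPUs 0,2-3) are consistent with CPU 1 being isolated, although the contents of `isolated.yaml` are not shown. A minimal POSIX sh sketch with a hypothetical helper that takes the isolated CPUs as a hex mask:

```shell
#!/bin/sh
# Hypothetical helper: mask of CPUs 0..n-1 left for housekeeping once the
# isolated CPUs (given as a hexadecimal mask) are removed. Assumes n < 63.
housekeeping_mask() {
    ncpus=$1
    isolated=$(( 0x$2 ))
    all=$(( (1 << ncpus) - 1 ))          # every online CPU, e.g. f for 4 CPUs
    printf '%x\n' $(( all & ~isolated )) # clear the isolated CPUs' bits
}

housekeeping_mask 2 2   # 2 CPUs, CPU 1 isolated -> 1
housekeeping_mask 4 2   # 4 CPUs, CPU 1 isolated -> d
```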

Comment 11 errata-xmlrpc 2020-07-13 17:12:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 12 Red Hat Bugzilla 2023-09-14 05:47:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

