Description of problem:

Some tuned profiles, like cpu-partitioning and realtime, require the use of the scheduler plug-in to do things like migrate processes. We need equivalent capability on OCP with RHCOS. An example can be found under [scheduler] here:
https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf
(In reply to Andrew Theurer from comment #0)
> Description of problem:
>
> Some tuned profiles like cpu-partitioning and realtime require the use of
> the scheduler plug-in to do things like migrate processes. We need this to
> have equivalent capability on OCP with RHCOS. An example can be found here
> under [scheduler]:
> https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

Andrew, from what I understand, you are requesting this functionality be duplicated in the Node Tuning Operator. Please do not replicate this functionality anywhere; use Tuned instead.

<lclaudio> marcelo: there are other configurations taking place. Some I saw on the RHCOS kernel command line, to deal with C-states and P-states. But the tuned profiles (realtime and NFV) do more: rtentsk, cpu_partials, ...

Tuned is:
1) Centralized in a single location.
2) Covers other realtime products (many issues on realtime-virtual-host and plain realtime are common, so once a problem is discovered, its fix is enabled in Tuned and everyone is happy).
3) ps_blacklist=.*pmd.*;.*PMD.*;^DPDK;.*qemu-kvm.*
   This is clearly not acceptable.

The sequence IMO should be:
1) Isolate host CPUs with Tuned.
2) Launch the container with proper settings (cpumask, or move it into a cpuset).
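The two-step sequence above can be sketched with a small hypothetical shell snippet. The use of taskset, the stand-in sleep workload, and the CPU number are all assumptions for illustration; a container runtime would apply the equivalent restriction via a cpuset or cpumask at launch:

```shell
#!/bin/sh
# Hypothetical sketch of step 2: once tuned has isolated a set of host CPUs,
# a newly launched workload is pinned so it only runs on the allowed CPUs.
# Here we pin a stand-in process to CPU 0 with taskset purely for illustration.
sleep 30 &                 # stand-in for the real workload process
pid=$!
taskset -cp 0 "$pid"       # restrict the process to CPU 0
taskset -cp "$pid"         # verify: prints the current affinity list
kill "$pid"
```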
Andrew, could you be more specific about what is not supported wrt. the [scheduler] plugin in NTO?
The [scheduler] plugin actually always worked in NTO. The real BZ likely should have been about passing variables to parent profiles, which is now fixed as of 4.4.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-20-224655   True        False         10h     Cluster version is 4.4.0-0.nightly-2020-04-20-224655

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-133-210.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1
ip-10-0-143-108.eu-west-1.compute.internal   Ready    worker   5h13m   v1.17.1
ip-10-0-150-5.eu-west-1.compute.internal     Ready    master   5h23m   v1.17.1
ip-10-0-159-45.eu-west-1.compute.internal    Ready    worker   5h14m   v1.17.1
ip-10-0-163-172.eu-west-1.compute.internal   Ready    master   5h23m   v1.17.1

$ node=ip-10-0-143-108.eu-west-1.compute.internal
$ oc label node $node tuned.openshift.io/scheduler-isolated-cores=
node/ip-10-0-143-108.eu-west-1.compute.internal labeled

$ oc get pods -o wide | grep $node
tuned-tsqkf   1/1   Running   0   5h21m   10.0.143.108   ip-10-0-143-108.eu-west-1.compute.internal   <none>   <none>

$ oc rsh tuned-tsqkf
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:              4
CPU MHz:               3100.025
BogoMIPS:              5000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

sh-4.2# cat /proc/irq/default_smp_affinity
3
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      3
Cpus_allowed_list: 0-1
sh-4.2# taskset -p 2 `pgrep crio`
pid 1248's current affinity mask: 3
pid 1248's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores takes a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
    name: scheduler-isolated-cores
  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# cat /proc/irq/default_smp_affinity
1
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      1
Cpus_allowed_list: 0
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1248   0

The crio process was moved by tuned to processor 0. Let's try the [scheduler] blacklist functionality:

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
3
sh-4.2# taskset -p 2 `pgrep crio`
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1248   1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: scheduler-isolated-cores
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test the [scheduler] plugin, isolated_cores
      include=openshift-node
      [scheduler]
      # isolated_cores takes a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
      ps_blacklist=crio.*
    name: scheduler-isolated-cores
  recommend:
  - match:
    - label: tuned.openshift.io/scheduler-isolated-cores
    priority: 20
    profile: scheduler-isolated-cores
EOF

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      2
Cpus_allowed_list: 1
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1248   1

Tuned didn't touch the affinity of the crio process, as it was blacklisted, and the process remained on processor 1.
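As a side note, the affinity masks in the transcript above follow directly from bit arithmetic; a minimal sketch (the 2-CPU full mask 0x3 and the isolated CPU 1 are taken from this node's configuration):

```shell
#!/bin/sh
# Affinity-mask arithmetic behind the transcript above:
# on a 2-CPU node the full mask is 0x3 (binary 11 = CPUs 0 and 1).
# Isolating CPU 1 (isolated_cores=1) clears bit 1, leaving mask 0x1 (CPU 0),
# which matches the Cpus_allowed: 1 value seen after applying the profile.
full_mask=$(( 0x3 ))
isolated_cpu=1
remaining=$(( full_mask & ~(1 << isolated_cpu) ))
printf 'mask after isolating CPU %d: %x\n' "$isolated_cpu" "$remaining"   # prints: mask after isolating CPU 1: 1
```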
$ oc rsh tuned-l8nv5
sh-4.2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping:              0
CPU MHz:               2300.000
BogoMIPS:              4600.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities

sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      f
Cpus_allowed_list: 0-3
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: f
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1492   2

$ oc create -f isolated.yaml

sh-4.2# cat /proc/irq/default_smp_affinity
d
sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      d
Cpus_allowed_list: 0,2-3
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1492   3

$ oc delete tuned/scheduler-isolated-cores

sh-4.2# cat /proc/irq/default_smp_affinity
f
sh-4.2# taskset -p 2 `pgrep crio`
pid 1492's current affinity mask: 2
pid 1492's new affinity mask: 2
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1492   3

$ oc create -f blacklist.yaml

sh-4.2# grep ^Cpu /proc/`pgrep crio`/status
Cpus_allowed:      2
Cpus_allowed_list: 1
sh-4.2# ps -q `pgrep crio` -eo args,pid,psr
COMMAND                      PID PSR
/usr/bin/crio --enable-metr 1492   1

Cluster version: 4.5.0-0.nightly-2020-06-08-031520
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.