Bug 2113925

Summary: Extend TuneD API to allow hotplugging/deplugging devices from the plugin instances at runtime
Product: Red Hat Enterprise Linux 9
Reporter: Jaroslav Škarvada <jskarvad>
Component: tuned
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA
QA Contact: Robin Hack <rhack>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 9.1
CC: akamra, bwensley, jeder, jmario, jmencak, jskarvad, msivak, sjanderk, william.caban
Target Milestone: rc
Keywords: Patch, Triaged, Upstream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tuned-2.20.0-0.1.rc1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-09 08:26:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Deadline: 2023-02-27

Description Jaroslav Škarvada 2022-08-02 10:42:07 UTC
Description of problem:
E.g. consider the following TuneD profile:
[cpus_perf]
type=cpu
devices=cpu1, cpu2
governor=performance

[cpus_idle]
type=cpu
devices=cpu3, cpu4
governor=ondemand

Extend the runtime API by e.g. the following methods:
instance_remove_device(instance, device)
instance_add_device(instance, device)

And then by e.g. running the following:
instance_remove_device("cpus_idle", "cpu4")
instance_add_device("cpus_perf", "cpu4")

This should result in cpu1, cpu2, and cpu4 having the "performance" governor and cpu3 having the "ondemand" governor.
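The intended move semantics can be sketched in a few lines of Python. This is an illustrative simulation only, not the real TuneD implementation or its D-Bus API; the dict of device sets is an assumption made for the example.

```python
# Sketch of the proposed instance_add_device/instance_remove_device semantics
# (illustrative simulation only, not TuneD code).

# Each plugin instance owns a set of devices, as in the profile above.
instances = {
    "cpus_perf": {"cpu1", "cpu2"},
    "cpus_idle": {"cpu3", "cpu4"},
}

def instance_remove_device(instance, device):
    # Detach a device from a plugin instance at runtime.
    instances[instance].discard(device)

def instance_add_device(instance, device):
    # Attach a device to a plugin instance at runtime.
    instances[instance].add(device)

# Move cpu4 from the "ondemand" instance to the "performance" instance.
instance_remove_device("cpus_idle", "cpu4")
instance_add_device("cpus_perf", "cpu4")

print(sorted(instances["cpus_perf"]))  # ['cpu1', 'cpu2', 'cpu4']
print(sorted(instances["cpus_idle"]))  # ['cpu3']
```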

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Check the API (e.g. on D-Bus)

Actual results:
There are no methods for controlling hotplugging/unplugging of devices to/from the instances.

Expected results:
Methods for controlling hotplugging/unplugging of devices to/from the instances.

Additional info:
Consider bug 2113900; the D-Bus and socket APIs need to be consistent.

Also, the udev_device_regex regular expressions need to be supported. This could be done by providing a second set of methods, or e.g. a boolean flag on the methods.
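A regex-capable variant of the methods could match devices the way the udev_device_regex profile option does. The helper below is a hypothetical sketch, not TuneD code; the device list is a made-up example.

```python
import re

# Hypothetical sketch of regex-based device matching for a regex-capable
# variant of the add/remove methods (not actual TuneD code).
def match_devices(pattern, available):
    """Return all available devices whose names match the regular expression."""
    rx = re.compile(pattern)
    return sorted(d for d in available if rx.match(d))

available = ["cpu0", "cpu1", "cpu10", "eth0"]
print(match_devices(r"cpu\d+", available))  # ['cpu0', 'cpu1', 'cpu10']
```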

Comment 2 Joe Mario 2022-10-25 18:21:17 UTC
(In reply to Jaroslav Škarvada from comment #0)
> Description of problem:
> E.g. consider the following TuneD profile:
> [cpus_perf]
> devices=cpu1, cpu2
> governor=performance
> 
> [cpus_idle]
> devices=cpu3, cpu4
> governor=ondemand
> 
> Extend the runtime API by e.g. the following methods...

Hi Jaroslav:
Just an fyi that I tried both the above syntax, and this syntax:

 [cpu_group1]
 type=cpu
 devices=0
 governor=powersave

and this syntax:

 [cpu_group1]
 devices=cpu0
 governor=powersave

I couldn't get it to work.  
The /var/log/tuned/tuned.log file reported:

> 2022-10-25 13:58:54,892 WARNING  tuned.plugins.base: instance cpu_group1: no matching devices available
> 2022-10-25 13:58:54,892 INFO     tuned.plugins.plugin_cpu: Latency settings from non-first CPU plugin instance 'cpu_group1' will be ignored.

This is on a recent RHEL-8 (8.6) running tuned-adm 2.18.0.

Based on the comments you made about the syntax in [1] and the syntax you noted above, I thought it should work on RHEL-8.
Do you know if it's supported yet?

Thank you.
Joe

[1] https://docs.google.com/document/d/1mAVxyz6uzm5QMfv8xozCY7pqeuCwvwmjfgvzWdBgev4/edit#

Comment 4 Jaroslav Škarvada 2022-11-09 12:42:21 UTC
Upstream PR:
https://github.com/redhat-performance/tuned/pull/477

Comment 6 Jiří Mencák 2022-12-08 14:25:55 UTC
Hi Joe,

I stumbled upon your question when reviewing/testing https://github.com/redhat-performance/tuned/pull/477
I had the same issue as you.

(In reply to Joe Mario from comment #2)
> Hi Jaroslav:
> Just an fyi that I tried both the above syntax, and this syntax:
> 
>  [cpu_group1]
>  type=cpu
>  devices=0
>  governor=powersave
> 
> and this syntax:
> 
>  [cpu_group1]
>  devices=cpu0
>  governor=powersave
> 

I believe the correct syntax is

[cpu_group1]
type=cpu
devices=cpu0
governor=powersave

That one worked for me and you were close.  Perhaps Jaroslav can edit this BZ to add "type=cpu"
to the [cpus_perf] and [cpus_idle] sections?

Comment 7 Jaroslav Škarvada 2022-12-13 12:44:48 UTC
(In reply to Jiří Mencák from comment #6)
> Hi Joe,
> 
> I stumbled upon your question when reviewing/testing
> https://github.com/redhat-performance/tuned/pull/477
> I had the same issue as you.
> 
> (In reply to Joe Mario from comment #2)
> > Hi Jaroslav:
> > Just an fyi that I tried both the above syntax, and this syntax:
> > 
> >  [cpu_group1]
> >  type=cpu
> >  devices=0
> >  governor=powersave
> > 
> > and this syntax:
> > 
> >  [cpu_group1]
> >  devices=cpu0
> >  governor=powersave
> > 
> 
> I believe the correct syntax is
> 
> [cpu_group1]
> type=cpu
> devices=cpu0
> governor=powersave
> 
> That one worked for me and you were close.  Perhaps Jaroslav can edit this
> BZ to add "type=cpu"
> to the [cpus_perf] and [cpus_idle] sections?

Yes, it was a typo. I edited the upstream PR a few days ago and have now also fixed the BZ comment.

Comment 12 Joe Mario 2023-02-16 14:26:42 UTC
Here are some latency test results using the new cpu plugins.
It shows a use case that preserves performance and can save power.

First, my system:
 Two 32-cpu Cascade Lake systems, hyperthreads off.
 Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
 Direct 100Gb connection between them.
 I did not use kernel bypass for network traffic.

Second: my tuned.conf file:
    #
    # tuned configuration
    #
    
    [main]
    summary=Testing new TuneD features
    include=network-latency
    
    # Temporary workaround to override inherited [cpu] plugin.
    [cpu]
    enabled=0
    
    # Set two cpus to high performance
    [cpus_performance]
    type=cpu
    priority=-10
    devices=${f:cpulist2devs:28,30}
    governor=performance
    energy_perf_bias=performance
    pm_qos_resume_latency_us=2
    #
    # Reset system-wide cstate to C6 and min_perf_pct
    force_latency=cstate.id_no_zero:3|170
    min_perf_pct=100
    
    [cpus_powersave]
    type=cpu
    devices=${f:cpulist2devs:0-27,29,31}
    priority=-10
    governor=powersave
    energy_perf_bias=power

I used the qperf network tool to test the round trip time between the two systems.
    qperf-0.4.9-22.el9.x86_64 : Measure socket and RDMA performance

The command lines used were:
  On the client:  # numactl -m0 -C28,30 qperf 100.100.100.156 -m 256 -t 60  udp_lat
  On the server:  # numactl -m0 -C28,30 qperf

Here are the results of running qperf with three different TuneD profiles.

TuneD Profile            qperf 256b   cpu-wattage  ram-wattage cpu-wattage  ram-wattage
                         round trip   during test  during test idle system  idle-system
----------------------   ----------   -----------  ----------- -----------  ----------
throughput-performance     7.2 us       73 watts    22 watts     29 watts    15 watts
network-latency            5.5 us       95 watts    27 watts     94 watts    26 watts
test_cpu-plugin            5.6 us       73 watts    22 watts     69 watts    21 watts

Notes:  
a) The throughput-performance profile does not set a cstate, letting cstates float to C6.

b) The network-latency profile sets all cpus to cstate C1, which limits turbo frequencies.

c) The test_cpu-plugin profile is the one shown above, which only sets C1 on cpus 28 & 30.

d) The system we used was small, only 32 cpu cores. Idle power consumption was only about 30% higher with C1 than with C6. On systems with higher cpu counts, we see the idle power difference between C1 and C6 rise to over 100%.

e) Kernel bypass for network traffic was not used. If it were, the qperf times would be down in the 1.5 us range, with similar scaling.

Conclusion:
While the above test scenario could be called fabricated, it does represent a number of low-latency workloads. In those environments, one or two cpus need to run as fast as possible to "grab packets off the wire", while the remaining "housekeeping" cpus don't need to be at full power to process the data.
This new cpu plugin does provide the flexibility for systems to partition their cpus into high and low power cpus, to both save power and to take advantage of higher turbo frequencies.

Comment 22 errata-xmlrpc 2023-05-09 08:26:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2585