Bug 1737628 - [RHEL8] Tuned setting C-state0 instead of C-state1
Summary: [RHEL8] Tuned setting C-state0 instead of C-state1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: tuned
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Jaroslav Škarvada
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-05 21:33 UTC by Joe Mario
Modified: 2020-11-14 06:44 UTC (History)

Fixed In Version: tuned-2.12.0-3.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 22:31:22 UTC
Type: Bug
Target Upstream Version:


Attachments
Exit latencies for given c-states on Intel cpus supported in RHEL-8 (5.09 KB, text/plain)
2019-08-05 21:59 UTC, Joe Mario


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | redhat-performance/tuned/commit/0ec40e036019c4c062d76a31d676565a09c615dd | 0 | None | None | None | 2020-10-06 07:35:38 UTC
Github | redhat-performance/tuned/commit/e40b50a49851c8a930d2983d1b30eab5db2d3176 | 0 | None | None | None | 2020-10-06 07:35:38 UTC
Red Hat Product Errata | RHBA-2019:3633 | 0 | None | None | None | 2019-11-05 22:31:25 UTC

Description Joe Mario 2019-08-05 21:33:33 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Joe Mario 2019-08-05 21:59:44 UTC
Created attachment 1600778
Exit latencies for given c-states on Intel cpus supported in RHEL-8

Comment 2 Joe Mario 2019-08-05 22:12:22 UTC
Problem statement:

Tuned lets users specify desired C-states via the "force_latency" knob in the tuned conf files. This happened to work on RHEL-7 and RHEL-6, where "force_latency=1" selected C-state 1 (C1), but it no longer works on RHEL-8 due to a change in the kernel. The same "force_latency=1" setting now causes C0 to be set.

The RHEL-8 kernel change is intentional and desired (see https://bugzilla.redhat.com/show_bug.cgi?id=1737276). It will remain in RHEL-8 and will not be backported to RHEL-7.

We need to modify tuned and find a reliable way for users to specify desired cstate values.

We could use the /sys area, for example from a Broadwell:

   /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
   /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0

   /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
   /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2

   /sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E
   /sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10

   /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
   /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:40

   /sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6
   /sys/devices/system/cpu/cpu0/cpuidle/state4/latency:133

This should work on Intel and AMD cpus.  I have not looked at other arches.
Any ideas for how this can best be handled?
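
As a rough illustration of the /sys idea above, the per-state name and latency files could be read along these lines. This is only a sketch, not tuned's code: the function name read_cpuidle_states is made up for this example, and it assumes every stateN directory contains readable "name" and "latency" files, with the latency given in microseconds.

```python
from pathlib import Path


def read_cpuidle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (index, name, exit_latency_us) tuples, sorted by index.

    Hypothetical helper for illustration; assumes each stateN directory
    holds a "name" file and a "latency" file (microseconds).
    """
    states = []
    for state_dir in Path(cpuidle_dir).glob("state[0-9]*"):
        index = int(state_dir.name[len("state"):])
        name = (state_dir / "name").read_text().strip()
        latency = int((state_dir / "latency").read_text().strip())
        states.append((index, name, latency))
    return sorted(states)


if __name__ == "__main__":
    # Prints nothing on systems that do not expose cpuidle in sysfs.
    for index, name, latency in read_cpuidle_states():
        print(f"state{index}: {name} exit latency {latency} us")
```

The directory is a parameter so the same helper can be pointed at another CPU's cpuidle directory or at a test fixture.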

We should also make fixing this a priority on RHEL-8.  Right now users of various tuned profiles have their cpus running in C0, which means no turbo mode, higher power consumption, and hotter cpus (which get throttled down to cool them off).
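
To make the failure mode concrete, here is a small sketch using the Broadwell numbers quoted above. It rests on a simplifying assumption that is not taken from the kernel source: a state is usable only if its exit latency does not exceed the requested limit. Real idle governors weigh more factors. The function name deepest_allowed_state is made up for this example. Under that assumption a 1 us limit leaves only state0 (POLL, referred to as C0 in this report), because C1 reports an exit latency of 2 us.

```python
# Broadwell values quoted in this comment: (state name, exit latency in us).
BROADWELL_STATES = [("POLL", 0), ("C1", 2), ("C1E", 10), ("C3", 40), ("C6", 133)]


def deepest_allowed_state(states, max_latency_us):
    """Return (index, name) of the deepest state whose exit latency fits the limit.

    Simplifying assumption: a state is usable only if its exit latency does
    not exceed max_latency_us. States must be ordered from shallow to deep.
    Returns None if no state fits.
    """
    allowed = [(i, name) for i, (name, latency) in enumerate(states)
               if latency <= max_latency_us]
    return allowed[-1] if allowed else None


print(deepest_allowed_state(BROADWELL_STATES, 1))   # (0, 'POLL')
print(deepest_allowed_state(BROADWELL_STATES, 2))   # (1, 'C1')
print(deepest_allowed_state(BROADWELL_STATES, 70))  # (3, 'C3')
```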

Comment 3 Joe Mario 2019-08-06 22:14:40 UTC
Hi Jaroslav:
Do you have any thoughts on moving this BZ forward?

Given we have customers who are already using "force_latency=<n>" in their private tuned profiles, how about the following?

 a) We leave the existing "force_latency=<n>" in place.

 b) We create a new cstate interface.  For example:  "cstate=<n>"
    Tuned then reads the appropriate /sys/devices/system/cpu/cpu0/cpuidle/state<n>/latency file to figure out what latency to specify.

 c) We then switch all the Red Hat tuned profiles that use "force_latency=<n>" to use "cstate=<n>".  It shouldn't be many.

 d) This would only be for RHEL-8.

It is pretty important to resolve this quickly for RHEL-8.

Thoughts?

Thank you.
Joe

Comment 4 Jaroslav Škarvada 2019-08-07 10:06:41 UTC
(In reply to Joe Mario from comment #3)
> Hi Jaroslav:
> Do you have any thoughts on moving this BZ forward?
> 
> Given we have customers who are already using "force_latency=<n>" in
> their private tuned profiles, how about the following?
> 
>  a) We leave the existing "force_latency=<n>" in place.
> 
>  b) We create a new cstate interface.  For example:  "cstate=<n>"
>     Tuned then reads the appropriate
> /sys/devices/system/cpu/cpu0/cpuidle/state<n>/latency file to figure out
> what latency to specify.
> 
>  c) We then switch all the Red Hat tuned profiles that use
> "force_latency=<n>" to use "cstate=<n>".  It shouldn't be many.
> 
>  d) This would only be for RHEL-8.
> 
> This is pretty important to resolve this quickly for RHEL-8.
> 
> Thoughts?
> 
> Thank you.
> Joe

We have been thinking about it in the past. We could also extend the syntax to allow C-state to be entered as a force_latency parameter, e.g.:
force_latency=10 # 10 us
...
force_latency=C1 # C1 or less

Comment 5 Jaroslav Škarvada 2019-08-07 10:10:42 UTC
(In reply to Jaroslav Škarvada from comment #4)
> (In reply to Joe Mario from comment #3)
> > Hi Jaroslav:
> > Do you have any thoughts on moving this BZ forward?
> > 
> > Given we have customers who are already using "force_latency=<n>" in
> > their private tuned profiles, how about the following?
> > 
> >  a) We leave the existing "force_latency=<n>" in place.
> > 
> >  b) We create a new cstate interface.  For example:  "cstate=<n>"
> >     Tuned then reads the appropriate
> > /sys/devices/system/cpu/cpu0/cpuidle/state<n>/latency file to figure out
> > what latency to specify.
> > 
> >  c) We then switch all the Red Hat tuned profiles that use
> > "force_latency=<n>" to use "cstate=<n>".  It shouldn't be many.
> > 
> >  d) This would only be for RHEL-8.
> > 
> > This is pretty important to resolve this quickly for RHEL-8.
> > 
> > Thoughts?
> > 
> > Thank you.
> > Joe
> 
> We have been thinking about it in the past. We could also extend the syntax
> to allow C-state to be entered as a force_latency parameter, e.g.:
> force_latency=10 # 10 us
> ...
> force_latency=C1 # C1 or less

force_latency=C1 # for the state named C1
force_latency=state1 # for what kernel thinks is state 1

Comment 6 Joe Mario 2019-08-07 11:39:46 UTC
(In reply to Jaroslav Škarvada from comment #5)
<snip>
> >
> > We have been thinking about it in the past. We could also extend the syntax
> > to allow C-state to be entered as a force_latency parameter, e.g.:
> > force_latency=10 # 10 us
> > ...
> > force_latency=C1 # C1 or less
> 
> force_latency=C1 # for the state named C1
> force_latency=state1 # for what kernel thinks is state 1

Hi Jaroslav:
I like that idea.

If I understand correctly, there would be three options, and tuned would parse the value to determine which one is being used:
E.g.:
  force_latency=10
  force_latency=C1
  force_latency=state1

This would be great.  Thank you!

Comment 9 Jaroslav Škarvada 2019-08-14 14:50:04 UTC
The maximum latency can now be specified in multiple ways:
- directly in microseconds (the same as before), e.g. for 10 us:
  force_latency = 10
- as the ID of the maximum C-state allowed, e.g. for kernel state1:
  force_latency = cstate.id:1
- as the name (case sensitive) of the maximum C-state allowed, e.g. for the state named C1:
  force_latency = cstate.name:C1

It is also possible to specify multiple fallback values separated by '|', e.g.:
  force_latency = cstate.name:C6|cstate.id:4|10

This will try to obtain the latency of the C-state named C6; if that fails
(e.g. there is no such C-state), it will try kernel state4, and if that also
fails it finally falls back to 10 us.
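
The fallback chain described in this comment can be sketched as follows. This is an illustration of the documented behaviour, not the code from the upstream commit: the function name resolve_force_latency and the in-memory list of states are assumptions made for the example (tuned itself reads the values from sysfs), and the sample states are the Broadwell values from comment 2.

```python
def resolve_force_latency(spec, states):
    """Resolve a force_latency value to a latency in microseconds.

    spec   -- e.g. "cstate.name:C6|cstate.id:4|10"
    states -- list of (name, exit_latency_us); the list position is taken
              to be the kernel state ID (an assumption of this sketch)
    Returns the first alternative that resolves, or None if none does.
    """
    for alternative in spec.split("|"):
        alternative = alternative.strip()
        if alternative.startswith("cstate.id:"):
            try:
                index = int(alternative[len("cstate.id:"):])
            except ValueError:
                continue  # malformed ID, try the next alternative
            if 0 <= index < len(states):
                return states[index][1]
        elif alternative.startswith("cstate.name:"):
            wanted = alternative[len("cstate.name:"):]
            for name, latency in states:
                if name == wanted:  # case-sensitive match
                    return latency
        else:
            try:
                return int(alternative)  # plain value in microseconds
            except ValueError:
                continue
    return None


broadwell = [("POLL", 0), ("C1", 2), ("C1E", 10), ("C3", 40), ("C6", 133)]
print(resolve_force_latency("cstate.name:C6|cstate.id:4|10", broadwell))  # 133
print(resolve_force_latency("cstate.name:C7|cstate.id:9|10", broadwell))  # 10
print(resolve_force_latency("cstate.id:1|1", broadwell))                  # 2
```

With these sample states, "cstate.id:1|1" resolves to 2 us, the exit latency reported for state1.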

Comment 10 Jaroslav Škarvada 2019-08-14 14:52:41 UTC
The upstream commit also changed the force_latency setting of the latency-performance profile to:
  force_latency=cstate.id:1|1

I.e. it tries kernel state1 and falls back to 1 us. We could use 'cstate.name:C1' to explicitly specify 'C1', but I think using the kernel ID 'state1' is more generic: it means the second C-state, regardless of its name.

Comment 11 Jaroslav Škarvada 2019-08-14 14:59:59 UTC
Tuned obtains the latency information from CPU0.

Comment 12 Joe Mario 2019-08-14 15:33:25 UTC
Thanks Jaroslav:
This is great.  Thank you for getting to it so quickly.

Clark and Luiz:
The realtime/tuned.conf file includes from network-latency, which Jaroslav is fixing as part of this BZ.  


If anyone knows any other profiles that would need to be explicitly changed from "force_latency=1" to "force_latency=cstate.id:1", please holler.

Jaroslav:
Is there anything you need from me in order to get this into RHEL-8.1 ?

Joe

Comment 13 Jaroslav Škarvada 2019-08-14 15:49:31 UTC
(In reply to Joe Mario from comment #12)
> If anyone knows any other profiles that would need to be explicitly changed
> from "force_latency=1" to "force_latency=cstate.id:1", please holler.
> 
Regarding the upstream Tuned profiles, there are also:
sap-hana - setting force_latency directly to 70 us
virtual-host - setting force_latency directly to 70 us

As nobody has complained and I cannot directly match 70 us to a specific C-state (it probably means C3 at most, not C4 or deeper states, so it should work as it is), I didn't touch them.

> Jaroslav:
> Is there anything you need from me in order to get this into RHEL-8.1 ?
> 
I think we are all set. I am now writing a Beaker test. The errata will be created soon.

Comment 14 Joe Mario 2019-08-14 16:19:16 UTC
Hi Jaroslav:
I remember that when Dave Dumas (cc'd) and I worked with SAP, they said their testing showed better HANA performance with C3 than with C1. They identified a force_latency value of 70 as getting them C3. If you can change that to cstate.id:3, that would be great.

I suspect virtual-host wanted cstate3 as well.  I've cc'd Andrew Theurer to see if he can confirm.
Thank you.
Joe

Comment 15 Jaroslav Škarvada 2019-08-16 16:06:03 UTC
(In reply to Joe Mario from comment #14)
> Hi Jaroslav:
> I remember when Dave Dumas (cc'd) and I worked with SAP when they said their
> testing showed they got better hana performance with C3 than with C1.   They
> identified a force_latency value of 70 as getting them C3.  If you can
> change that to cstate.id:3, that would be great.
> 
> I suspect virtual-host wanted cstate3 as well.  I've cc'd Andrew Theurer to
> see if he can confirm.
> Thank you.
> Joe

NP, I changed both to:

force_latency=cstate.id:3|70

Regarding virtual-host, the comment in the profile says:
# Setting C3 state sleep mode/power savings

So it seems C3 was intended.

Comment 22 Joe Mario 2019-09-10 12:42:45 UTC
Hi Jaroslav:
I wonder if you have a minor bug in the recent fix.  It's not causing any problem, but it might.

Looking at a RHEL-8.1 system, I see:
   # grep force_latency /lib/tuned/*/tuned.conf
   /lib/tuned/latency-performance/tuned.conf:force_latency=cstate.id:1|1
   /lib/tuned/virtual-host/tuned.conf:force_latency=cstate.id:3|70

The force_latency values for virtual-host look fine.
But shouldn't the force_latency values for latency-performance be "cstate.id:1|2" instead of "cstate.id:1|1"?

Thank you.
Joe

Comment 23 Jaroslav Škarvada 2019-09-23 09:17:19 UTC
(In reply to Joe Mario from comment #22)
> Hi Jaroslav:
> I wonder if you have a minor bug in the recent fix.  It's not causing any
> problem, but it might.
> 
> Looking at a RHEL-8.1 system, I see:
>    # grep force_latency /lib/tuned/*/tuned.conf
>    /lib/tuned/latency-performance/tuned.conf:force_latency=cstate.id:1|1
>    /lib/tuned/virtual-host/tuned.conf:force_latency=cstate.id:3|70
> 
> The force_latency values for virtual-host look fine.
> But shouldn't the force_latency values for latency-performance be
> "cstate.id:1|2" instead of "cstate.id:1|1"?
> 
> Thank you.
> Joe

I wanted to stay backward compatible. But looking at the table from comment 1, it seems the worst exit latency for C1 is 3 us, so shouldn't it be 3 us then? I.e.:
cstate.id:1|3

Once we come to a conclusion on this, I can fix it upstream and it will get into RHEL with the next rebase. I don't think it needs to be fixed immediately, e.g. by a respin.

Comment 24 Joe Mario 2019-09-23 14:46:17 UTC
Hi Jaroslav:
I agree with you that setting the value to a 3 is better than a 2.
I also agree that a respin is not needed.

Thank you.
Joe

Comment 25 Jaroslav Škarvada 2019-09-24 07:06:24 UTC
(In reply to Joe Mario from comment #24)
> Hi Jaroslav:
> I agree with you that setting the value to a 3 is better than a 2.
> I also agree that a respin is not needed.
> 
> Thank you.
> Joe

Upstream commit:
https://github.com/redhat-performance/tuned/commit/252bd91ed0deeec5caf1d2a01c379145833707b7

Comment 27 errata-xmlrpc 2019-11-05 22:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3633

