Bug 2095829
Summary: | TuneD: Need to modify method for setting EPP/EPB on Intel processors | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Joe Mario <jmario> |
Component: | tuned | Assignee: | Jaroslav Škarvada <jskarvad> |
Status: | CLOSED ERRATA | QA Contact: | Robin Hack <rhack> |
Severity: | unspecified | Docs Contact: | Šárka Jana <sjanderk> |
Priority: | unspecified | ||
Version: | 9.1 | CC: | gfialova, jeder, jskarvad, prarit |
Target Milestone: | rc | Keywords: | Patch, Triaged, Upstream |
Target Release: | --- | Flags: | pm-rhel: mirror+ |
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | tuned-2.20.0-0.1.rc1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-05-09 08:26:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Joe Mario
2022-06-10 16:58:10 UTC
Hi Jaroslav:
Here's an example of an older processor (Intel Haswell) running RHEL-9 (5.14.0-8.el9.x86_64), which does not have the newer "energy_performance_available_preferences" file.
In this case where that file does not exist, continue to use the current TuneD method of setting energy_perf_bias using /usr/bin/x86_energy_perf_policy.
[root@shakperf ~]# cat t.sh
#!/bin/bash
cd /sys/devices/system/cpu
grep . cpufreq/policy0/*
grep . intel_pstate/*
grep . cpu0/power/*
[root@shakperf ~]# sh t.sh
cpufreq/policy0/affected_cpus:0
cpufreq/policy0/cpuinfo_max_freq:3600000
cpufreq/policy0/cpuinfo_min_freq:1200000
cpufreq/policy0/cpuinfo_transition_latency:20000
cpufreq/policy0/related_cpus:0
cpufreq/policy0/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
cpufreq/policy0/scaling_cur_freq:2134050
cpufreq/policy0/scaling_driver:intel_cpufreq
cpufreq/policy0/scaling_governor:performance
cpufreq/policy0/scaling_max_freq:3600000
cpufreq/policy0/scaling_min_freq:3600000
cpufreq/policy0/scaling_setspeed:<unsupported>
intel_pstate/max_perf_pct:100
intel_pstate/min_perf_pct:100
intel_pstate/no_turbo:0
intel_pstate/num_pstates:25
intel_pstate/status:passive
intel_pstate/turbo_pct:53
cpu0/power/control:auto
>> cpu0/power/energy_perf_bias:0
cpu0/power/pm_qos_resume_latency_us:0
cpu0/power/runtime_active_time:0
cpu0/power/runtime_status:unsupported
cpu0/power/runtime_suspended_time:0
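The check described above (use the newer EPP files when they exist, otherwise keep the current energy_perf_bias method) could be probed roughly as in the Python sketch below. It is only an illustration: the helper names `epp_interface_available` and `choose_method` are made up for this example, and the directory is passed in by the caller because the exact sysfs location of the EPP files is treated as an assumption here.

```python
import os


def epp_interface_available(sysfs_dir):
    """Hypothetical helper: True if both EPP files exist in sysfs_dir.

    sysfs_dir is supplied by the caller; which sysfs directory actually
    holds these files on a given kernel is an assumption of this sketch.
    """
    available = os.path.join(sysfs_dir, "energy_performance_available_preferences")
    preference = os.path.join(sysfs_dir, "energy_performance_preference")
    return os.path.isfile(available) and os.path.isfile(preference)


def choose_method(sysfs_dir):
    """Return a label for the mechanism to use; the labels are illustrative only."""
    if epp_interface_available(sysfs_dir):
        return "sysfs_epp"
    # Older processors such as the Haswell system above land here.
    return "x86_energy_perf_policy"
```

On the Haswell machine shown above, neither EPP file is listed, so a probe like this would select the existing x86_energy_perf_policy method.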
I'm sorry. Did you find any way to identify newer cpus and how we should behave when intel_pstate is disabled?

Hi Jan:

RE: Did you find any way to identify newer cpus and how we should behave when intel_pstate is disabled?

There should be no need to identify cpus. The algorithm below from comment 0 is still valid for Intel cpus, with one temporary qualification noted below:

> $DIR=/sys/devices/system/cpu/cpuX/power/
>
> If ($DIR/energy_performance_available_preferences exists) {
>   then:
>     a) Check that file for available EPP preferences to verify
>        the desired EPP value is available.
>     b) Then write the desired (string) policy to
>        $DIR/energy_performance_preference.
> }
> else { /* energy_performance_available_preferences does not exist */
>     a) Then write directly to $DIR/energy_perf_bias using the
>        x86_energy_perf_policy binary.
>        This should only be true on older processors.
> }

That last comment "should only be true on older processors" needs qualification. Currently it will also be true when intel_pstate is disabled. There should be an upstream patch that Prarit is looking for that will correct this. With that patch, the above algorithm will work correctly with or without intel_pstate. But TuneD does not need to wait for Prarit's patch.

To summarize, TuneD need not identify newer cpus. It should just use the above algorithm.

@prarit Please confirm the above algorithm is valid. (It should be.)

Jan: Here is a clearer version of the algorithm that TuneD should use.
Prarit: Please confirm.

If (energy_performance_available_preferences exists) {
    if (the user requested preference is one of the available preferences) {
        write that value to the energy_performance_preference file
    } else {
        return    // Because the requested value is not valid.
    }
} else {  /* energy_performance_available_preferences does not exist */
    /*
     * This case should only be true on Intel cpus older than Skylake
     * or on Skylake & newer cpus not booted with the intel_pstate driver
     * enabled and without an upcoming kernel patch [1].
     */
    if (an energy_perf_bias value was also requested) {
        write it directly to energy_perf_bias using the
        x86_energy_perf_policy binary [2]
    } else {
        return
    }
}

[1] There's an upstream kernel patch Prarit is looking to backport that will always have the energy_performance_{preference,available_preferences} files available even if the kernel was booted with intel_pstate=disable. TuneD does not need to wait for this patch to implement the above algorithm.

[2] The x86_energy_perf_policy binary should be used because in some cases the energy_perf_bias file will not exist. If that file exists, the x86_energy_perf_policy binary will use it. Otherwise it will write the appropriate MSR directly to update the energy_perf_bias value.

Joe

Sorry for the delay. I was busy and then I had to go back through email to refresh my memory of the process.

The algorithm described in comment #9 is correct.

P.

Hi Jan:

Per our offline discussions, here is an updated algorithm. Please delete the one from comment #9.

The reasons for this update are:

The algorithm from comment #9 opens the possibility that some customer will complain about performance because we'll no longer be setting EPB to 0 as we've been doing all along.

When I look through all the ./arch/x86/kernel/cpu/intel.c history, it's pretty clear that EPB=0 is performance mode and the upstream default of EPB=6 is the "normal" mode. (Even though our testing has shown no performance regression when setting EPB=6 vs EPB=0.)

Recall the motivation for all this is to not set an EPB value on any cpu that doesn't like that value (currently only Alder Lake, where bad things can happen if EPB<=6).
The good news is the kernel will correctly adjust the user-specified value, as long as the user updates EPB via the sysfs file.

The x86_energy_perf_policy tool will bypass the kernel and will write the EPB MSR directly if the sysfs EPB file does not exist. We know this happens on pre-Skylake cpus which is OK. But we can't guarantee it won't happen on newer cpus where it won't always be OK.

Here's the updated algorithm:

if (lscpu Flags show 'hwp_epp') {
    /*
     * The cpu is Skylake or newer.
     */
    if (energy_performance_available_preferences exists) {
        /*
         * The intel_pstate driver is being used.
         */
        if (the user requested preference is an available preference) {
            write that value to the energy_performance_preference file
        }
    }
    if (an energy_perf_bias value was also requested) {
        /*
         * Certain cpus newer than Skylake have restrictions on what
         * EPB values can be specified. The kernel manages that but
         * only if the sysfs files are written to directly.
         * We don't want to use the x86_energy_perf_policy tool here
         * because under certain circumstances, it may write the
         * EPB MSR directly, bypassing any kernel checks.
         */
        if (energy_perf_bias file exists && the user requested an EPB to be set) {
            write that value directly to the sysfs energy_perf_bias file.
        }
    }
    return
} else {
    /*
     * Getting here means the cpu does not support 'hwp_epp',
     * e.g. it's older than Skylake.
     * It's important here to use the x86_energy_perf_policy tool, as
     * it will know how to write the EPB MSR directly in case the EPB
     * file does not exist.
     */
    if (an energy_perf_bias value was requested) {
        write it directly to energy_perf_bias using the
        x86_energy_perf_policy tool
    }
    return
}

Joe

Upstream PR: https://github.com/redhat-performance/tuned/pull/479

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2585
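
For readers who want to follow the updated algorithm from Joe's last comment step by step, here is a rough Python sketch of its decision logic. It is not the code from the upstream PR. The names `cpu_has_flag` and `plan_epp_epb` and the (action, target, value) tuples are invented for illustration. The sketch reads the flags from /proc/cpuinfo-style text instead of calling lscpu, takes the sysfs directories as parameters, and only returns a plan rather than writing anything or running the x86_energy_perf_policy tool.

```python
import os


def cpu_has_flag(flag, cpuinfo_text):
    """True if `flag` is listed on the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.partition(":")[2].split()
    return False


def plan_epp_epb(cpuinfo_text, epp_dir, epb_dir, want_epp=None, want_epb=None):
    """Return a list of (action, target, value) tuples; nothing is written.

    epp_dir and epb_dir are caller-supplied sysfs directories (assumptions
    of this sketch). The "run" action only names the tool, it does not
    spell out its command-line options.
    """
    actions = []
    if cpu_has_flag("hwp_epp", cpuinfo_text):
        # Skylake or newer: only ever touch the sysfs files.
        available = os.path.join(epp_dir, "energy_performance_available_preferences")
        if want_epp is not None and os.path.isfile(available):
            with open(available) as f:
                if want_epp in f.read().split():
                    target = os.path.join(epp_dir, "energy_performance_preference")
                    actions.append(("write", target, want_epp))
        epb_file = os.path.join(epb_dir, "energy_perf_bias")
        if want_epb is not None and os.path.isfile(epb_file):
            # Writing via sysfs lets the kernel adjust restricted EPB values.
            actions.append(("write", epb_file, str(want_epb)))
    elif want_epb is not None:
        # Pre-Skylake: the tool can fall back to writing the MSR directly.
        actions.append(("run", "x86_energy_perf_policy", str(want_epb)))
    return actions
```

Returning a plan instead of performing the writes keeps the sketch easy to exercise against a temporary directory; a real implementation would also need to handle write errors and iterate over all cpus.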