Bug 1956387

Summary: tuned: add tuned-adm verify --verbose option to help validate active profile settings
Product: Red Hat Enterprise Linux 8 Reporter: Prasad Pandit <ppandit>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED MIGRATED QA Contact: Robin Hack <rhack>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.5 CC: jeder, jmencak, jskarvad, mtosatti, nilal, pezhang
Target Milestone: beta Keywords: MigratedToJIRA, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-21 21:07:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1932086    

Description Prasad Pandit 2021-05-03 15:23:31 UTC
Description of problem:

* Verification and validation of an active tuned(8) profile is a two-fold issue:

1) $ tuned-adm(8) verify  - is being worked on at => BZ#1947858

This bug is for the second part below

2) $ tuned-adm(8) verify
    - Does not offer a convenient interface for users to see and confirm
      that tuned-adm(8) verify is working correctly.
    - Does not present verification results to the user properly.
      Verification results get logged to the /var/log/tuned/tuned.log file.
===
# tuned-adm verify
Verification failed, current system settings differ from the preset profile.
You can mostly fix this by restarting the Tuned daemon, e.g.:
  systemctl restart tuned
or
  service tuned restart
Sometimes (if some plugins like bootloader are used) a reboot may be required.
See tuned log file ('/var/log/tuned/tuned.log') for details.
===

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
$ rpm -q tuned
tuned-2.15.0-2.el8.noarch
$ 
$ tuned --versio
tuned 2.15.0


How reproducible: 100%


Steps to Reproduce:
1. install tuned
2. activate realtime profile
3. run tuned-adm verify
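
The last two steps above can be scripted. This is a minimal Python sketch, assuming tuned is already installed and the 'realtime' profile is available on the host. The 'reproduce' helper and its 'run' parameter are hypothetical names used only for illustration. The meaning of the exit status is not stated in this report, so the sketch just returns it together with the printed text.

```python
import subprocess


def reproduce(profile="realtime", run=subprocess.run):
    """Activate a profile, run the verification, return (exit status, output).

    'run' is injectable so the sketch can be exercised without tuned installed.
    """
    # Step 2: activate the profile; check=True raises if activation fails.
    run(["tuned-adm", "profile", profile], check=True)
    # Step 3: run the verification and capture what it prints.
    result = run(
        ["tuned-adm", "verify"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

Calling 'reproduce()' as root on an affected system should return the "Verification failed ..." text shown earlier in this description.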

Expected results:
  - Make it easy for users to validate active profile settings with certainty.

Comment 1 Prasad Pandit 2021-05-03 15:25:26 UTC
This bug is to address part 2) -> https://bugzilla.redhat.com/show_bug.cgi?id=1947858#c4

Comment 2 Prasad Pandit 2021-05-04 13:26:46 UTC
===
[root@virtlab500 tuned]# tuned-adm active
Current active profile: balanced
[root@virtlab500 tuned]# 

[root@virtlab500 tuned]# tuned-adm verify 
Verfication succeeded, current system settings match the preset profile.
See tuned log file ('/var/log/tuned/tuned.log') for details.
[root@virtlab500 tuned]# 

[root@virtlab500 tuned]# tuned-adm verify --verbose |less
[cpu]
 governor=conservative|powersave
 energy_perf_bias=normal
Verification: Pass

[sysctl]
 kernel.sched_min_granularity_ns=3000000
 kernel.sched_wakeup_granularity_ns=4000000
 vm.dirty_ratio=10
 vm.dirty_background_ratio=3
 vm.swappiness=10
 kernel.sched_migration_cost_ns=5000000
 net.core.busy_read=None, expected: 50
 net.core.busy_poll=None, expected: 50
 net.ipv4.tcp_fastopen=3
 kernel.numa_balancing=None, expected: 0
 kernel.hung_task_timeout_secs=600
 kernel.nmi_watchdog=0
 kernel.sched_rt_runtime_us=-1
 vm.stat_interval=10
 kernel.timer_migration=0
Verification: Fail

...
[sysfs]
 /sys/bus/workqueue/devices/writeback/cpumask=f55f
 /sys/devices/virtual/workqueue/cpumask=f55f
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 ...
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 /sys/kernel/mm/ksm/run=2
 /sys/kernel/ktimer_lockless_check=0, expected: 1
Verification: Fail

[modules]
 cpufreq_conservative: verify: failed: 'module 'cpufreq_conservative' is not loaded'
Verification: Fail

[audio]
 []
Verification: Pass

[video]
 radeon_powersave=dpm-balanced, auto
Verification: Pass

[disk]
 []
Verification: Pass

[scsi_host]
 alpm=medium_power
  host0=None
Verification: Fail

See tuned log file ('/var/log/tuned/tuned.log') for details.
===

@Jaroslav:
   The current patch enables 'tuned-adm verify' to display output as above.
   Before I raise a PR, I just wanted to check if you have any inputs/thoughts/suggestions?

Thank you.
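
The per-plugin layout above (a '[plugin]' header in the first column, indented setting lines, then a 'Verification: Pass' or 'Verification: Fail' line) is regular enough to post-process. This is a minimal Python sketch, assuming exactly that layout. 'parse_verbose_report' and the 'SAMPLE' text are hypothetical illustrations and are not part of tuned or of the patch.

```python
import re

# Plugin headers start in the first column; setting lines are indented.
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]$")
RESULT_RE = re.compile(r"^Verification: (?P<status>Pass|Fail)$")


def parse_verbose_report(text):
    """Return {plugin: {"status": "Pass"|"Fail"|None, "lines": [...]}}."""
    report = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        section = SECTION_RE.match(line)
        if section:
            current = section.group("name")
            report[current] = {"status": None, "lines": []}
            continue
        if current is None:
            continue  # text outside any plugin section
        result = RESULT_RE.match(line)
        if result:
            report[current]["status"] = result.group("status")
            current = None
        elif line.strip():
            report[current]["lines"].append(line.strip())
    return report


SAMPLE = """\
[cpu]
 governor=conservative|powersave
 energy_perf_bias=normal
Verification: Pass

[sysctl]
 vm.swappiness=10
 net.core.busy_read=None, expected: 50
Verification: Fail

[audio]
 []
Verification: Pass
"""

report = parse_verbose_report(SAMPLE)
failed = [name for name, entry in report.items() if entry["status"] == "Fail"]
print(failed)
```

Keeping the raw setting lines per plugin lets a caller show only the failing sections.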

Comment 3 Nitesh Narayan Lal 2021-05-04 16:10:32 UTC
Nice work, can you also share the output of tuned-adm verify without verbose?
Will it only report whether the verification has passed or failed?

Have you also looked into the scheduler verification?

Comment 4 Marcelo Tosatti 2021-05-04 16:58:02 UTC
(In reply to Prasad J Pandit from comment #2)
> ===
> [root@virtlab500 tuned]# tuned-adm active
> Current active profile: balanced
> [root@virtlab500 tuned]# 
> 
> [root@virtlab500 tuned]# tuned-adm verify 
> Verfication succeeded, current system settings match the preset profile.
> See tuned log file ('/var/log/tuned/tuned.log') for details.
> [root@virtlab500 tuned]# 
> 
> [root@virtlab500 tuned]# tuned-adm verify --verbose |less
> [cpu]
>  governor=conservative|powersave
>  energy_perf_bias=normal
> Verification: Pass
...

Looks good to me! Thanks.

Comment 5 Prasad Pandit 2021-05-05 12:20:25 UTC
(In reply to Nitesh Narayan Lal from comment #3)
> Nice work, can you also share the output of tuned-adm verify without verbose?
> Will it only report whether the verification has passed or failed?

Yes, the comment above shows the output of the default 'tuned-adm verify' run.
There's no change to default behaviour.

> Have you also looked into the scheduler verification?

Yes, currently it looks as below

[root@virtlab500 tuned]# tuned-adm verify --verbose |less
...
[scheduler]
 ps_whitelist=None
 ps_blacklist=None, expected: ksoftirqd.*;rcuc.*;rcub.*;ktimersoftd.*;.*pmd.*;.*PMD.*;^DPDK;.*qemu-kvm.*
 default_irq_smp_affinity=None, expected: calc
 perf_process_fork=None, expected: false
 isolated_cores=None, expected: 5,7,9,11
Verification: Fail

Thank you.

Comment 6 Nitesh Narayan Lal 2021-05-05 13:55:24 UTC
(In reply to Prasad J Pandit from comment #5)
> (In reply to Nitesh Narayan Lal from comment #3)
> > Nice work, can you also share the output of tuned-adm verify without verbose?
> > Will it only report whether the verification has passed or failed?
> 
> Yes, above comment shows an output of the default 'tuned-adm verify' run.
> There's no change to default behaviour.
> 

Maybe we can improve this as well, for example, right now it shows:

# tuned-adm active
Current active profile: realtime-virtual-host

# tuned-adm verify
Verification failed, current system settings differ from the preset profile.
You can mostly fix this by restarting the Tuned daemon, e.g.:
  systemctl restart tuned
or
  service tuned restart
Sometimes (if some plugins like bootloader are used) a reboot may be required.
See tuned log file ('/var/log/tuned/tuned.log') for details.

The suggestion to restart tuned in itself is pretty vague IMHO.
Instead, it will be useful if we can print something like the number of errors
that we triggered during the verification here.

What do you think?

Thanks
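
One way to get such a count is to tally the lines that report a difference. This is a rough Python sketch, assuming the ', expected: ' and ': verify: failed: ' markers from the verbose output in comment 2. Both function names and the summary wording are hypothetical; the message tuned-adm actually prints may differ.

```python
def count_mismatches(verbose_output):
    """Count lines that report a differing or unverifiable setting."""
    # Markers copied from the verbose output format shown in comment 2.
    markers = (", expected: ", ": verify: failed: ")
    return sum(
        1
        for line in verbose_output.splitlines()
        if any(marker in line for marker in markers)
    )


def summary_line(profile, mismatches):
    # Hypothetical wording, not the message used by tuned-adm.
    if mismatches == 0:
        return f"Profile '{profile}': all checked settings match."
    return f"Profile '{profile}': {mismatches} setting(s) differ from the profile."


SAMPLE = """\
 net.core.busy_read=None, expected: 50
 kernel.nmi_watchdog=0
 cpufreq_conservative: verify: failed: 'module 'cpufreq_conservative' is not loaded'
"""

print(summary_line("realtime-virtual-host", count_mismatches(SAMPLE)))
```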

Comment 7 Prasad Pandit 2021-05-06 16:41:09 UTC
(In reply to Nitesh Narayan Lal from comment #6)
> Maybe we can improve this as well, for example, right now it shows:
... 
> The suggestion to restart tuned in itself is pretty vague IMHO.
> Instead, it will be useful if we can print something like the number of
> errors that we triggered during the verification here.

* True, restarting tuned(8) does not seem to be an effective solution.
  Sometimes current system settings may differ because the system
  does not support a certain option, e.g.

  => tuned.plugins.plugin_vm: Option 'transparent_hugepages' is not supported on current hardware.

OR if an error/exception occurs while setting some parameter, e.g.

  ...
   File "/usr/lib/python3.6/site-packages/tuned/plugins/plugin_net.py", line 400, in _set_device_parameters
     if context == "channels" and int(dev_params[next(iter(d))]) == 0:
  => ValueError: invalid literal for int() with base 10: 'n/a'

OR if the kernel does not define a certain parameter, e.g.

  => tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'kernel.numa_balancing', the parameter does not exist


In such situations, restarting tuned(8) isn't going to be helpful.
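
The three cases above can be told apart mechanically from the log text. This is a small Python sketch, assuming log lines that contain the messages quoted above. The cause labels and the 'classify' helper are hypothetical, and any other tuned message falls through to 'other'.

```python
import re

# Patterns are derived only from the three messages quoted above.
CAUSES = [
    ("unsupported-option",
     re.compile(r"Option '(?P<name>[^']+)' is not supported")),
    ("missing-sysctl",
     re.compile(r"Failed to read sysctl parameter '(?P<name>[^']+)'")),
    ("exception",
     re.compile(r"^\s*(?:=>\s*)?(?P<name>\w+(?:Error|Exception)):")),
]


def classify(line):
    """Return (cause, name) for a log line, or ("other", None)."""
    for cause, pattern in CAUSES:
        match = pattern.search(line)
        if match:
            return cause, match.group("name")
    return "other", None


LINES = [
    "tuned.plugins.plugin_vm: Option 'transparent_hugepages' is not supported on current hardware.",
    "ValueError: invalid literal for int() with base 10: 'n/a'",
    "tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'kernel.numa_balancing', the parameter does not exist",
]

for entry in LINES:
    print(classify(entry))
```

None of these causes is something a daemon restart would change, which is why a per-cause report is more useful than the generic restart hint.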


* I guess an absent kernel parameter or a system support error could be the result of different kernel versions
  and/or system capabilities. These need to be addressed by tweaking the tuned(8) profiles for each supported
  RHEL version.

  I think that can be addressed in a separate bug instead of being combined with this one,
  because tweaking each profile parameter for different RHEL versions may take more time/testing.


...wdyt?

Thank you.

Comment 8 Nitesh Narayan Lal 2021-05-06 17:09:59 UTC
(In reply to Prasad J Pandit from comment #7)
> (In reply to Nitesh Narayan Lal from comment #6)
> > Maybe we can improve this as well, for example, right now it shows:
> ... 
> > The suggestion to restart tuned in itself is pretty vague IMHO.
> > Instead, it will be useful if we can print something like the number of
> > errors that we triggered during the verification here.
> 
> * True, restarting tuned(8) does not seem an effective solution.
>   Sometimes current system settings may differ because the system
>   does not support certain option ex.
> 
>   => tuned.plugins.plugin_vm: Option 'transparent_hugepages' is not
> supported on current hardware.

Exactly

> 
> OR if an error/exception occurs while setting some parameter ex.
> 
>   ...
>    File "/usr/lib/python3.6/site-packages/tuned/plugins/plugin_net.py", line
> 400, in _set_device_parameters
>      if context == "channels" and int(dev_params[next(iter(d))]) == 0:
>   => ValueError: invalid literal for int() with base 10: 'n/a'
> 


This is a bug reported in BZ 1943291 :)
The fix is upstream but I don't think it has been backported yet.


> OR if the kernel does not define certain parameter
> 
>   => tuned.plugins.plugin_sysctl: Failed to read sysctl parameter
> 'kernel.numa_balancing', the parameter does not exist
> 
> 
> In such situations, restarting tuned(8) isn't going to be helpful.
> 

Right.

> 
> * I guess absent kernel parameter or system support error could be result of
> different kernel versions
>   and/or system capabilities. They need to be addressed by tweaking tuned(8)
> profiles for each supported
>   RHEL versions.
> 
>   I think it can be addressed as another new bug fix, instead of combining
> with this one.
>   Because tweaking each profile parameter for different RHEL versions may
> take more time/testing.
> 
> 
> ...wdyt?
> 
> Thank you.

Yeah, I agree that to fix the individual plugins we should open a new BZ.
However, what we can do here (in your upstream PR) is just improve this
messaging that we get when verification fails without the verbose option.

Thanks

Comment 9 Prasad Pandit 2021-05-10 12:45:42 UTC
New output with and without --verbose option

===
[root@virtlab500 ~]# 
[root@virtlab500 ~]# tuned-adm verify
Active profile => realtime-virtual-host
Verification failed, current system settings differ from the preset profile.
 * Use 'tuned-adm verify --verbose' option to know differing system settings.
 * Sometimes (if some plugins like bootloader are used) a reboot may be required.
 * See tuned log file ('/var/log/tuned/tuned.log') for details.
===

[root@virtlab500 ~]# 
[root@virtlab500 ~]# tuned-adm verify -v |less
Active profile => realtime-virtual-host
[cpu]
 governor=performance
 energy_perf_bias=performance
Verification: Pass

[sysctl]
 kernel.sched_min_granularity_ns=3000000
 kernel.sched_wakeup_granularity_ns=4000000
 vm.dirty_ratio=10
 vm.dirty_background_ratio=3
 vm.swappiness=10
 kernel.sched_migration_cost_ns=5000000
 net.core.busy_read=None, [Expected: 50]
 net.core.busy_poll=None, [Expected: 50]
 net.ipv4.tcp_fastopen=3
 kernel.numa_balancing=None, [Expected: 0]
 kernel.hung_task_timeout_secs=600
 kernel.nmi_watchdog=0
 kernel.sched_rt_runtime_us=-1
 vm.stat_interval=10
 kernel.timer_migration=0
Verification: Fail

[sysfs]
 /sys/bus/workqueue/devices/writeback/cpumask=f55f
 /sys/devices/virtual/workqueue/cpumask=f55f
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
 /sys/devices/system/machinecheck/machinecheck*/ignore_ce=1
...
[irqbalance]
 banned_cpus=None, [Expected: 5,7,9,11]
Verification: Fail

[script]
 scripts: verify: failed: '['/usr/lib/tuned/realtime/script.sh', '/usr/lib/tuned/realtime-virtual-host/script.sh']'
Verification: Fail

[scheduler]
 ps_whitelist=None
 ps_blacklist=None, [Expected: ksoftirqd.*;rcuc.*;rcub.*;ktimersoftd.*;.*pmd.*;.*PMD.*;^DPDK;.*qemu-kvm.*]
 default_irq_smp_affinity=None, [Expected: calc]
 perf_process_fork=None, [Expected: false]
 isolated_cores=None, [Expected: 5,7,9,11]
Verification: Fail

Verification failed, current system settings differ from the preset profile.
 * Sometimes (if some plugins like bootloader are used) a reboot may be required.
 * See tuned log file ('/var/log/tuned/tuned.log') for details.
===
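
The revised layout puts the expected value in a bracketed '[Expected: ...]' suffix, which makes differing settings easy to extract. This is a minimal Python sketch, assuming the 'Active profile => ...' header and the 'name=actual, [Expected: value]' lines exactly as printed above. 'find_mismatches' is a hypothetical helper, and the format itself comes from the proposed patch, so released versions may print something different.

```python
import re

PROFILE_RE = re.compile(r"^Active profile => (?P<profile>\S+)$")
# Setting lines are indented; the expected value may itself contain
# commas or regex characters, so it is taken up to the final ']'.
MISMATCH_RE = re.compile(
    r"^\s+(?P<name>[^=\s]+)=(?P<actual>.*?), \[Expected: (?P<expected>.*)\]\s*$"
)


def find_mismatches(text):
    """Return (profile, [(name, actual, expected), ...])."""
    profile = None
    mismatches = []
    for line in text.splitlines():
        header = PROFILE_RE.match(line)
        if header:
            profile = header.group("profile")
            continue
        item = MISMATCH_RE.match(line)
        if item:
            mismatches.append(
                (item.group("name"), item.group("actual"), item.group("expected"))
            )
    return profile, mismatches


SAMPLE = """\
Active profile => realtime-virtual-host
[sysctl]
 vm.swappiness=10
 net.core.busy_read=None, [Expected: 50]
Verification: Fail

[scheduler]
 isolated_cores=None, [Expected: 5,7,9,11]
Verification: Fail
"""

profile, mismatches = find_mismatches(SAMPLE)
for name, actual, expected in mismatches:
    print(f"{profile}: {name} is {actual}, expected {expected}")
```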

Comment 10 Nitesh Narayan Lal 2021-05-10 13:34:05 UTC
Looks good, thanks.

Comment 11 Prasad Pandit 2021-05-10 19:12:58 UTC
PR#342 -> https://github.com/redhat-performance/tuned/pull/342

Comment 12 Prasad Pandit 2021-05-13 08:26:37 UTC
Revised PR#346: RHBZ#1956387 tuned-adm: add --verbose option to verify command
  -> https://github.com/redhat-performance/tuned/pull/346

Comment 13 Prasad Pandit 2021-05-17 11:35:07 UTC
Brew scratch build with above patch
  -> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=36728362

It may help with the PR#346 review.

Comment 14 Prasad Pandit 2021-05-24 12:01:16 UTC
Revised new scratch build
  ->  https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=36942161

Revised new PR #350
  -> https://github.com/redhat-performance/tuned/pull/350

Comment 15 Prasad Pandit 2021-05-26 06:49:32 UTC
Latest scratch build
  -> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=36984097

Comment 16 Prasad Pandit 2021-05-26 13:14:32 UTC
Revised scratch build with plugin_modules.py fix update

  -> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=36991631

This retains the return (ret) value behaviour as before.

Comment 17 Prasad Pandit 2021-06-01 13:18:25 UTC
Revised PR#351
  -> https://github.com/redhat-performance/tuned/pull/351

Comment 19 Prasad Pandit 2021-08-18 04:49:36 UTC
Rebased PR#351 to v2.16.0
  -> https://github.com/redhat-performance/tuned/pull/351

Scratch build
  -> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38991907

Comment 22 Jaroslav Škarvada 2022-11-01 10:28:37 UTC
It's being worked on in the PR.

Comment 24 RHEL Program Management 2023-09-21 20:51:29 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 25 RHEL Program Management 2023-09-21 21:07:33 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.