Bug 1461509

Summary: realtime-virtual-host: error doesn't prevent profile from getting applied
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: tuned Assignee: Luiz Capitulino <lcapitulino>
Status: CLOSED ERRATA QA Contact: Tereza Cerna <tcerna>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: jeder, jskarvad, lcapitulino, olysonek, psklenar, tcerna
Target Milestone: rc Keywords: Patch, Upstream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tuned-2.10.0-0.1.rc1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1626082 Environment:
Last Closed: 2018-10-30 10:48:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1240765    
Attachments:
Description  Flags
patch1       none
patch2       none
patch3       none
patch4       none

Description Luiz Capitulino 2017-06-14 15:56:43 UTC
Description of problem:

The realtime-virtual-host profile seems to be missing some error checks. This causes the profile to get applied even though the configuration has failed.

Version-Release number of selected component (if applicable): tuned-2.8.0-5.el7.noarch


How reproducible:


Steps to Reproduce:
1. Make sure the realtime-virtual-host profile is not applied and clear its cache files

# tuned-adm profile desktop (or any other profile)
# rm -f /usr/lib/tuned/realtime-virtual-host/{lapic_timer_adv_ns,lapic_timer_adv_ns.cpumodel}

2. Install the kernel-rt package without installing kernel-rt-kvm OR temporarily unload kvm modules and rename them

3. Activate the realtime-virtual-host profile and check that it reports success

# tuned-adm profile realtime-virtual-host
# echo $?
0

4. The profile hasn't been applied correctly, because a VM is run during the activation process and that fails without the kvm modules. This can be confirmed by checking that the lapic_timer files from step 1 weren't created

NOTE: Maybe things will get automatically fixed if the modules are loaded and tuned restarted or the machine reboots. But this is still something to be fixed.
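
The check in step 4 can be scripted. This is only a minimal sketch: the function name check_lapic_cache is made up for illustration, and the directory and file names are the ones from step 1.

```shell
# Sketch: report which of the lapic_timer cache files from step 1 are missing.
# $1 optionally overrides the profile directory (defaults to the one above).
check_lapic_cache() {
    local dir="${1:-/usr/lib/tuned/realtime-virtual-host}"
    local missing=0
    local f
    for f in lapic_timer_adv_ns lapic_timer_adv_ns.cpumodel; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Non-zero status if at least one cache file was not created.
    return "$missing"
}
```

Running check_lapic_cache right after step 3 should return non-zero on an affected system, even though tuned-adm itself returned 0.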

Comment 2 Luiz Capitulino 2017-06-26 17:47:22 UTC
Note to self: it's probably a good idea to check all profiles for this problem.

Comment 3 Luiz Capitulino 2017-10-13 17:06:56 UTC
I'm not sure I'll be able to fix this for 7.5, since this missed the tuned release boat. So, I think it's important we're aware of this issue's impact:

1. This BZ is really about the realtime-virtual-host profile missing almost all error checking. Actually, this is common among all tuned profiles, but this profile has code that can fail in a reproducible way (see item 4 below)

2. I also found that the realtime-virtual-host profile has broken bash code, which fails at every single run. As it fails silently, we never knew it. Luckily (or not), it's just a sanity check that's being skipped

3. The very worst case scenario for this bug is a fundamental step failing (such as running tuna to isolate a CPU) and us not seeing it, since it fails silently. But this has never happened in practice

4. What seems to be a reproducible way to trigger this bug (which is what I got originally) is:

A. the kernel-rt package is installed but kernel-rt-kvm is not
B. The realtime-virtual-host profile is activated
C. Finding the best latency for the advance timer feature will silently fail, since the kvm module is not loaded
D. The profile is applied
E. kernel-rt-kvm is installed
F. A realtime guest is started
G. The realtime guest will get latency spikes, since /sys/module/kvm/parameters/lapic_timer_advance_ns=0

However, rebooting or reloading tuned after step E should fix things up.
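
The end state of the sequence above can be detected from the sysfs parameter it mentions. A minimal sketch, assuming (as in the last step) that a value of 0 is the bad state; the function name check_lapic_timer_advance is hypothetical:

```shell
# Sketch: fail loudly if kvm is not loaded or the advance timer value is 0.
# $1 optionally overrides the sysfs path (useful for testing).
check_lapic_timer_advance() {
    local param="${1:-/sys/module/kvm/parameters/lapic_timer_advance_ns}"
    local value
    if [ ! -r "$param" ]; then
        echo "kvm module not loaded: $param not readable" >&2
        return 1
    fi
    value=$(cat "$param")
    if [ "$value" = "0" ]; then
        echo "lapic_timer_advance_ns is 0; realtime guests may see spikes" >&2
        return 1
    fi
    echo "lapic_timer_advance_ns=$value"
}
```

Calling this before starting a realtime guest would turn the silent failure into a visible one.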

Comment 4 Luiz Capitulino 2017-10-18 18:41:49 UTC
This issue doesn't reproduce very easily and it has a workaround. Let's move to 7.6, as this missed the tuned release.

Comment 5 Luiz Capitulino 2018-02-22 18:14:35 UTC
Created attachment 1399499
patch1

Comment 6 Luiz Capitulino 2018-02-22 18:14:59 UTC
Created attachment 1399500
patch2

Comment 7 Luiz Capitulino 2018-02-22 18:15:21 UTC
Created attachment 1399501
patch3

Comment 8 Luiz Capitulino 2018-02-22 18:15:48 UTC
Created attachment 1399502
patch4

Comment 9 Luiz Capitulino 2018-02-22 18:18:35 UTC
Posted series to maintainer and here. Note that this series is only half the battle: it detects errors and logs them, but we also need bug 1385838 so that tuned notifies systemd that an error happened.
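
For readers without the attachments at hand, the general "detect and log" pattern looks roughly like the sketch below. This is not the code from the patches: the bodies of die and run_step here are assumptions written for illustration only.

```shell
# Hypothetical helper: log the message to stderr and abort with a non-zero status.
die() {
    echo "ERROR: $*" >&2
    exit 1
}

# Run one setup step; abort the whole script if the step fails,
# instead of silently carrying on to the next one.
run_step() {
    "$@" || die "step failed: $*"
}
```

A profile script would then wrap each fundamental command, e.g. `run_step modprobe kvm`, so that a failure ends the script with exit status 1 and an error line in the log.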

Comment 10 Jaroslav Škarvada 2018-02-22 20:07:36 UTC
(In reply to Luiz Capitulino from comment #9)
> Posted series to maintainer and here. Note that this series is only half the
> battle: it detects the error and logs them, but we also need bug 1385838 so
> that tuned notifies systemd an error happened.

Thanks. Upstream commits:
https://github.com/redhat-performance/tuned/commit/c823e3c5f2a003717a4a0b73dde4c4003bbbe567
https://github.com/redhat-performance/tuned/commit/c989a8bd7cfa13d95e31875c564fd03630e54b6f
https://github.com/redhat-performance/tuned/commit/685e16640dc5c9c25037eb97cec41f9df308db46
https://github.com/redhat-performance/tuned/commit/c614ad03ff668c753ddf99772c3b47c80412533d

Comment 12 Luiz Capitulino 2018-06-13 17:27:57 UTC
So, an error message is now printed. However, tuned-adm returns zero and reports the realtime-virtual-host profile has been applied:

[root@virtlab500 realtime-virtual-host]# tuned-adm profile realtime-virtual-host
ERROR    tuned.utils.commands: Executing sysctl error: sysctl: cannot stat /proc/sys/kernel/numa_balancing: No such file or directory
ERROR    tuned.plugins.plugin_script: script '/usr/lib/tuned/realtime-virtual-host/script.sh' error output: 'Failed to set smp_affinity for IRQ 33: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 34: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 35: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 36: [Errno 5] Input/output error
defirqaffinity.py remove failed'
ERROR    tuned.plugins.plugin_script: script '/usr/lib/tuned/realtime-virtual-host/script.sh' returned error code: 1 <----------- New error message reporting the activation script has failed
[root@virtlab500 realtime-virtual-host]# echo $?
0
[root@virtlab500 realtime-virtual-host]# tuned-adm active
Current active profile: realtime-virtual-host
[root@virtlab500 realtime-virtual-host]# 

(Please ignore the IRQ error messages; they are covered by bug 1590937.)

While printing the error message is helpful, it is not enough to solve this issue. IMO, in case of an error tuned should:

1. Go back to the previous profile
2. Return an error code
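
Until tuned does this itself, the two points above can be approximated from the outside. The following is only a sketch: switch_profile_or_rollback and the TUNED_ADM override are hypothetical names, the caller must supply its own verification command, and the parsing assumes `tuned-adm active` prints "Current active profile: NAME" as in the transcript above.

```shell
# Sketch: $1 = profile to activate, remaining args = command that verifies it.
# TUNED_ADM may point at a different binary (or a stub for testing).
switch_profile_or_rollback() {
    local adm="${TUNED_ADM:-tuned-adm}"
    local new="$1"
    shift
    local prev
    # Remember the currently active profile so we can go back to it.
    prev=$($adm active | sed -n 's/^Current active profile: //p')
    $adm profile "$new"
    # tuned-adm may return 0 even on failure, so rely on the caller's check.
    if ! "$@"; then
        echo "activation of $new failed, restoring $prev" >&2
        [ -n "$prev" ] && $adm profile "$prev"
        return 1
    fi
}
```

For example: `switch_profile_or_rollback realtime-virtual-host test -e /usr/lib/tuned/realtime-virtual-host/lapic_timer_adv_ns`.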

Now, I understand this might be an impactful change for 7.6 at this point. So, I'd be fine to move this BZ to 7.7 or even RHEL8 as long as we agree this has to be done.

What do you think, Jaroslav?

Comment 14 Jaroslav Škarvada 2018-09-06 14:19:54 UTC
(In reply to Luiz Capitulino from comment #12)
> So, an error message is now printed. However, tuned-adm returns zero and
> reports the realtime-virtual-host profile has been applied:
> 
> [root@virtlab500 realtime-virtual-host]# tuned-adm profile
> realtime-virtual-host
> ERROR    tuned.utils.commands: Executing sysctl error: sysctl: cannot stat
> /proc/sys/kernel/numa_balancing: No such file or directory
> ERROR    tuned.plugins.plugin_script: script
> '/usr/lib/tuned/realtime-virtual-host/script.sh' error output: 'Failed to
> set smp_affinity for IRQ 33: [Errno 5] Input/output error
> Failed to set smp_affinity for IRQ 34: [Errno 5] Input/output error
> Failed to set smp_affinity for IRQ 35: [Errno 5] Input/output error
> Failed to set smp_affinity for IRQ 36: [Errno 5] Input/output error
> defirqaffinity.py remove failed'
> ERROR    tuned.plugins.plugin_script: script
> '/usr/lib/tuned/realtime-virtual-host/script.sh' returned error code: 1
> <----------- New error message reporting the activation script has failed
> [root@virtlab500 realtime-virtual-host]# echo $?
> 0
> [root@virtlab500 realtime-virtual-host]# tuned-adm active
> Current active profile: realtime-virtual-host
> [root@virtlab500 realtime-virtual-host]# 
> 
> (Please, ignore the IRQ error messages since this is bug 1590937).
> 
> While printing the error message is helpful, it is not enough to solve this
> issue. IMO, in case of an error tuned should:
> 
> 1. Go back to the previous profile
> 2. Return an error code
> 
> Now, I understand this might be an impactful change for 7.6 at this point.
> So, I'd be fine to move this BZ to 7.7 or even RHEL8 as long as we agree
> this has to be done.
> 
> What you think Jaroslav?

I agree. I will clone it and we will address it in the next release.

Comment 15 Jaroslav Škarvada 2018-09-06 14:22:25 UTC
Cloned as bug 1626082.

Comment 16 Tereza Cerna 2018-09-07 11:40:25 UTC
Package tuned-2.10.0-4.el7.noarch was tested.

I did a sanity check of the attached patches and the new package. All patches were applied.
Several situations were simulated and the corresponding behavior was observed (calls to the function die, error messages, calls to the function run_tsc_deadline_latency...).

Further testing will continue in BZ#1626082, which will implement rollback when a fatal error occurs. It is useful to ship the current fix (rather than move the whole bug to 7.7), because it repairs the naming of the function run_tsc_deadline_latency and improves the error message shown when applying the realtime-virtual-host profile fails.

Comment 18 errata-xmlrpc 2018-10-30 10:48:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3172