Bug 1461509
Summary: realtime-virtual-host: error doesn't prevent profile from getting applied
Product: Red Hat Enterprise Linux 7
Reporter: Luiz Capitulino <lcapitulino>
Component: tuned
Assignee: Luiz Capitulino <lcapitulino>
Status: CLOSED ERRATA
QA Contact: Tereza Cerna <tcerna>
Severity: unspecified
Priority: unspecified
Version: 7.4
CC: jeder, jskarvad, lcapitulino, olysonek, psklenar, tcerna
Target Milestone: rc
Keywords: Patch, Upstream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tuned-2.10.0-0.1.rc1.el7
Clones: 1626082
Last Closed: 2018-10-30 10:48:57 UTC
Type: Bug
Bug Blocks: 1240765
Description
Luiz Capitulino
2017-06-14 15:56:43 UTC
Note to self: it's probably a good idea to check all profiles for this problem.

I'm not sure I'll be able to fix this for 7.5, since this missed the tuned release boat. So, I think it's important we're aware of this issue's impact:

1. This BZ is really about the realtime-virtual-host profile missing almost all error checking. Actually, this is common among all tuned profiles, but this profile has code that can fail in a reproducible way (see comment 4).
2. I also found that the realtime-virtual-host profile has broken bash code, which fails on every single run. As it fails silently, we never knew it. Luckily (or not), it's just a sanity check that's being skipped.
3. The worst-case scenario for this bug is a fundamental step failing (such as running tuna to isolate a CPU) without us noticing, since it fails silently. But this has never happened in practice.
4. What seems to be a reproducible way to trigger this bug (which is what I originally hit) is:
   A. The kernel-rt package is installed but kernel-rt-kvm is not
   B. The realtime-virtual-host profile is activated
   C. Finding the best latency for the advance timer feature silently fails, since the kvm module is not loaded
   D. The profile is applied
   E. kernel-rt-kvm is installed
   F. A realtime guest is started
   G. The realtime guest gets latency spikes, since /sys/module/kvm/parameters/lapic_timer_advance_ns=0

However, rebooting or reloading tuned after step E should fix things up.

This issue doesn't reproduce very easily and it has a workaround. Let's move to 7.6, as this missed the tuned release.

Created attachment 1399499
patch1
Created attachment 1399500
patch2
Created attachment 1399501
patch3
Created attachment 1399502
patch4
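Step C of the description is the crux of the reproducer: the timer-advance tuning silently does nothing when /sys/module/kvm/parameters/lapic_timer_advance_ns does not exist because kvm is not loaded. The following is a minimal, hypothetical Python illustration of failing loudly in that case instead. It is not the content of the attached patches (the real logic lives in the profile's bash script), the names `ProfileError` and `set_lapic_timer_advance` are made up for illustration, and it skips the latency measurement the profile performs to choose the value.

```python
import os


class ProfileError(Exception):
    """Hypothetical: raised when a required profile step cannot be carried out."""


# Path quoted in this bug; it only exists while the kvm module is loaded.
LAPIC_PARAM = "/sys/module/kvm/parameters/lapic_timer_advance_ns"


def set_lapic_timer_advance(value_ns, param_path=LAPIC_PARAM):
    """Write the timer advance value, raising instead of failing silently.

    Hypothetical helper: the real profile picks value_ns by measuring
    latency first, which is omitted here.
    """
    if not os.path.exists(param_path):
        # Step C of the reproducer: kernel-rt-kvm is not installed, so kvm
        # is not loaded and the parameter file does not exist.
        raise ProfileError(
            "%s does not exist; is the kvm module loaded?" % param_path)
    with open(param_path, "w") as f:
        f.write("%d\n" % value_ns)
```

Raising an exception makes the missing module a hard error that the caller must handle, rather than a step that quietly leaves lapic_timer_advance_ns at 0 as in step G.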
Posted series to maintainer and here. Note that this series is only half the battle: it detects errors and logs them, but we also need bug 1385838 so that tuned notifies systemd that an error happened.

(In reply to Luiz Capitulino from comment #9)

Thanks. Upstream commits:
https://github.com/redhat-performance/tuned/commit/c823e3c5f2a003717a4a0b73dde4c4003bbbe567
https://github.com/redhat-performance/tuned/commit/c989a8bd7cfa13d95e31875c564fd03630e54b6f
https://github.com/redhat-performance/tuned/commit/685e16640dc5c9c25037eb97cec41f9df308db46
https://github.com/redhat-performance/tuned/commit/c614ad03ff668c753ddf99772c3b47c80412533d

So, an error message is now printed. However, tuned-adm returns zero and reports that the realtime-virtual-host profile has been applied:

[root@virtlab500 realtime-virtual-host]# tuned-adm profile realtime-virtual-host
ERROR tuned.utils.commands: Executing sysctl error: sysctl: cannot stat /proc/sys/kernel/numa_balancing: No such file or directory
ERROR tuned.plugins.plugin_script: script '/usr/lib/tuned/realtime-virtual-host/script.sh' error output: 'Failed to set smp_affinity for IRQ 33: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 34: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 35: [Errno 5] Input/output error
Failed to set smp_affinity for IRQ 36: [Errno 5] Input/output error
defirqaffinity.py remove failed'
ERROR tuned.plugins.plugin_script: script '/usr/lib/tuned/realtime-virtual-host/script.sh' returned error code: 1
<----------- New error message reporting the activation script has failed
[root@virtlab500 realtime-virtual-host]# echo $?
0
[root@virtlab500 realtime-virtual-host]# tuned-adm active
Current active profile: realtime-virtual-host
[root@virtlab500 realtime-virtual-host]#

(Please ignore the IRQ error messages; they are bug 1590937.)

While printing the error message is helpful, it is not enough to solve this issue. IMO, in case of an error tuned should:

1. Go back to the previous profile
2. Return an error code

Now, I understand this might be an impactful change for 7.6 at this point. So, I'd be fine with moving this BZ to 7.7 or even RHEL 8, as long as we agree this has to be done.

What do you think, Jaroslav?

(In reply to Luiz Capitulino from comment #12)

I agree, I will clone it and we will address it in the next release.

Cloned as bug 1626082.

Package tuned-2.10.0-4.el7.noarch was tested. I did a sanity check of the attached patches and the new package. All patches were applied. Several situations were simulated and the corresponding behavior was exercised (calling of function die, error messages, calling of function run_tsc_deadline_latency...). Further testing will continue in BZ#1626082, which will implement rollback when a fatal error occurs. It is useful to deploy the current fix (rather than move the whole bug to 7.7) because of the repaired naming of function run_tsc_deadline_latency and the improved error message when application of the realtime-virtual-host profile fails.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3172
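Comment #12 asked that, on error, tuned go back to the previous profile and return an error code; that work was deferred to bug 1626082. The following is a minimal Python sketch of that control flow under stated assumptions: `ProfileManager`, `switch` and the step callables are hypothetical stand-ins, not tuned's actual classes or API. It only re-applies the previous profile; a real implementation would also have to revert whatever the failed profile had already changed, which is elided here.

```python
import logging

log = logging.getLogger("profile-sketch")


class ProfileError(Exception):
    """Hypothetical: raised by a profile step that cannot be completed."""


class ProfileManager:
    """Hypothetical stand-in for the daemon side of 'tuned-adm profile'."""

    def __init__(self, profiles, active=None):
        # profiles: name -> list of callables; each callable is one step
        # (e.g. a sysctl write or script.sh) and raises ProfileError on failure.
        self.profiles = profiles
        self.active = active

    def _run(self, name):
        for step in self.profiles[name]:
            step()

    def switch(self, name):
        """Apply a profile; return 0 on success, or roll back and return 1."""
        previous = self.active
        try:
            self._run(name)
        except ProfileError as e:
            log.error("applying profile '%s' failed: %s", name, e)
            if previous is not None:
                try:
                    self._run(previous)
                except ProfileError as e2:
                    # No known-good profile left to report as active.
                    log.error("rollback to '%s' failed: %s", previous, e2)
                    self.active = None
                    return 1
            self.active = previous
            return 1
        self.active = name
        return 0
```

Returning the code, rather than only logging, is what would let the `echo $?` in the comment #12 transcript report a failure and keep `tuned-adm active` from naming a profile that did not apply; notifying systemd is tracked separately in bug 1385838.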