Bug 1475571
| Summary: | tuned fails to update boot option parameter via cloud-init for KVM instance | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jianzzha | ||||
| Component: | tuned | Assignee: | Ondřej Lysoněk <olysonek> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Tereza Cerna <tcerna> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 7.3 | CC: | atelang, fkrska, jeder, jianzzha, jskarvad, olysonek, supadhya, tcerna, thozza, toneata | ||||
| Target Milestone: | rc | Keywords: | Patch, Upstream, ZStream | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | tuned-2.9.0-0.1.rc1.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1507564 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-04-10 16:04:16 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1467576, 1485946, 1488369, 1507564 | ||||||
| Attachments: |
|
||||||
|
Description
jianzzha
2017-07-26 22:39:20 UTC
I can't reproduce it. I ran the following on a fresh RHEL-7.4 machine: # yum install tuned-profiles-cpu-partitioning # echo 'isolated_cores=0' > /etc/tuned/cpu-partitioning-variables.conf # tuned-adm profile cpu-partitioning but after reboot the kernel command line changes are applied. I have # cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.el7.x86_64 root=UUID=a0d61a4b-0438-49b6-91be-859938ccf14d ro console=tty0 console=ttyS0,115200 crashkernel=auto net.ifnames=0 rhgb quiet LANG=en_US.UTF-8 nohz=on nohz_full=0 rcu_nocbs=0 tuned.non_isolcpus=0000000e intel_pstate=disable nosoftlockup You mentioned cloud-init. What is it supposed to do? When and how do you run it? It would help if you'd provide the logs of Tuned in debug mode. Make sure Tuned is stopped # systemctl stop tuned and try running Tuned as # tuned -l -P -D -d Then run your reproducer and provide the log file /var/log/tuned/tuned.log. Thanks. Also, what is the output of the following? # type tuned-adm Just to make sure you don't have tuned-adm aliased to 'tuned-adm --async' or something. One more thing. Was Tuned already running before you ran 'tuned-adm profile cpu-partitioning'? If not, then it's probably related to https://github.com/redhat-performance/tuned/issues/23 and https://bugzilla.redhat.com/show_bug.cgi?id=1443142 (In reply to Ondřej Lysoněk from comment #2) > I can't reproduce it. I ran the following on a fresh RHEL-7.4 machine: > > # yum install tuned-profiles-cpu-partitioning > # echo 'isolated_cores=0' > /etc/tuned/cpu-partitioning-variables.conf > # tuned-adm profile cpu-partitioning > > but after reboot the kernel command line changes are applied. I have > # cat /proc/cmdline > BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.el7.x86_64 > root=UUID=a0d61a4b-0438-49b6-91be-859938ccf14d ro console=tty0 > console=ttyS0,115200 crashkernel=auto net.ifnames=0 rhgb quiet > LANG=en_US.UTF-8 nohz=on nohz_full=0 rcu_nocbs=0 tuned.non_isolcpus=0000000e > intel_pstate=disable nosoftlockup > > You mentioned cloud-init. What is it supposed to do? When and how do you run > it? > > It would help if you'd provide the logs of Tuned in debug mode. Make sure > Tuned is stopped > # systemctl stop tuned > > and try running Tuned as > # tuned -l -P -D -d > > Then run your reproducer and provide the log file /var/log/tuned/tuned.log. > > Thanks. Cloud-init is a tool that is used to customize an cloud instance when it is first booted. It is possible that certain required service by tuned is not started yet when cloud-init is run. I will have collect the required log and check if that's the case Here is related console log during cloud-init phase, [ 58.873065] cloud-init[5444]: + systemctl stop tuned [ 59.899984] cloud-init[5444]: + tuned -l -P -D -d [ 59.990447] cloud-init[5444]: + yum install -y tuned-profiles-cpu-partitioning .... [ 62.414081] cloud-init[5444]: + tuned-adm profile cpu-partitioning [ 87.505351] cloud-init[5444]: DBus call to Tuned daemon failed [ 178.839605] NMI watchdog: disabled (cpu0): hardware events not enabled [ 178.840422] NMI watchdog: Shutting down hard lockup detector on all cpus [ 178.497120] cloud-init[5444]: Trying to (re)start tuned... [ 178.497500] cloud-init[5444]: Tuned (re)start failed, you need to (re)start tuned by hand for changes to apply. so looks like tuned isn't running even though the "tuned -l -P -D -d" was applied earlier Created attachment 1314273 [details]
tuned.log
Btw, what versions of packages 'tuned' and 'tuned-profiles-cpu-partitioning' do you have? This is probably not related to this bug, but it seems you have some older Tuned version, which doesn't support the 'priority' option: 2017-08-16 13:29:09,634 WARNING tuned.plugins.base: Unknown option 'priority' for plugin 'ScriptPlugin'. There might be some packaging error which caused that tuned-profiles-cpu-partitioning didn't pull in the correct version of tuned as dependency. Actually it was, but the tuned-adm's request to change the profile timed out. And after that, tuned-adm assumed that Tuned was not running, so it attempted to do 'systemctl restart Tuned. So after that, two instances of Tuned were in fact running (because systemd didn't recognize the manually started Tuned process). From the logs it seems the following happened: 1. The previously running Tuned process stopped after 'systemctl stop tuned' 2. A new Tuned process was created after 'tuned -l -P -D -d' was issued and the daemon started up and applied the virtual-guest profile 3. 'tuned-adm profile cpu-partitioning' was issued. In resonse, the Tuned daemon issued a request to Polkit in order to check authorization of the request. 4. The tuned-adm's request timed out, because the Tuned daemon was taking a long time to respond (the 25 second default dbus time out). And the Tuned daemon was taking a long time to respond, because it wasn't receiving any reply from Polkit, and its request timed out as well (also after the 25 second default dbus time out). So both tuned-adm's and Tuned daemon's requests timed out at around the same time. 5. Then the two following things happend at around the same time: I. tuned-adm tried the fallback method of activating the requested profile, which is 'systemctl restart tuned'. So a new Tuned process started up. II. After Tuned daemon's request to Polkit timed out, it tried a fallback authorization method, which succeeded. And the daemon processed the request to switch the profile to cpu-partitioning. After that, the two Tuned damon instances probably collided somehow, which resulted in the traceback in the log and the message from tuned-adm 'Tuned (re)start failed, you need to (re)start tuned by hand for changes to apply'. Maybe we should learn from this and change the default timeout either in tuned-adm or the Tuned daemon, so we don't get these synchronous timeouts. But all of this is a result of running Tuned manually, which was a bad idea. I'm still not sure what is causing your original issue. I suspect that the synchronous timeouts are in fact the issue here. In your script, can you make sure that Polkit is running prior to calling 'tuned-adm profile cpu-partitioning'? We should definitely change the timeouts though. The timeout in tuned-adm must be longer than the time out of Tuned daemon's calls to Polkit. If the above doesn't help, I'll provide you with a test build which changes the timeouts. (In reply to Ondřej Lysoněk from comment #10) > I suspect that the synchronous timeouts are in fact the issue here. In your > script, can you make sure that Polkit is running prior to calling 'tuned-adm > profile cpu-partitioning'? > > We should definitely change the timeouts though. The timeout in tuned-adm > must be longer than the time out of Tuned daemon's calls to Polkit. If the > above doesn't help, I'll provide you with a test build which changes the > timeouts. Yes Polkit is running and I have following in the cloud-init script to make sure of it, if pgrep polkitd; then echo "polkitd is running" tuned-adm profile cpu-partitioning fi the if block gets executed. Hi Ondrej, what other info is needed from me? Please also take a look of a new bug submitted by the QE, I think it is related https://bugzilla.redhat.com/show_bug.cgi?id=1488517 (In reply to jianzzha from comment #14) > Hi Ondrej, what other info is needed from me? Sorry, I posted some private comments, but didn't realize you don't have the permissions to see them in bugzilla. But the comments are no longer relevant anyway. > Please also take a look of a new bug submitted by the QE, I think it is > related https://bugzilla.redhat.com/show_bug.cgi?id=1488517 Thank you very much for this, that helped a lot. I think it is indeed related. The bug should be fixed by this pull request: https://github.com/redhat-performance/tuned/pull/66 Upstream commits: https://github.com/redhat-performance/tuned/commit/701f09a89d7de6d00dc39909853e613469b5c8ed https://github.com/redhat-performance/tuned/commit/bfab9d8d22adb9a4382a94142782906674c1bf59 tuned-2.9.0-0.1.rc1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df tuned-2.9.0-0.1.rc1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df tuned-2.9.0-0.1.rc1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5f0849d207 tuned-2.9.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0e45ce4685 tuned-2.9.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-c30e9bd1ea Hello, We need GSS or PM Approved in order to clone. This is part of Core Services - Infrastructure Services, so it should be Filip Krska or Steve Almy. Verified in:
tuned-2.9.0-1.el7.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7.noarch
cloud-init-0.7.9-20.el7.x86_64
PASS
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: Command 'cat > /etc/cloud/cloud.cfg.d/a.cfg << EOF
:: [ PASS ] :: runcmd:
:: [ PASS ] :: - [ systemctl, enable, tuned ]
:: [ PASS ] :: - echo 'isolated_cores=1' > /etc/tuned/cpu-partitioning-variables.conf
:: [ PASS ] :: - [ tuned-adm, profile, cpu-partitioning ]
:: [ PASS ] :: - [ reboot ]
:: [ PASS ] :: EOF' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-init' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-config' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-final' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-init' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-config' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-final' (Expected 0, got 0)
~~ REBOOT ~~
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep nohz=on' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep nohz_full=1' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep rcu_nocbs=1' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep tuned.non_isolcpus=' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep intel_pstate=disable' (Expected 0, got 0)
:: [ PASS ] :: Command 'cat /proc/cmdline | grep nosoftlockup' (Expected 0, got 0)
:: [ LOG ] :: Duration: 1s
:: [ LOG ] :: Assertions: 7 good, 0 bad
:: [ PASS ] :: RESULT: Test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reproduced in:
tuned-2.8.0-5.el7.noarch
tuned-profiles-cpu-partitioning-2.8.0-5.el7.noarch
cloud-init-0.7.9-20.el7.x86_64
FAIL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: Command 'cat > /etc/cloud/cloud.cfg.d/a.cfg << EOF
:: [ PASS ] :: runcmd:
:: [ PASS ] :: - [ systemctl, enable, tuned ]
:: [ PASS ] :: - echo 'isolated_cores=1' > /etc/tuned/cpu-partitioning-variables.conf
:: [ PASS ] :: - [ tuned-adm, profile, cpu-partitioning ]
:: [ PASS ] :: - [ reboot ]
:: [ PASS ] :: EOF' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-init' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-config' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl enable cloud-final' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-init' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-config' (Expected 0, got 0)
:: [ PASS ] :: Command 'systemctl start cloud-final' (Expected 0, got 0)
~~ REBOOT ~~
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ FAIL ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep nohz=on' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep nohz_full=1' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep rcu_nocbs=1' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep tuned.non_isolcpus=' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep intel_pstate=disable' (Expected 0, got 1)
:: [ FAIL ] :: Command 'cat /proc/cmdline | grep nosoftlockup' (Expected 0, got 1)
:: [ LOG ] :: Duration: 1s
:: [ LOG ] :: Assertions: 0 good, 7 bad
:: [ FAIL ] :: RESULT: Test
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0879 |