Bug 1475571 - tuned fails to update boot option parameter via cloud-init for KVM instance
tuned fails to update boot option parameter via cloud-init for KVM instance
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tuned (Show other bugs)
7.3
x86_64 Linux
urgent Severity low
: rc
: ---
Assigned To: Ondřej Lysoněk
Tereza Cerna
: Patch, Upstream, ZStream
Depends On:
Blocks: TUNED-7.5-REBASE 1485946 1488369 1507564
  Show dependency treegraph
 
Reported: 2017-07-26 18:39 EDT by jianzzha
Modified: 2018-04-10 12:04 EDT (History)
10 users (show)

See Also:
Fixed In Version: tuned-2.9.0-0.1.rc1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1507564 (view as bug list)
Environment:
Last Closed: 2018-04-10 12:04:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
tuned.log (29.53 KB, text/plain)
2017-08-16 14:34 EDT, jianzzha
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0879 normal SHIPPED_LIVE tuned bug fix and enhancement update 2018-04-10 09:39:08 EDT

  None (edit)
Description jianzzha 2017-07-26 18:39:20 EDT
Description of problem:
tuned fails to update boot option parameter via cloud-init script for KVM instance: from cloud-init script, install tuned-profiles-cpu-partitioning, update isolated_cores in /etc/tuned/cpu-partitioning-variables.conf, apply profile cpu-partitioning, then reboot. after first reboot, tuned parameter doesn't show up in /proc/cmdline. It will need 2nd reboot.


How reproducible:
reproducible.


Steps to Reproduce:
1. In cloud-init script do following steps: yum  install tuned-profiles-cpu-partitioning, update isolated_cores in /etc/tuned/cpu-partitioning-variables.conf; tuned-adm profile cpu-partitioning; reboot
2. when instance boots up, cat /proc/cmdline. tuned cpu-partitioning parameter not exist.
3. reboot. cat /proc/cmdline. tuned cpu-partitioning parameter show up.

Actual results:
tuned boot parameter show up after 2nd reboot

Expected results:
tuned boot parameter kick in during 1st reboot

Additional info:
Tested RHEL 7.3 and 7.4.
Comment 2 Ondřej Lysoněk 2017-08-15 09:43:38 EDT
I can't reproduce it. I ran the following on a fresh RHEL-7.4 machine:

# yum  install tuned-profiles-cpu-partitioning
# echo 'isolated_cores=0' > /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning

but after reboot the kernel command line changes are applied. I have
# cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.el7.x86_64 root=UUID=a0d61a4b-0438-49b6-91be-859938ccf14d ro console=tty0 console=ttyS0,115200 crashkernel=auto net.ifnames=0 rhgb quiet LANG=en_US.UTF-8 nohz=on nohz_full=0 rcu_nocbs=0 tuned.non_isolcpus=0000000e intel_pstate=disable nosoftlockup

You mentioned cloud-init. What is it supposed to do? When and how do you run it?

It would help if you'd provide the logs of Tuned in debug mode. Make sure Tuned is stopped
# systemctl stop tuned

and try running Tuned as
# tuned -l -P -D -d

Then run your reproducer and provide the log file /var/log/tuned/tuned.log.

Thanks.
Comment 3 Ondřej Lysoněk 2017-08-15 09:46:00 EDT
Also, what is the output of the following?
# type tuned-adm

Just to make sure you don't have tuned-adm aliased to 'tuned-adm --async' or something.
Comment 4 Ondřej Lysoněk 2017-08-16 06:21:54 EDT
One more thing. Was Tuned already running before you ran 'tuned-adm profile cpu-partitioning'? If not, then it's probably related to
https://github.com/redhat-performance/tuned/issues/23

and
https://bugzilla.redhat.com/show_bug.cgi?id=1443142
Comment 5 jianzzha 2017-08-16 11:33:24 EDT
(In reply to Ondřej Lysoněk from comment #2)
> I can't reproduce it. I ran the following on a fresh RHEL-7.4 machine:
> 
> # yum  install tuned-profiles-cpu-partitioning
> # echo 'isolated_cores=0' > /etc/tuned/cpu-partitioning-variables.conf
> # tuned-adm profile cpu-partitioning
> 
> but after reboot the kernel command line changes are applied. I have
> # cat /proc/cmdline 
> BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.el7.x86_64
> root=UUID=a0d61a4b-0438-49b6-91be-859938ccf14d ro console=tty0
> console=ttyS0,115200 crashkernel=auto net.ifnames=0 rhgb quiet
> LANG=en_US.UTF-8 nohz=on nohz_full=0 rcu_nocbs=0 tuned.non_isolcpus=0000000e
> intel_pstate=disable nosoftlockup
> 
> You mentioned cloud-init. What is it supposed to do? When and how do you run
> it?
> 
> It would help if you'd provide the logs of Tuned in debug mode. Make sure
> Tuned is stopped
> # systemctl stop tuned
> 
> and try running Tuned as
> # tuned -l -P -D -d
> 
> Then run your reproducer and provide the log file /var/log/tuned/tuned.log.
> 
> Thanks.

Cloud-init is a tool that is used to customize an cloud instance when it is first booted. 

It is possible that certain required service by tuned is not started yet when cloud-init is run. I will have collect the required log and check if that's the case
Comment 6 jianzzha 2017-08-16 14:32:06 EDT
Here is related console log during cloud-init phase,
[   58.873065] cloud-init[5444]: + systemctl stop tuned
[   59.899984] cloud-init[5444]: + tuned -l -P -D -d
[   59.990447] cloud-init[5444]: + yum install -y tuned-profiles-cpu-partitioning
....

[   62.414081] cloud-init[5444]: + tuned-adm profile cpu-partitioning
[   87.505351] cloud-init[5444]: DBus call to Tuned daemon failed
[  178.839605] NMI watchdog: disabled (cpu0): hardware events not enabled
[  178.840422] NMI watchdog: Shutting down hard lockup detector on all cpus
[  178.497120] cloud-init[5444]: Trying to (re)start tuned...
[  178.497500] cloud-init[5444]: Tuned (re)start failed, you need to (re)start tuned by hand for changes to apply.


so looks like tuned isn't running even though the "tuned -l -P -D -d" was applied earlier
Comment 7 jianzzha 2017-08-16 14:34 EDT
Created attachment 1314273 [details]
tuned.log
Comment 8 Ondřej Lysoněk 2017-08-17 03:35:22 EDT
Btw, what versions of packages 'tuned' and 'tuned-profiles-cpu-partitioning' do you have? This is probably not related to this bug, but it seems you have some older Tuned version, which doesn't support the 'priority' option:

2017-08-16 13:29:09,634 WARNING  tuned.plugins.base: Unknown option 'priority' for plugin 'ScriptPlugin'.

There might be some packaging error which caused that tuned-profiles-cpu-partitioning didn't pull in the correct version of tuned as dependency.
Comment 9 Ondřej Lysoněk 2017-08-17 06:48:10 EDT
Actually it was, but the tuned-adm's request to change the profile timed out. And after that, tuned-adm assumed that Tuned was not running, so it attempted to do 'systemctl restart Tuned. So after that, two instances of Tuned were in fact running (because systemd didn't recognize the manually started Tuned process).

From the logs it seems the following happened:
1. The previously running Tuned process stopped after 'systemctl stop tuned'
2. A new Tuned process was created after 'tuned -l -P -D -d' was issued and the daemon started up and applied the virtual-guest profile
3. 'tuned-adm profile cpu-partitioning' was issued. In resonse, the Tuned daemon issued a request to Polkit in order to check authorization of the request.
4. The tuned-adm's request timed out, because the Tuned daemon was taking a long time to respond (the 25 second default dbus time out). And the Tuned daemon was taking a long time to respond, because it wasn't receiving any reply from Polkit, and its request timed out as well (also after the 25 second default dbus time out). So both tuned-adm's and Tuned daemon's requests timed out at around the same time.
5. Then the two following things happend at around the same time:
   I.  tuned-adm tried the fallback method of activating the requested profile, which is 'systemctl restart tuned'. So a new Tuned process started up.
   II. After Tuned daemon's request to Polkit timed out, it tried a fallback authorization method, which succeeded. And the daemon processed the request to switch the profile to cpu-partitioning.

After that, the two Tuned damon instances probably collided somehow, which resulted in the traceback in the log and the message from tuned-adm 'Tuned (re)start failed, you need to (re)start tuned by hand for changes to apply'.

Maybe we should learn from this and change the default timeout either in tuned-adm or the Tuned daemon, so we don't get these synchronous timeouts.

But all of this is a result of running Tuned manually, which was a bad idea. I'm still not sure what is causing your original issue.
Comment 10 Ondřej Lysoněk 2017-08-17 07:14:33 EDT
I suspect that the synchronous timeouts are in fact the issue here. In your script, can you make sure that Polkit is running prior to calling 'tuned-adm profile cpu-partitioning'?

We should definitely change the timeouts though. The timeout in tuned-adm must be longer than the time out of Tuned daemon's calls to Polkit. If the above doesn't help, I'll provide you with a test build which changes the timeouts.
Comment 12 jianzzha 2017-08-29 10:08:55 EDT
(In reply to Ondřej Lysoněk from comment #10)
> I suspect that the synchronous timeouts are in fact the issue here. In your
> script, can you make sure that Polkit is running prior to calling 'tuned-adm
> profile cpu-partitioning'?
> 
> We should definitely change the timeouts though. The timeout in tuned-adm
> must be longer than the time out of Tuned daemon's calls to Polkit. If the
> above doesn't help, I'll provide you with a test build which changes the
> timeouts.

Yes Polkit is running and I have following in the cloud-init script to make sure of it,

    if pgrep polkitd; then
        echo "polkitd is running"
        tuned-adm profile cpu-partitioning
    fi

the if block gets executed.
Comment 14 jianzzha 2017-09-05 12:24:59 EDT
Hi Ondrej, what other info is needed from me?

Please also take a look of a new bug submitted by the QE, I think it is related https://bugzilla.redhat.com/show_bug.cgi?id=1488517
Comment 15 Ondřej Lysoněk 2017-09-08 10:42:30 EDT
(In reply to jianzzha from comment #14)
> Hi Ondrej, what other info is needed from me?

Sorry, I posted some private comments, but didn't realize you don't have the permissions to see them in bugzilla. But the comments are no longer relevant anyway.

> Please also take a look of a new bug submitted by the QE, I think it is
> related https://bugzilla.redhat.com/show_bug.cgi?id=1488517

Thank you very much for this, that helped a lot. I think it is indeed related. The bug should be fixed by this pull request:
https://github.com/redhat-performance/tuned/pull/66
Comment 17 Fedora Update System 2017-10-13 10:21:19 EDT
tuned-2.9.0-0.1.rc1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df
Comment 19 Fedora Update System 2017-10-13 18:25:34 EDT
tuned-2.9.0-0.1.rc1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df
Comment 20 Fedora Update System 2017-10-13 19:25:26 EDT
tuned-2.9.0-0.1.rc1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5f0849d207
Comment 22 Fedora Update System 2017-10-29 17:07:33 EDT
tuned-2.9.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0e45ce4685
Comment 23 Fedora Update System 2017-10-29 17:13:38 EDT
tuned-2.9.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-c30e9bd1ea
Comment 24 Oneata Mircea Teodor 2017-10-30 10:11:02 EDT
Hello,
We need GSS or PM Approved in order to clone.
This is part of Core Services - Infrastructure Services, so it should be Filip Krska or Steve Almy.
Comment 27 Tereza Cerna 2017-12-19 07:05:15 EST
Verified in:
    tuned-2.9.0-1.el7.noarch
    tuned-profiles-cpu-partitioning-2.9.0-1.el7.noarch
    cloud-init-0.7.9-20.el7.x86_64
PASS

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Command 'cat > /etc/cloud/cloud.cfg.d/a.cfg << EOF
:: [   PASS   ] :: runcmd:
:: [   PASS   ] ::  - [ systemctl, enable, tuned ]
:: [   PASS   ] ::  - echo 'isolated_cores=1' > /etc/tuned/cpu-partitioning-variables.conf
:: [   PASS   ] ::  - [ tuned-adm, profile, cpu-partitioning ]
:: [   PASS   ] ::  - [ reboot ]
:: [   PASS   ] :: EOF' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-init' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-config' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-final' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-init' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-config' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-final' (Expected 0, got 0)

~~ REBOOT ~~

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep nohz=on' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep nohz_full=1' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep rcu_nocbs=1' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep tuned.non_isolcpus=' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep intel_pstate=disable' (Expected 0, got 0)
:: [   PASS   ] :: Command 'cat /proc/cmdline | grep nosoftlockup' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 1s
:: [   LOG    ] :: Assertions: 7 good, 0 bad
:: [   PASS   ] :: RESULT: Test


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reproduced in:
    tuned-2.8.0-5.el7.noarch
    tuned-profiles-cpu-partitioning-2.8.0-5.el7.noarch
    cloud-init-0.7.9-20.el7.x86_64
FAIL

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Command 'cat > /etc/cloud/cloud.cfg.d/a.cfg << EOF
:: [   PASS   ] :: runcmd:
:: [   PASS   ] ::  - [ systemctl, enable, tuned ]
:: [   PASS   ] ::  - echo 'isolated_cores=1' > /etc/tuned/cpu-partitioning-variables.conf
:: [   PASS   ] ::  - [ tuned-adm, profile, cpu-partitioning ]
:: [   PASS   ] ::  - [ reboot ]
:: [   PASS   ] :: EOF' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-init' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-config' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl enable cloud-final' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-init' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-config' (Expected 0, got 0)
:: [   PASS   ] :: Command 'systemctl start cloud-final' (Expected 0, got 0)

~~ REBOOT ~~

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   FAIL   ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep nohz=on' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep nohz_full=1' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep rcu_nocbs=1' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep tuned.non_isolcpus=' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep intel_pstate=disable' (Expected 0, got 1)
:: [   FAIL   ] :: Command 'cat /proc/cmdline | grep nosoftlockup' (Expected 0, got 1)
:: [   LOG    ] :: Duration: 1s
:: [   LOG    ] :: Assertions: 0 good, 7 bad
:: [   FAIL   ] :: RESULT: Test
Comment 30 errata-xmlrpc 2018-04-10 12:04:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0879

Note You need to log in before you can comment on or make changes to this bug.