Bug 1469258 - cpu-partitioning: CPUs still isolated after changing profile
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tuned
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jaroslav Škarvada
QA Contact: Tereza Cerna
Keywords: Patch, Upstream
Depends On:
Blocks: 1394932 1538745 TUNED-7.5-REBASE
Reported: 2017-07-10 14:26 EDT by Luiz Capitulino
Modified: 2018-04-10 12:04 EDT

See Also:
Fixed In Version: tuned-2.9.0-0.1.rc1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1538745
Environment:
Last Closed: 2018-04-10 12:04:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2018:0879
Priority: normal
Status: SHIPPED_LIVE
Summary: tuned bug fix and enhancement update
Last Updated: 2018-04-10 09:39:08 EDT

Description Luiz Capitulino 2017-07-10 14:26:13 EDT
Description of problem:

Sometimes RHEL7 runs dracut after booting into a new kernel for the first time. If this happens when the cpu-partitioning profile is applied, the systemd configuration file /etc/systemd/system.conf will slip into the initrd image. This will cause cpu-partitioning systemd configuration, such as CPU isolation, to be in effect even when changing profiles or if tuned is stopped or even removed from the system.

This problem can be generalized by assuming that users may re-generate their initrd images at any time.
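
One way to check whether a given initrd image has picked up the tuned-modified file is to print the embedded copy and look for an active CPUAffinity= line. The following is a minimal sketch, not part of the original report. It assumes that the cpu-partitioning profile expresses its systemd-level CPU isolation through the CPUAffinity= setting and that lsinitrd (shipped with dracut) is installed. The helper name has_cpu_affinity is made up for illustration.

```shell
# Hypothetical helper: succeeds if stdin contains an uncommented CPUAffinity= line.
has_cpu_affinity() {
    grep -Eq '^[[:space:]]*CPUAffinity[[:space:]]*='
}

# Assumed usage on an affected host (inspects the initrd of the running kernel):
#   lsinitrd -f etc/systemd/system.conf | has_cpu_affinity \
#       && echo "initrd carries a CPUAffinity setting"
```

The helper only reads standard input, so the same check can be pointed at /etc/systemd/system.conf on disk to compare the live file with the copy inside the image.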


Version-Release number of selected component (if applicable): tuned-2.8.0-5.el7.noarch


How reproducible:


Steps to Reproduce:
1. Set up and activate cpu-partitioning profile
2. Run dracut or install a new kernel and reboot
3. Change to a different profile or stop tuned
4. Reboot and check that CPUs are still isolated
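
Step 4 can be checked by looking at the CPU set that PID 1 is allowed to run on, since systemd's affinity is the default for the processes it forks. This is a minimal sketch under the assumption that the leftover isolation shows up as a restricted affinity for systemd on a Linux /proc filesystem. The function name allowed_cpus is hypothetical.

```shell
# Hypothetical helper: print the Cpus_allowed_list value from a /proc status file.
# Defaults to PID 1 (systemd); pass another status file path to inspect a different process.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "${1:-/proc/1/status}"
}

# Assumed usage after the reboot in step 4:
#   allowed_cpus
# A list that still excludes the cores named in isolated_cores suggests the
# old configuration is being applied from the initrd.
```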
Comment 2 Jaroslav Škarvada 2017-08-28 12:04:31 EDT
I am afraid we cannot resolve this in Tuned. There is no dracut configuration option like omit_file, so we cannot prevent dracut from embedding the configuration added by Tuned into the main initrd. I think we could either open a dracut RFE Bugzilla requesting such an option, or just document this problem.
Comment 3 Luiz Capitulino 2017-08-28 15:09:09 EDT
Would it be possible to print a message like "you should regenerate your initrd" when disabling the cpu-partitioning profile? I know this looks a bit silly, but at least we told the user what to do.

The problem with adding a configuration like omit_file in dracut is that we'll be disallowing the user from doing a configuration change that may be valid for his or her setup.
Comment 4 Jaroslav Škarvada 2017-08-29 05:10:38 EDT
(In reply to Luiz Capitulino from comment #3)
> Would it be possible to print a message like "you should regenerate your
> initrd" when disabling the cpu-partitioning profile? I know this looks a bit
> silly, but at least we told the user what to do.
> 
This shouldn't be a problem.

> The problem with adding a configuration like omit_file in dracut is that
> we'll be disallowing the user from doing a configuration change that may be
> valid for his or her setup.

I meant omit just Tuned configuration files, e.g.:

/etc/systemd/system.conf.d/05tuned.conf (which we currently don't ship, but we could easily switch to it from /etc/systemd/system.conf), and
/usr/lib/dracut/hooks/pre-udev/00-tuned-pre-udev.sh
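
To illustrate the drop-in idea from this comment, here is a sketch of what such a file might contain. It is an assumption throughout: Tuned did not ship this file at the time, the CPU list is invented, and write_tuned_dropin is a hypothetical helper that writes under a caller-supplied root directory so it can be tried in a scratch location.

```shell
# Hypothetical: write a Tuned-owned systemd drop-in below the root directory "$1".
# The file name comes from comment 4; the default CPU list "0 2 3" is made up.
write_tuned_dropin() {
    root="${1:?root directory required}"
    dir="$root/etc/systemd/system.conf.d"
    mkdir -p "$dir" &&
        printf '[Manager]\nCPUAffinity=%s\n' "${2:-0 2 3}" > "$dir/05tuned.conf"
}

# Example: write_tuned_dropin /tmp/scratch "0 2 3"
```

Keeping the setting in a file that only Tuned owns is what would let a dracut omit option target it without touching the administrator's own /etc/systemd/system.conf.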
Comment 5 Jaroslav Škarvada 2017-08-29 07:54:43 EDT
(In reply to Jaroslav Škarvada from comment #4)
> (In reply to Luiz Capitulino from comment #3)
> > Would it be possible to print a message like "you should regenerate your
> > initrd" when disabling the cpu-partitioning profile? I know this looks a bit
> > silly, but at least we told the user what to do.
> > 
> This shouldn't be a problem.
> 
Upstream commit:
https://github.com/redhat-performance/tuned/commit/d2c170a42e2c823f3c67c609fe528ffccfee5bce
Comment 6 Fedora Update System 2017-10-13 10:21:10 EDT
tuned-2.9.0-0.1.rc1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df
Comment 8 Fedora Update System 2017-10-13 18:25:27 EDT
tuned-2.9.0-0.1.rc1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9c6b990df
Comment 9 Fedora Update System 2017-10-13 19:25:18 EDT
tuned-2.9.0-0.1.rc1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5f0849d207
Comment 10 Fedora Update System 2017-10-29 17:07:24 EDT
tuned-2.9.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0e45ce4685
Comment 11 Fedora Update System 2017-10-29 17:13:29 EDT
tuned-2.9.0-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-c30e9bd1ea
Comment 12 Joe Mario 2017-12-18 16:10:21 EST
Hi Jaroslav:

Does "ON_QA" mean the need to get it backported to the RHEL 7.4z stream 
is no longer a priority?

We just got bit by this bug on RHEL 7.4, costing lots of testing hours 
because cpus were still isolated during our throughput-performance work.
No one expects throughput-performance to have isolated cpus.

As we work with customers on performance, they love the cpu-partitioning
profile.  I expect some of them will be hitting it as well.

What do we need to do to get the commit from comments #4 & #5 above to be backported into RHEL 7.4z?

Thank you.
Joe
Comment 13 Jaroslav Škarvada 2017-12-19 03:12:09 EST
(In reply to Joe Mario from comment #12)
> Hi Jaroslav:
> 
> Does "ON_QA" mean the need to get it backported to the RHEL 7.4z stream 
> is no longer a priority?
> 
> We just got bit by this bug on RHEL 7.4, costing lots of testing hours 
> because cpus were still isolated during our throughput-performance work.
> No one expects throughput-performance to have isolated cpus.
> 
> As we work with customers on performance, they love the cpu-partitioning
> profile.  I expect some of them will be hitting it as well.
> 
> What do we need to do to get the commit from comments #4 & #5 above to be
> backported into RHEL 7.4z?
> 
> Thank you.
> Joe

Reply in PM.
Comment 15 Luiz Capitulino 2018-01-23 09:05:45 EST
Jaroslav,

I'm verifying this BZ. However, I don't see the message printed when I have the cpu-partitioning profile active and stop tuned or change to a different profile. So, has this been fixed?

In any case, I've been thinking that it would be a better solution to run "dracut -f" from cpu-partitioning's stop() method in script.sh. This way we guarantee that users won't get this issue. Would this be feasible to do?
Comment 16 Jaroslav Škarvada 2018-01-25 05:02:10 EST
(In reply to Luiz Capitulino from comment #15)
> Jaroslav,
> 
> I'm verifying this BZ. However, I don't see the message printed when I have
> the cpu-partitioning profile active and stop tuned or change to a different
> profile. So, has this been fixed?

At the moment the message is printed only to the log. There is currently no mechanism for passing stop messages to the controlling client (e.g. tuned-adm) so that it can write them to the console. That would require an extension of the D-Bus API. Of course, we can do it.

> 
> In any case, I've been thinking that it would be a better solution to run
> "dracut -f" from cpu-partitioning's stop() method in script.sh. This way we
> guarantee that users won't get this issue. Would this be feasible to do?

Running 'dracut -f' can take a significant amount of time, which under some circumstances (big initrd, slow or loaded machine) could cause a systemd service/unit timeout.

I think the most robust option for the service itself is just a message. Feel free to clone this BZ or file a new BZ proposing the addition of a communication interface between the client and server that would allow showing informational messages on the console.
Comment 17 Luiz Capitulino 2018-01-25 09:48:51 EST
I don't want to be pedantic, but I don't think that having this message in the logs is good enough: what we want is to let the user know right away that s/he needs to do something right now. I don't think a message in the log qualifies.

I think we have to move this back to opened and rethink the solution for this BZ,  independently of the solution we choose and even if this means moving this to 7.5.

Do you agree? Can I re-open it?
Comment 18 Jaroslav Škarvada 2018-01-25 11:45:30 EST
(In reply to Luiz Capitulino from comment #17)
> I don't want to be pedantic, but I don't think that having this message in
> the logs is good enough: what we want is to let the user know right away
> that s/he needs to do something right now. I don't think a message in the
> log qualifies.
> 
> I think we have to move this back to opened and rethink the solution for
> this BZ,  independently of the solution we choose and even if this means
> moving this to 7.5.
> 
> Do you agree? Can I re-open it?

We cannot extend the API for 7.5, because it's too late: the devel phase is over. The bug is already referenced in the changelog and errata. The message in the log is a slight improvement over the previous state and is something that can be easily tested. That's why I would prefer cloning this Bugzilla to 7.6: all history will be preserved, it will get a new number, and we can call it a new feature/improvement.

Otherwise we would have to drop this bug from the errata and postpone it to 7.6. But unfortunately we cannot update an already released changelog. That's why I would prefer the former approach.
Comment 19 Luiz Capitulino 2018-01-25 12:45:51 EST
Fair enough, just cloned it: bug 1538745.
Comment 20 Luiz Capitulino 2018-01-25 12:49:13 EST
This BZ is verified: when switching from the cpu-partitioning profile I see in the tuned log:

tuned.log.1:2018-01-23 08:49:53,094 INFO     tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' to update the systemd configuration in initrd image
Comment 21 Tereza Cerna 2018-01-30 13:22:28 EST
Thank you, Luiz, for your testing. Here is a reproducer showing how to get the info message in the log file:

======================================================
Verified in:
    tuned-2.9.0-1.el7.noarch
    tuned-profiles-cpu-partitioning-2.9.0-1.el7.noarch
PASS
======================================================

# echo > /var/log/tuned/tuned.log
# echo 'isolated_cores=1' > /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# tuned-adm active
Current active profile: cpu-partitioning
# dracut -f &
# reboot
# tuned-adm profile balanced 
# cat /var/log/tuned/tuned.log | grep dracut
2018-01-30 13:20:18,707 INFO     tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' to update the systemd configuration in initrd image
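
Since the hint is only written to the log, a small check like the following could be run after switching profiles. It is a sketch, not part of Tuned: the function name initrd_hint_logged is hypothetical, and it assumes the default log location used in the reproducer above.

```shell
# Hypothetical helper: succeeds if the given tuned log contains the 'dracut -f' hint.
# Matches only the quoted command, so it does not depend on the rest of the message text.
initrd_hint_logged() {
    grep -Fq "'dracut -f'" "${1:-/var/log/tuned/tuned.log}"
}

# Assumed usage after "tuned-adm profile balanced":
#   initrd_hint_logged && echo "tuned suggests regenerating the initrd: dracut -f"
```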
Comment 24 errata-xmlrpc 2018-04-10 12:04:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0879
