Bug 1538745
| Field | Value |
|---|---|
| Summary | cpu-partitioning: CPUs still isolated after changing profile |
| Product | Red Hat Enterprise Linux 7 |
| Component | tuned |
| Version | 7.4 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Luiz Capitulino <lcapitulino> |
| Assignee | Ondřej Lysoněk <olysonek> |
| QA Contact | Dominik Rehák <drehak> |
| CC | drehak, jeder, jmario, jskarvad, lcapitulino, olysonek, psklenar, tcerna |
| Target Milestone | rc |
| Keywords | Patch, Upstream |
| Fixed In Version | tuned-2.10.0-0.1.rc1.el7 |
| Type | Bug |
| Clone Of | 1469258 |
| Bug Depends On | 1469258 |
| Bug Blocks | 1394932, 1467576 |
| Last Closed | 2018-10-30 10:48:57 UTC |
Description
Luiz Capitulino
2018-01-25 17:44:13 UTC
Fixing keywords. A patch for this has not yet been created.

It should be fixed by the following upstream commit:
https://github.com/redhat-performance/tuned/commit/6c1df704ce65de429b90bd99102188be39a463d0

Hi Jaroslav:
I'm trying to understand how the above commit will help.
I do see the above commit adds a logging infrastructure to tuned, which is good.
But I'm wondering how it would help in a scenario like this:
a) User is running with the cpu-partitioning profile.
b) User installs new kernel and new microcode from Red Hat.
c) The new microcode rpm calls "dracut -f" which modifies the default
system cpu affinity in /etc/systemd/system.conf
d) Currently, the only way to restore the default cpu system affinity
is for the user to edit and modify that file.
How does the above tuned patch alert the user to avoid the above use case?

(In reply to Joe Mario from comment #4)
> But I'm wondering how it would help in a scenario like this:
>
> a) User is running with the cpu-partitioning profile.
> b) User installs new kernel and new microcode from Red Hat.
> c) The new microcode rpm calls "dracut -f" which modifies the default
> system cpu affinity in /etc/systemd/system.conf

I'm not sure what you mean here. As I understand it, dracut will not modify
/etc/systemd/system.conf. It will only copy that file to the initrd image.

> d) Currently, the only way to restore the default cpu system affinity
> is for the user to edit and modify that file.

Changes made by Tuned to /etc/systemd/system.conf on the root filesystem
should be reverted when a different profile is applied or Tuned is stopped.
Does that not happen?

> How does the above tuned patch alert the user to avoid the above
> use case?

Assume Tuned is running and the cpu-partitioning profile is applied. When the
user switches to a different profile, the changes to /etc/systemd/system.conf
made by Tuned will be reverted. This reversion will not propagate to the
initrd image, but the user will be notified that they should run dracut to
push the changes to the initrd image, e.g.:

# tuned-adm profile cpu-partitioning
# tuned-adm profile throughput-performance
CONSOLE tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' to update the systemd configuration in initrd image

The user will not see the message if they do something like

# systemctl stop tuned && systemctl disable tuned

but that is expected. The message will be in /var/log/tuned/tuned.log though.

What I forgot to add in the above commit is printing the message when
'tuned-adm off' is executed. I'll add that.

> I'm not sure what you mean here. As I understand it, dracut will not modify
> /etc/systemd/system.conf. It will only copy that file to the initrd image.
Hi Ondrej:
I'm wondering if you missed the use case and severity of this problem.
Here is what we're seeing happen:
1) Set your tuned profile to cpu-partitioning with several CPUs selected to
be isolated (in /etc/tuned/cpu-partitioning-variables.conf),
and reboot.
2) Now with tuned set to cpu-partitioning, install new microcode as our
customers are doing with the Spectre-Meltdown fixes. To simplify this
step, all you need to do is run "dracut -f". Reboot.
3) Now set your tuned back to some other tuned profile, like
throughput-performance or network-latency, or whatever. Reboot.
4) After you reboot, look at the affinity for all user processes.
For example: "cat /proc/$$/status |grep Cpu".
You will see all the processes still have the isolated affinity
from the /etc/tuned/cpu-partitioning-variables.conf file even though
you are no longer set to cpu-partitioning.
5) Even if you install new kernels, they will all inherit the restricted
cpu-partitioning affinity from that conf file (even though you're
set to some other tuned like throughput-performance).
6) The only way I've been able to fix this is to edit /etc/systemd/system.conf.
Given the decision for RHEL to once again ship microcode for Spectre-Meltdown
CVE fixes, our customers running cpu-partitioning will hit the above
bug much more often.
If I'm understanding the changes made for this BZ, it sounds like they're
not addressing the issue that is causing customers the most trouble.
Please confirm you're understanding this reply.
Thank you.
Joe
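Step 6 above refers to editing /etc/systemd/system.conf by hand. As a minimal sketch (not part of Tuned), the following Python helper lists any uncommented CPUAffinity= assignments in the text of such a file. The function name and the sample contents are hypothetical and only illustrate the check; the same check could be run against a copy of the file extracted from the initrd image.

```python
import re

# Matches an active (uncommented) "CPUAffinity=..." assignment on one line.
# Lines starting with "#" or ";" do not match because only whitespace is
# allowed before the key.
_CPU_AFFINITY_RE = re.compile(r"^\s*CPUAffinity\s*=\s*(.*?)\s*$")


def active_cpu_affinity_values(conf_text):
    """Return the values of all uncommented CPUAffinity= lines, in order.

    conf_text is the full text of a systemd manager configuration file,
    for example the contents of /etc/systemd/system.conf.
    """
    values = []
    for line in conf_text.splitlines():
        match = _CPU_AFFINITY_RE.match(line)
        if match:
            values.append(match.group(1))
    return values


# Hypothetical file contents, for illustration only.
sample = "[Manager]\n#CPUAffinity=1 2\nCPUAffinity=0 1\n"
print(active_cpu_affinity_values(sample))
```

An empty result for the file on the root filesystem, combined with restricted affinity after reboot, would point at a stale copy in the initrd image rather than at the root filesystem.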
[Added info to my previous reply]
From memory, here is what I think is happening.
a) cpu-partitioning modifies /etc/systemd/system.conf's Affinity value.
b) dracut reads that Affinity value and updates the current kernel's
initrd image. If it happens to be the modified value from
cpu-partitioning, then the problem just got created.
Step "b" is where the damage is done because now the system's default
affinity (even if you set tuned to something else) is the restricted
affinity from when cpu-partitioning was in place.
Editing /etc/systemd/system.conf and rerunning "dracut -f" is the only
way I know to fix this.
Joe

(In reply to Joe Mario from comment #6)
> 6) The only way I've been able to fix this is to edit
> /etc/systemd/system.conf.

Are you saying the /etc/systemd/system.conf file on the root filesystem
contains an (uncommented) CPUAffinity option even after you set the
throughput-performance profile? That should have been automatically deleted
when you changed the profile.

I cannot reproduce this - if I run 'dracut -f' after changing to
throughput-performance and reboot, the CPUs are no longer isolated.

What tuned version are you using?

(In reply to Joe Mario from comment #7)
> Editing /etc/systemd/system.conf and rerunning "dracut -f" is the only
> way I know to fix this.

If editing /etc/systemd/system.conf is necessary, then that sounds like a bug
in Tuned. On the other hand, it is expected that running 'dracut -f' after
changing the profile is necessary. So far the consensus seemed to be that
informing the user about the need to run that command would be a sufficient
solution. If you think that is not enough, we may want to file that RFE for
dracut as suggested here:
https://bugzilla.redhat.com/show_bug.cgi?id=1469258#c2

> Are you saying the /etc/systemd/system.conf file on the root
> filesystem contains an (uncommented) CPUAffinity option even after you
> set the throughput-performance profile?
Hi Ondrej:
I think you missed a reboot.
These are the steps, (from memory but I've hit this problem many times):
1) Set tuned cpu-partitioning.
2) Reboot
3) Run "dracut -f"
4) Reboot
5) Set tuned throughput-performance
6) Reboot
7) Check your process affinity ("cat /proc/$$/status |grep Cpu")
In step "7", even though your tuned is throughput-performance, your
process affinity will be that from the previous cpu-partitioning.
Let me know if this does not reproduce for you.
Joe
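The affinity check in step 7 can also be scripted. The sketch below is one possible way to do it in Python: it extracts the Cpus_allowed_list field from the text of a /proc/<pid>/status file and expands it into individual CPU numbers. The function names and the sample text are hypothetical.

```python
def parse_cpu_list(cpu_list):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def allowed_cpus_from_status(status_text):
    """Return the CPUs from the Cpus_allowed_list line, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    return None


# Hypothetical excerpt of a status file, for illustration only.
sample = "Name:\tbash\nCpus_allowed:\t13\nCpus_allowed_list:\t0-1,4\n"
print(allowed_cpus_from_status(sample))
```

On a live Linux system the input would be the contents of /proc/self/status. Comparing the result before and after the profile change and reboot shows whether the old isolation is still in effect.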
(In reply to Joe Mario from comment #10)
> > Are you saying the /etc/systemd/system.conf file on the root
> > filesystem contains an (uncommented) CPUAffinity option even after you
> > set the throughput-performance profile?
>
> Hi Ondrej:
>
> I think you missed a reboot.

I did not.

> These are the steps, (from memory but I've hit this problem many times):
> 1) Set tuned cpu-partitioning.
> 2) Reboot
> 3) Run "dracut -f"
> 4) Reboot
> 5) Set tuned throughput-performance
> 6) Reboot
> 7) Check your process affinity ("cat /proc/$$/status |grep Cpu")
>
> In step "7", even though your tuned is throughput-performance, your
> process affinity will be that from the previous cpu-partitioning.

Yes, I can reproduce this. But that is a separate issue. What I'm asking you
is whether you see the *CPUAffinity option* in the /etc/systemd/system.conf
file on the *root filesystem* after you set the throughput-performance
profile. That should not happen. Does it happen for you?

If you do the following:
1) Set tuned cpu-partitioning.
2) Reboot
3) Run "dracut -f"
4) Reboot
5) Set tuned throughput-performance
6) Run "dracut -f"
7) Reboot
8) Check your process affinity ("cat /proc/$$/status |grep Cpu")
Then in step 8, you should see the correct affinity. Do you see this behaviour?

Notice step 6, 'Run "dracut -f"', in the steps I provided.

Hi Ondrej:
You are correct. I just redid the steps that caused us problems and verified
that the step to rerun dracut works.
I tried multiple tuned/kernel-install/microcode-install/reboot combinations
and verified that in all cases setting a different profile (away from
cpu-partitioning) and then running dracut did resolve it.
I apologize for the distraction caused by my comments yesterday.
I do have a question, since I'm not running the new patched tuned yet.
Will the new message to manually run dracut be emitted to the console
only or also to the user's terminal?
I ask that because of the recent CVE kernels we announced in early May,
(which also require microcode). RHEL will once again begin shipping microcode.
That means there will be lots of customers who will be installing these
kernels and will be updating their microcode (and implicitly calling dracut).
If the tuned message to rerun dracut isn't "put in front of their face",
those using cpu-partitioning will find themselves running some profile
(not cpu-partitioning) and will have isolated cpus for all the user
processes on their box.
In summary, the steps recommended in the message do appear to work,
(thank you and sorry for the noise I caused). We just need to make sure
the message emitted gets seen.
Thank you.
Joe

I'm glad we're on the same page now :).

(In reply to Joe Mario from comment #13)
> I do have a question, since I'm not running the new patched tuned yet.
> Will the new message to manually run dracut be emitted to the console
> only or also to the user's terminal?

It will be printed to the terminal where the user runs tuned-adm, for example:

$ tuned-adm profile throughput-performance
CONSOLE tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' to update the systemd configuration in initrd image

It also goes to the Tuned log file (/var/log/tuned/tuned.log). It does *not*
go to the *system console*. Perhaps the name of the log level, "CONSOLE", is
a bit misleading...

I can see the message in my terminal, but there's a small problem.
The message reads:

CONSOLE tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' to update the systemd configuration in initrd image

But it should read instead:

CONSOLE tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f && reboot' to update the systemd configuration in initrd image

Or:

CONSOLE tuned.plugins.plugin_systemd: you may need to manualy run 'dracut -f' and reboot in order to update the systemd configuration in initrd image

Since this is a tiny change, I don't think we need a BZ for it as long as you
do it sooner rather than later.

Other than that, running dracut -f and rebooting does fix the problem for me.
As Joe's issue is now clarified, this BZ is verified.

PS: one limitation of this solution is that people running tuned-adm from
scripts may not see the message, and nothing will appear to go wrong, since
tuned-adm doesn't fail.

Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow
the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:3172
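Regarding the limitation noted in the PS above: a script that calls tuned-adm could capture its output and look for the dracut hint itself. The following Python 3 sketch shows one way this might be done. It assumes the hint text contains "dracut -f" as quoted in this report, and it merges stdout and stderr because this report does not say which stream carries the message. The function names and the `run` parameter are hypothetical.

```python
import subprocess

# Substring taken from the message quoted in this report. Matching on it is
# an assumption, since the exact wording may change between Tuned versions.
DRACUT_HINT = "dracut -f"


def needs_initrd_rebuild(tuned_adm_output):
    """Return True if the captured tuned-adm output contains the dracut hint."""
    return any(DRACUT_HINT in line for line in tuned_adm_output.splitlines())


def switch_profile(profile, run=subprocess.run):
    """Run 'tuned-adm profile <profile>' and report whether the hint appeared.

    Returns (exit_code, hint_seen). The 'run' parameter exists only so the
    command runner can be replaced, for example in tests.
    """
    result = run(
        ["tuned-adm", "profile", profile],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into stdout
        text=True,
        check=False,
    )
    return result.returncode, needs_initrd_rebuild(result.stdout or "")
```

A calling script could then log a warning, or run 'dracut -f' and schedule a reboot, whenever hint_seen is true instead of relying on someone reading the terminal.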