Bug 1613478
Summary: | Tuned hangs with cpu-partitioning and offlined cpu(s) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Joe Mario <jmario> | |
Component: | tuned | Assignee: | Jaroslav Škarvada <jskarvad> | |
Status: | CLOSED ERRATA | QA Contact: | Tereza Cerna <tcerna> | |
Severity: | unspecified | Docs Contact: | Marek Suchánek <msuchane> | |
Priority: | urgent | |||
Version: | 7.6 | CC: | acme, jbastian, jeder, jmario, jolsa, jskarvad, kwalker, mkolaja, mpetlan, olysonek, rhbz, salmy, tcerna | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | tuned-2.10.0-2.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: |
Previously, setting the cpu-partitioning Tuned profile caused the tuned service to become unresponsive if a CPU was set offline. With this update, the problem has been fixed, and tuned no longer hangs after selecting cpu-partitioning when a CPU is offline.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1613832 1613950 1613951 (view as bug list) | Environment: | ||
Last Closed: | 2018-10-30 10:50:19 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1613950, 1613951 |
Description
Joe Mario
2018-08-07 15:59:46 UTC
From devel point of view no problem to backport to 7.5.z and 7.4.z. To proceed further I need qa_ack, pm_ack and Z stream clones. Hi, I'm not sure if I'm able to test 3 errata and have it ready for release on 14th August. There is so much work, I should test this bug, do package testing and regression testing. It takes hours.. Hi, is mentioned reproduced in bug description complete? I did these steps and I was not able to reproduce it. Maybe some step is missing? # echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf # echo 0 > /sys/devices/system/cpu/cpu4/online # tuned-adm profile cpu-partitioning ## this should hang, but it did not it # tuned-adm active | grep cpu-partitioning Current active profile: cpu-partitioning # rpm -q tuned{,-profiles-cpu-partitioning} tuned-2.8.0-5.el7_4.2.noarch tuned-profiles-cpu-partitioning-2.8.0-5.el7_4.2.noarch Hi Tereza: This should not need any special boot flag. It fails on a default RHEL 7.5 installation. Here's what I get, and that last cmd, setting it to cpu-partitioning hangs. # rpm -q tuned{,-profiles-cpu-partitioning} tuned-2.9.0-1.el7_5.2.noarch tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch # tuned-adm active Current active profile: throughput-performance # egrep '^isolated' /etc/tuned/cpu-partitioning-variables.conf isolated_cores=17 # echo 0 > /sys/devices/system/cpu/cpu4/online # tuned-adm profile cpu-partitioning [the above cmd hangs] Hi Jaroslav: Do you have any thoughts on how the above scenario for Tereza did not hang? It looks like he's on RHEL 7.4, which I assumed had the same implementation for reading which cpus were present instead of online. Thanks, Joe I tested in on RHEL-7.5, because of ER#35664 which should be in REL_PREP on Tuesday. Mistake, I tested in on RHEL-7.4.z durint testing of ER#35664. 
Ok, now I'm having trouble reproducing this too on RHEL-7.4.z: [root@intel-brickland-03 ~]# uname -r 3.10.0-693.37.4.el7.x86_64 [root@intel-brickland-03 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.4 (Maipo) [root@intel-brickland-03 ~]# rpm -qa tuned{,-profiles-cpu-partitioning} tuned-profiles-cpu-partitioning-2.8.0-5.el7.noarch tuned-2.8.0-5.el7.noarch [root@intel-brickland-03 ~]# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list 0,72 [root@intel-brickland-03 ~]# echo isolated_cores=0-71 >> /etc/tuned/cpu-partitioning-variables.conf [root@intel-brickland-03 ~]# echo 0 > /sys/devices/system/cpu/cpu72/online [ 955.220873] intel_pstate CPU 72 exiting [ 955.239976] Broke affinity for irq 107 [ 955.246524] smpboot: CPU 72 is now offline [root@intel-brickland-03 ~]# tuned-adm profile cpu-partitioning [root@intel-brickland-03 ~]# tuned-adm active Current active profile: cpu-partitioning [root@intel-brickland-03 ~]# echo Still alive Still alive This is reproducible on RHEL-7.5 with tuned-2.9 [root@dell-pet620-01 ~]# uname -r 3.10.0-862.11.6.el7.x86_64 [root@dell-pet620-01 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) [root@dell-pet620-01 ~]# rpm -qa tuned{,-profiles-cpu-partitioning} tuned-2.9.0-1.el7.noarch tuned-profiles-cpu-partitioning-2.9.0-1.el7.noarch [root@dell-pet620-01 ~]# lscpu | grep -B1 On-line CPU(s): 24 On-line CPU(s) list: 0-23 [root@dell-pet620-01 ~]# grep -v ^# /etc/tuned/cpu-partitioning-variables.conf isolated_cores=1,3,5,7,9 [root@dell-pet620-01 ~]# echo 0 > /sys/devices/system/cpu/cpu10/online [root@dell-pet620-01 ~]# tuned-adm profile cpu-partitioning <<<HANG>>> ^C Traceback (most recent call last): File "/usr/sbin/tuned-adm", line 94, in <module> result = admin.action(action_name, **options) File "/usr/lib/python2.7/site-packages/tuned/admin/admin.py", line 75, in action res = self._controller.run() File "/usr/lib/python2.7/site-packages/tuned/admin/dbus_controller.py", line 
59, in run self._main_loop.run() File "/usr/lib64/python2.7/site-packages/gi/overrides/GLib.py", line 577, in run raise KeyboardInterrupt KeyboardInterrupt If I downgrade this RHEL-7.5 system to the tuned packages from RHEL-7.4, I can no longer reproduce the hang. [root@dell-pet620-01 ~]# rpm -qa tuned{,-profiles-cpu-partitioning} tuned-2.8.0-5.el7.noarch tuned-profiles-cpu-partitioning-2.8.0-5.el7.noarch [root@dell-pet620-01 ~]# echo 0 > /sys/devices/system/cpu/cpu10/online [root@dell-pet620-01 ~]# tuned-adm profile cpu-partitioning [root@dell-pet620-01 ~]# echo Still Alive Still Alive And if I upgrade to tuned-2.10.0-1.el7 it also works fine, so it seems the hang bug was limited to something in tuned-2.9.0. Nevertheless, the patch is a good idea for RHEL 7.6, so tuned-2.10.0-2.el7 is the right way to go. I guess it's up for debate if we want to fix RHEL 7.4.z or not. Thanks Jeff for the above analysis. When I ran into the tuned hang, it occured on RHEL 7.5. I assumed the problem also existed on RHEL 7.4 (because the tuned there also used the /sys/devices/system/cpu/present file instead of the /sys/devices/system/cpu/online file). There must be something in the RHEL 7.4 tuned version that causes this bug to not trigger. But it still looks like Jaroslav's patch is warranted for RHEL 7.4 since the cpu information in /sys/devices/system/cpu/present is not correct for how tuned used it. Hi, I've tested tps, regression tests, filelists... and all these things are OK. I've seen also patch, except one typo which is commented (so not problematic), this patch is OK. Two or three my test cases uses this code and when these tests were improved according to changes from patch, they worked well. But this bug... I write new test case /CoreOS/tuned/Regression/offlined-cpu-in-profile-cpu-partitioning for testing problem specified in description - tuned hangs when user sets some cpu(s) offlined and selects cpu-partition profile. 
This should hang on cpu-partitioning selection with old packages and work when new packages are installed, but see my results:

               | current result | expected result
---------------|----------------|----------------------------------
rhel-7.5 (old) | it hangs       | it hangs -> maybe false positive
rhel-7.5 (new) | it hangs       | it works -> fail
rhel-7.4 (old) | it works       | it hangs -> fail
rhel-7.4 (new) | it works       | it works -> maybe false positive

So, from my point of view there are two options:

1] My testing was wrong. @jbastian, can you try to test it again on 7.5? I see that you were able to reproduce it, but please try to test it again using the tuned-2.9.0-1.el7_5.2 version.

2] My testing was right, this patch works as expected (test cases passed), but it doesn't fix this problem.

I don't want to block the 7.5.z release, so if you are satisfied with the patch, I'll finish this erratum and push it to REL_PREP.

You can see my results from the test case (I can provide the full log):

=============================================================
RHEL-7.5 (new packages)
tuned-2.9.0-1.el7_5.2.noarch.rpm
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch.rpm
=============================================================
:: [ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
:: [ LOG ] :: Offline cpu
:: [ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
:: [ LOG ] :: Select cpu-partitioning profile
:: [ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
:: [ LOG ] :: --------------- OUTPUT START ---------------
:: [ LOG ] :: Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.
:: [ LOG ] :: --------------- OUTPUT END ---------------
:: [ FAIL ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 1)
:: [ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

FAIL: Selection of cpu-partition was not successful, it hangs... and it shouldn't...

=============================================================
RHEL-7.5 (old packages)
tuned-2.9.0-1.el7_5.1.noarch.rpm
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.1.noarch.rpm
=============================================================
:: [ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
:: [ LOG ] :: Offline cpu
:: [ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
:: [ LOG ] :: Select cpu-partitioning profile
:: [ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
:: [ LOG ] :: --------------- OUTPUT START ---------------
:: [ LOG ] :: Operation timed out after waiting 600 seconds(s), you may try to increase timeout by using --timeout command line option or using --async.
:: [ LOG ] :: --------------- OUTPUT END ---------------
:: [ FAIL ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 1)
:: [ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

?? PASS: Selection of cpu-partition was not successful, it hangs... (that's right), but it is still weird, because it was also hanging with the patch. Maybe FALSE POSITIVE

=============================================================
RHEL-7.4 (new packages)
tuned-2.8.0-5.el7_4.3.noarch.rpm
tuned-profiles-cpu-partitioning-2.8.0-5.el7_4.3.noarch.rpm
=============================================================
:: [ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
:: [ LOG ] :: Offline cpu
:: [ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
:: [ LOG ] :: Select cpu-partitioning profile
:: [ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
:: [ LOG ] :: --------------- OUTPUT START ---------------
:: [ LOG ] :: --------------- OUTPUT END ---------------
:: [ PASS ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 0)
:: [ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

?? PASS: Selection of cpu-partition works, it didn't hang... (that's right), but it is still weird, because it did not fail without the patch. Maybe FALSE POSITIVE

=============================================================
RHEL-7.4 (old packages)
tuned-profiles-cpu-partitioning-2.8.0-5.el7_4.2.noarch
tuned-2.8.0-5.el7_4.2.noarch
=============================================================
:: [ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
:: [ LOG ] :: Offline cpu
:: [ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
:: [ LOG ] :: Select cpu-partitioning profile
:: [ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
:: [ LOG ] :: --------------- OUTPUT START ---------------
:: [ LOG ] :: --------------- OUTPUT END ---------------
:: [ PASS ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 0)
:: [ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

FAIL: Selection of cpu-partition works, it didn't hang..., but it should hang...

Your RHEL-7.4 results are good: tuned-2.8.0-* does not hang, even the old version (before the patch). This is what I reported in comment 22 and comment 24 above. So for RHEL-7.4, as long as nothing else failed in regression testing, you can move it to Verified. I'm more concerned about the failed result with RHEL-7.5 and the new packages. Can you point me to the full logs and Beaker job? Hi Tereza: I just logged onto ibm-x3500m4-01.rhts.eng.bos.redhat.com,. And I see she's running with the unpatched tuned. The /usr/lib/python2.7/site-packages/tuned/utils/commands.py still references the /sys/devices/system/cpu/present file. Is that correctly updated? Joe Hi Joe, as I see, patch was applied correctly and the file was changed: with patch (new package): function cpulist_invert references the file /sys/devices/system/cpu/online without patch (old package): function cpulist_invert references the file /sys/devices/system/cpu/present T. Thank you Jaroslav. That's a great reproducer. Jirka Olsa: Do you know anything about the perf python module? If so, can you take a look at this? It's pretty important, because anyone disabling hyperthreads due to the latest CVE, may hit this, which causes tuned to hang (when using the cpu-partitioning profile). If it's not yours, do you know who would know? Thanks, Joe (In reply to Joe Mario from comment #41) > Thank you Jaroslav. > That's a great reproducer. > > Jirka Olsa: > Do you know anything about the perf python module? > > If so, can you take a look at this? It's pretty important, because > anyone disabling hyperthreads due to the latest CVE, may hit this, which > causes tuned to hang (when using the cpu-partitioning profile). > > If it's not yours, do you know who would know? yea, it's me.. 
sry for delay, but AFAICT I got CC-ed just today there's an issue with perf python mmap interface, I'm brewing RHEL7 build with the fix in here: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17917666 not sure how it fixes tuned, but it fixes those valgrind errors for me I'll CC you on upstream patches jirka (In reply to Jiri Olsa from comment #46) > (In reply to Joe Mario from comment #41) > > Thank you Jaroslav. > > That's a great reproducer. > > > > Jirka Olsa: > > Do you know anything about the perf python module? > > > > If so, can you take a look at this? It's pretty important, because > > anyone disabling hyperthreads due to the latest CVE, may hit this, which > > causes tuned to hang (when using the cpu-partitioning profile). > > > > If it's not yours, do you know who would know? > > yea, it's me.. sry for delay, but AFAICT I got CC-ed just today > > there's an issue with perf python mmap interface, > I'm brewing RHEL7 build with the fix in here: > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17917666 > > not sure how it fixes tuned, but it fixes those valgrind errors for me > > I'll CC you on upstream patches > > jirka Thanks, Tereza, could you try your test with the new kernel? Sure, I'll look at it today. > > yea, it's me.. sry for delay, but AFAICT I got CC-ed just today
> >
> > there's an issue with perf python mmap interface,
> > I'm brewing RHEL7 build with the fix in here:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17917666
> >
> > not sure how it fixes tuned, but it fixes those valgrind errors for me
> >
> > I'll CC you on upstream patches
> >
> > jirka
>
> Thanks,
>
> Tereza, could you try your test with the new kernel?
the change is in python's perf.so module (python-perf rpm)
kernel wasn't changed
jirka
Hi, the mentioned brew build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17917666 was built for rhel-7.6, but I need to test it on rhel-7.5.z, where I reproduced the problem. Can you build it for this release, please?

(In reply to Jiri Olsa from comment #49)
> > > yea, it's me.. sry for delay, but AFAICT I got CC-ed just today
> > >
> > > there's an issue with perf python mmap interface,
> > > I'm brewing RHEL7 build with the fix in here:
> > > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17917666
> > >
> > > not sure how it fixes tuned, but it fixes those valgrind errors for me
> > >
> > > I'll CC you on upstream patches
> > >
> > > jirka
> >
> > Thanks,
> >
> > Tereza, could you try your test with the new kernel?
>
> the change is in python's perf.so module (python-perf rpm)
> kernel wasn't changed
>
> jirka

Sure, but it will have to be released as a kernel errata, because it's built from the kernel srpm.

Hi, the python-perf package provided in c#46 gave me a positive result for the test case. It looks like this fix solves the problem. I tried all options:

tuned                 | python-perf                         | result
----------------------|-------------------------------------|-------
2.9.0-1.el7_5.2 (new) | 3.10.0-934.el7perf_python_fix (new) | PASS
2.9.0-1.el7_5.2 (new) | 3.10.0-862.el7 (old)                | FAIL
2.9.0-1.el7_5.1 (old) | 3.10.0-934.el7perf_python_fix (new) | FAIL
2.9.0-1.el7_5.1 (old) | 3.10.0-862.el7 (old)                | FAIL

We can see that only the combination of new tuned and new python-perf leads to a positive result of the test case. Can we also deploy the python-perf fix in 7.5.z?

====================================================
Verified in:
python-perf-3.10.0-934.el7perf_python_fix.x86_64
tuned-2.9.0-1.el7_5.2.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch
PASS
====================================================
[ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
[ LOG ] :: Offline cpu
[ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
[ LOG ] :: Select cpu-partitioning profile
[ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
[ LOG ] :: --------------- OUTPUT START ---------------
[ LOG ] :: --------------- OUTPUT END ---------------
[ PASS ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 0)
[ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

====================================================
Reproduced in:
python-perf-3.10.0-862.el7.x86_64
tuned-2.9.0-1.el7_5.2.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch
FAIL
====================================================
[ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
[ LOG ] :: Offline cpu
[ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
[ LOG ] :: Select cpu-partitioning profile
[ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
[ LOG ] :: --------------- OUTPUT START ---------------
[ LOG ] :: Traceback (most recent call last):
[ LOG ] ::   File "/usr/sbin/tuned-adm", line 94, in <module>
[ LOG ] ::     result = admin.action(action_name, **options)
[ LOG ] ::   File "/usr/lib/python2.7/site-packages/tuned/admin/admin.py", line 75, in action
[ LOG ] ::     res = self._controller.run()
[ LOG ] ::   File "/usr/lib/python2.7/site-packages/tuned/admin/dbus_controller.py", line 59, in run
[ LOG ] ::     self._main_loop.run()
[ LOG ] ::   File "/usr/lib64/python2.7/site-packages/gi/overrides/GLib.py", line 577, in run
[ LOG ] ::     raise KeyboardInterrupt
[ LOG ] :: KeyboardInterrupt
[ LOG ] :: --------------- OUTPUT END ---------------
[ FAIL ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 3)
[ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

> We can see that only combination of new tuned and new python-perf leads
> to positive result of a test case. Can we deploy in 7.5.z also fix
> in python-perf package?
Thank you Tereza for testing this. Here's the way I see it, and Jirka, Jaroslav, Jeff, please do jump in with any additional thoughts.
1) Initially, tuned would hang when hyperthreads was disabled using the new runtime and kernel boottime flags. This happened on RHEL 7.4z and 7.5z. Jaroslav's patched tuned fixed those test cases for me. But given what we've learned since then, his patch may have just masked the perf python problem.
2) Tereza's more simple test case, of just disabling one cpu, uncovered the perf python problem.
Given we will now have RHEL 7.4z and 7.5z customers who use the new runtime and boottime flags to disable hyperthreads, if any of them are using a tuned profile that isolates cpus, they will need a fix. Just a new tuned "might" work for them, but it looks like the perf patch will be needed for completeness.
My vote would be to:
a) Submit the new patched tuned as soon as we can for an ancillary hot patch.
b) Submit the patched perf python to RHEL 7.4z and 7.5z to be included in the next z-stream update in September. If a customer needs it sooner, we'll deal with a hot patch if we have to.
Comments?
Joe
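
For readers following the tuned side of this: the patch discussed earlier in this bug switched the CPU-list inversion from /sys/devices/system/cpu/present to /sys/devices/system/cpu/online. A minimal sketch of why that matters when a CPU is offline is below. It is not tuned's actual cpulist_invert code; the helper names parse_cpulist and invert_cpulist and the 8-CPU layout are assumptions made up for illustration, and it is written for Python 3.

```python
def parse_cpulist(text):
    """Parse a kernel-style CPU list such as "0-3,8,10-11" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def invert_cpulist(isolated, universe):
    """Return the CPUs in `universe` that are not in `isolated`."""
    excluded = set(isolated)
    return [cpu for cpu in universe if cpu not in excluded]


# Hypothetical machine modeled on the reproducer: 8 CPUs, cpu4 set offline.
# On a real system these strings would come from
# /sys/devices/system/cpu/present and /sys/devices/system/cpu/online.
present = parse_cpulist("0-7")
online = parse_cpulist("0-3,5-7")
isolated = parse_cpulist("2-3")

# Inverting against "present" still includes the offline cpu4;
# inverting against "online" does not.
from_present = invert_cpulist(isolated, present)
from_online = invert_cpulist(isolated, online)
only_in_present = sorted(set(from_present) - set(from_online))

print("non-isolated (present):", from_present)
print("non-isolated (online): ", from_online)
print("offline CPUs wrongly included:", only_in_present)
```

Anything that later acts on the "present"-based list ends up addressing a CPU that is not running, which is consistent with the observation above that the cpu information in the present file is not correct for how tuned used it.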
(In reply to Joe Mario from comment #53)
> > We can see that only combination of new tuned and new python-perf leads
> > to positive result of a test case. Can we deploy in 7.5.z also fix
> > in python-perf package?
>
> Thank you Tereza for testing this. Here's the way I see it, and Jirka,
> Jaroslav, Jeff, please do jump in with any additional thoughts.
>
> 1) Initially, tuned would hang when hyperthreads was disabled using the new
> runtime and kernel boottime flags. This happened on RHEL 7.4z and 7.5z.
> Jaroslav's patched tuned fixed those test cases for me. But given what
> we've learned since then, his patch may have just masked the perf python
> problem.
>
> 2) Tereza's more simple test case, of just disabling one cpu, uncovered the
> perf python problem.
>
> Given we will now have RHEL 7.4z and 7.5z customers who use the new runtime
> and boottime flags to disable hyperthreads, if any of them are using a tuned
> profile that isolates cpus, they will need a fix. Just a new tuned "might"
> work for them, but it looks like the perf patch will be needed for
> completeness.
>
> My vote would be to:
> a) Submit the new patched tuned as soon as we can for an ancillary hot
> patch.
> b) Submit the patched perf python to RHEL 7.4z and 7.5z to be included in
> the next z-stream update in September. If a customer needs it sooner, we'll
> deal with a hot patch if we have to.
>
> Comments?
> Joe

IMHO the original Tuned patch shouldn't mask the perf problem - the python perf problem was always there. The crash caused by python perf is just not easy to reproduce (without running through the valgrind or by "having luck"). Otherwise I agree.

(In reply to Joe Mario from comment #53)
SNIP
> patch.
> b) Submit the patched perf python to RHEL 7.4z and 7.5z to be included in
> the next z-stream update in September. If a customer needs it sooner, we'll
> deal with a hot patch if we have to.

7.5.z backport brewing in here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17928924

I posted the changes upstream, and I think we need to have them posted to Y stream first before they could be picked up by zstream. I'll do that once they are accepted upstream. Also I'll need some BZ for the Y stream post.

jirka

On my laptop I tried to disable one, two and more cpus, functionality of this bugzilla works well. Do you want some other testing from tuned QE?

(In reply to Joe Mario from comment #53)
> My vote would be to:
> a) Submit the new patched tuned as soon as we can for an ancillary hot
> patch.
> b) Submit the patched perf python to RHEL 7.4z and 7.5z to be included in
> the next z-stream update in September. If a customer needs it sooner, we'll
> deal with a hot patch if we have to.

I agree. We need to update both components, but they do not necessarily need to ship together. Let's ship the tuned update ASAP, and the kernel / python-perf update in the next z-stream batch.

Hmm, unfortunately, this is a weak point in perf testing, for two reasons:
1) perf python module has almost zero test coverage
2) we don't test perf with disabled CPUs

There were (and are) various problems with sparse NUMA nodes, etc. So why not with sparse CPUs. Both (1) and (2) have to be improved.

Also, if I understand it correctly, there is a bugfix needed in tuned and another in perf. Is there a corresponding bug opened against kernel/perf?

Hi Michael:
> Is there a corresponding bug opened against kernel/perf?
I will be creating one as soon as I get a chance. Likely tomorrow.
> perf python module has almost zero test coverage
It apparently gets some indirect test coverage whenever tuned is used to select a profile that isolates cpus, (including cpu-partitioning and various realtime profiles).
Joe
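
On the test-coverage point, a regression check for this kind of hang has to bound how long it waits for the command, otherwise the test itself hangs. A minimal sketch of such a wrapper follows. It assumes Python 3, and run_with_timeout is a hypothetical helper name, not part of tuned or of the test case referenced in this bug.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and return (returncode, timed_out).

    subprocess.run() kills the child and raises TimeoutExpired when the
    timeout is reached, so a hanging command cannot block the caller.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs anywhere. On a test
    # machine the argv would be the command from this report:
    #   ["tuned-adm", "profile", "cpu-partitioning"]
    returncode, timed_out = run_with_timeout([sys.executable, "-c", "pass"], 10)
    if timed_out:
        print("FAIL: command did not finish in time (possible hang)")
    else:
        print("command finished with exit code", returncode)
```

The test logs in this bug show that tuned-adm also has its own 600-second wait and mentions the --timeout and --async options, so an outer limit like this one is a safety net for the harness and not a replacement for them.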
(In reply to Joe Mario from comment #61)
> > Is there a corresponding bug opened against kernel/perf?
> I will be creating one as soon as I get a chance. Likely tomorrow.

See bug 1619465

I tested this bug with the new package tuned-2.10.0-2.el7, and the known traceback from comment #35 appeared. The python-perf fix should also be deployed in rhel-7.6. I cloned the bug from #62 to the 7.6 release, see BZ#1620774.

# rpm -q tuned
tuned-2.10.0-2.el7.noarch
# tuned-adm profile cpu-partitioning
^CTraceback (most recent call last):
  File "/usr/sbin/tuned-adm", line 94, in <module>
    result = admin.action(action_name, **options)
  File "/usr/lib/python2.7/site-packages/tuned/admin/admin.py", line 75, in action
    res = self._controller.run()
  File "/usr/lib/python2.7/site-packages/tuned/admin/dbus_controller.py", line 59, in run
    self._main_loop.run()
  File "/usr/lib64/python2.7/site-packages/gi/overrides/GLib.py", line 577, in run
    raise KeyboardInterrupt
KeyboardInterrupt

I verified this bugzilla with the provided python-perf package [1], which contains the fix from #46. This fix is NECESSARY for correct behavior when some cpu is offlined. Please deploy python-perf in RHEL-7.6 or ASAP, because without this fix it does not work.

[1] https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=18201528

I tried all options:

tuned              | python-perf                     | result
-------------------|---------------------------------|-------
2.10.0-4.el7 (new) | 3.10.0-944.el7perf_python (new) | PASS
2.10.0-4.el7 (new) | 3.10.0-693.el7 (old)            | FAIL
2.9.0-1.el7 (old)  | 3.10.0-944.el7perf_python (new) | FAIL
2.9.0-1.el7 (old)  | 3.10.0-693.el7 (old)            | FAIL

We can see that only the combination of new tuned and new python-perf leads to a positive result of the test case.

====================================================
Verified in:
tuned-2.10.0-4.el7.noarch
tuned-profiles-cpu-partitioning-2.10.0-4.el7.noarch
python-perf-3.10.0-944.el7perf_python.x86_64
PASS
====================================================
[ PASS ] :: Command 'echo 'isolated_cores=2-3' > /etc/tuned/cpu-partitioning-variables.conf' (Expected 0, got 0)
[ LOG ] :: Offline cpu
[ PASS ] :: Command 'echo 0 > /sys/devices/system/cpu/cpu4/online' (Expected 0, got 0)
[ LOG ] :: Select cpu-partitioning profile
[ LOG ] :: Output of 'tuned-adm profile cpu-partitioning':
[ LOG ] :: --------------- OUTPUT START ---------------
[ LOG ] :: --------------- OUTPUT END ---------------
[ PASS ] :: Command 'tuned-adm profile cpu-partitioning' (Expected 0, got 0)
[ PASS ] :: Command 'tuned-adm active | grep cpu-partitioning' (Expected 0, got 0)

test case: /CoreOS/tuned/Regression/offlined-cpu-in-profile-cpu-partitioning

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3172

Clearing "needinfo" flag on this long-since-closed BZ.