Bug 1441792 - tuned: backtrace when switching profiles
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tuned
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Jaroslav Škarvada
QA Contact: Tereza Cerna
Depends On:
Blocks: 1394932
Reported: 2017-04-12 14:06 EDT by Luiz Capitulino
Modified: 2017-08-01 08:35 EDT
Fixed In Version: tuned-2.8.0-2.el7
Last Closed: 2017-08-01 08:35:21 EDT
Type: Bug
External Trackers:
Tracker: Red Hat Product Errata
ID: RHBA-2017:2102
Priority: normal
Status: SHIPPED_LIVE
Summary: tuned bug fix and enhancement update
Last Updated: 2017-08-01 12:07:33 EDT

Description Luiz Capitulino 2017-04-12 14:06:09 EDT
Description of problem:

When switching to or from the cpu-partitioning profile, I sometimes get a hang. When this happens, I observe the following:

1. The profile doesn't seem to be fully applied

2. systemctl status tuned reports the following:

   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._execute_all_non_device_commands(instance)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 406, in _execute_all_non_device_commands
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._execute_non_device_command(instance, command, new_value)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 483, in _execute_non_device_command
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: command["custom"](True, new_value, False, False)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_scheduler.py", line 392, in _isolated_cores
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._set_ps_affinity(affinity, True)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_scheduler.py", line 352, in _set_ps_affinity
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: psl = filter(lambda v: re.search(self._ps_whitelist, v["stat"]["comm"]) is not None, ps.values())
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: AttributeError: pidstats instance has no attribute 'values'
   [dpdk-host] --- Tue Apr 11 @ 01:43:10 PM --- root@perf118 --- /usr/lib/python2.7/site-packages/tuned/plugins
   #

I could not reproduce this issue after reverting the following commit:

commit ac78f90c773cc97573844f521c2f67291f15d354
Author: Jaroslav Škarvada <jskarvad@redhat.com>
Date:   Fri Apr 7 17:46:32 2017 +0200

    scheduler: added support for cores isolation


Version-Release number of selected component (if applicable): tuned-2.8.0-1.el7	


How reproducible:


Steps to Reproduce:
1. Switch to and from the cpu-partitioning profile repeatedly. I was doing this on a machine with 23 CPUs, which might have something to do with the problem.
Comment 2 Jaroslav Škarvada 2017-04-12 14:14:45 EDT
Could you check with the latest git head, e.g. https://github.com/redhat-performance/tuned/commit/191785a05421fbe1f7fe6d48b4907aceac5b57a1

I could do a scratch build for you.

It turned out to be weird behaviour of python-linux-procfs, which I was able to work around. I am going to open an upstream bug for this library. It's a race condition and it's hard to reproduce on the machines in our pool.
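
For illustration only, here is a minimal sketch of one defensive way to filter a process table that may be missing `.values()` or may contain entries whose data disappears while it is being read. It is not the workaround from the commit linked above, whose details are not quoted in this report. `KeysOnlyTable`, `get_comm`, `iter_procs` and `filter_by_comm` are hypothetical names, and the assumption that the table still offers `keys()` and item access is mine.

```python
import re


class KeysOnlyTable(object):
    """Hypothetical stand-in for a process table that lacks .values()."""

    def __init__(self, procs):
        self._procs = dict(procs)

    def keys(self):
        return self._procs.keys()

    def __getitem__(self, pid):
        return self._procs[pid]


def get_comm(proc):
    """Return the command name of a process entry, or "" if unreadable."""
    try:
        return proc["stat"]["comm"]
    except (KeyError, OSError):
        # The process may have exited between listing and reading it.
        return ""


def iter_procs(table):
    """Return the entries of a dict-like table, with or without .values()."""
    values = getattr(table, "values", None)
    if callable(values):
        return list(values())
    procs = []
    for pid in list(table.keys()):
        try:
            procs.append(table[pid])
        except KeyError:
            # Entry vanished after keys() was taken; skip it.
            continue
    return procs


def filter_by_comm(table, pattern):
    """Keep only the entries whose command name matches the regex."""
    rx = re.compile(pattern)
    return [p for p in iter_procs(table) if rx.search(get_comm(p))]
```

Entries whose command name cannot be read are treated as the empty string, so they are dropped unless the pattern itself matches an empty string.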
Comment 3 Luiz Capitulino 2017-04-12 15:43:54 EDT
I think the solution for this BZ is the plan you outlined earlier: revert the changes from cpu-partitioning but keep commit ac78f90c773.
Comment 13 Luiz Capitulino 2017-04-18 10:37:06 EDT
I've implemented the test-case below. It triggers the issue in seconds with tuned-2.8.0-1.el7. I'm unable to trigger it with tuned-2.8.0-2.el7 when running the test-case for several minutes. So, I confirm this is fixed.

1. Download a kernel from kernel.org
2. Unpack it
3. Run:

$ make allyesconfig && make -j NR-CPUS (where NR-CPUS is twice the number of CPUs in your system)

Then in a separate terminal:

# while true; do
  tuned-adm profile balanced
  sleep 5s
  tuned-adm profile cpu-partitioning
  sleep 5s
done

Then in another terminal:

# watch -n1 systemctl status tuned

And watch for backtraces in the log or tuned-adm hanging.
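
The profile-switching loop and the log check above could also be automated. The following is a rough sketch, assuming Python 3.7 or later, and it does not start the kernel build, which still has to run separately. The function names, the 60-second hang timeout and the regular expression used to spot a backtrace are all assumptions chosen for illustration; only the `tuned-adm profile` and `systemctl status tuned` commands come from the steps above.

```python
import re
import subprocess
import time

PROFILES = ("balanced", "cpu-partitioning")

# Heuristic (an assumption): a Python traceback frame or an "XyzError: " line.
BACKTRACE_RE = re.compile(r'File ".+", line \d+, in |\b\w+Error: ')


def has_backtrace(log_text):
    """Return True if the text looks like it contains a Python backtrace."""
    return BACKTRACE_RE.search(log_text) is not None


def run_cmd(argv, timeout):
    """Run a command and return its stdout, or None if it timed out."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


def reproduce(cycles, delay=5.0, timeout=60.0, run=run_cmd, sleep=time.sleep):
    """Switch profiles repeatedly; describe the first failure, or return None."""
    for _ in range(cycles):
        for profile in PROFILES:
            if run(["tuned-adm", "profile", profile], timeout) is None:
                return "tuned-adm hung switching to %s" % profile
            sleep(delay)
            status = run(["systemctl", "status", "tuned"], timeout)
            if status and has_backtrace(status):
                return "backtrace logged after switching to %s" % profile
    return None


if __name__ == "__main__":
    # Must run as root, like the shell loop above.
    print(reproduce(cycles=20) or "no failure observed")
```

The `run` and `sleep` parameters exist only so the loop can be exercised without touching a real system. Note that `systemctl status` shows recent log lines, so a backtrace from an earlier run may be reported again.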
Comment 14 Tereza Cerna 2017-04-20 08:13:08 EDT
@Luiz: Thank you for your testing.

Switching to VERIFIED (SanityOnly) status based on successful test by lcapitulino.
Comment 15 errata-xmlrpc 2017-08-01 08:35:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2102
