Bug 1441792

Summary: tuned: backtrace when switching profiles
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Tereza Cerna <tcerna>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: jeder, jskarvad, lcapitulino, tcerna
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tuned-2.8.0-2.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 12:35:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1394932    

Description Luiz Capitulino 2017-04-12 18:06:09 UTC
Description of problem:

When switching to or from the cpu-partitioning profile, I sometimes get a hang. When this happens, I observe the following:

1. The profile doesn't seem to be fully applied

2. systemctl status tuned reports the following:

   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._execute_all_non_device_commands(instance)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 406, in _execute_all_non_device_commands
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._execute_non_device_command(instance, command, new_value)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 483, in _execute_non_device_command
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: command["custom"](True, new_value, False, False)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_scheduler.py", line 392, in _isolated_cores
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: self._set_ps_affinity(affinity, True)
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_scheduler.py", line 352, in _set_ps_affinity
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: psl = filter(lambda v: re.search(self._ps_whitelist, v["stat"]["comm"]) is not None, ps.values())
   Apr 11 13:42:19 perf118.perf.lab.eng.bos.redhat.com tuned[3784]: AttributeError: pidstats instance has no attribute 'values'
   [dpdk-host] --- Tue Apr 11 @ 01:43:10 PM --- root@perf118 --- /usr/lib/python2.7/site-packages/tuned/plugins
   #

I could not reproduce this issue after reverting the following commit:

commit ac78f90c773cc97573844f521c2f67291f15d354
Author: Jaroslav Škarvada <jskarvad>
Date:   Fri Apr 7 17:46:32 2017 +0200

    scheduler: added support for cores isolation


Version-Release number of selected component (if applicable): tuned-2.8.0-1.el7	


How reproducible:


Steps to Reproduce:
1. Just switch to and from the cpu-partitioning profile repeatedly. I was doing this on a machine with 23 CPUs, which might have something to do with the problem.

Comment 2 Jaroslav Škarvada 2017-04-12 18:14:45 UTC
Could you check with the latest git head, e.g. https://github.com/redhat-performance/tuned/commit/191785a05421fbe1f7fe6d48b4907aceac5b57a1

I could do a scratch build for you.

It turned out to be odd behaviour in python-linux-procfs, which I was able to work around. I am going to open an upstream Bugzilla for this library. It's a race condition, and it's hard to reproduce on machines in our pool.
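
For readers following the traceback: the failing line calls ps.values() and filters entries by matching v["stat"]["comm"] against a whitelist regular expression. The sketch below is not the upstream fix. It only illustrates one defensive way to do that filtering when the process table may lack a values() method or when a process exits mid-scan. FakePidstats, iter_process_entries and whitelisted are hypothetical names introduced for illustration, and the sample process table is made up.

```python
import re


class FakePidstats(object):
    """Hypothetical stand-in for a process table that has no values() method."""

    def __init__(self, processes):
        self.processes = processes

    def keys(self):
        return list(self.processes.keys())

    def __getitem__(self, pid):
        return self.processes[pid]


def iter_process_entries(ps):
    """Yield process entries from a dict or from a dict-like object."""
    values = getattr(ps, "values", None)
    if callable(values):
        for entry in values():
            yield entry
        return
    # Fall back to key lookup; take a snapshot of the keys first.
    for pid in list(ps.keys()):
        try:
            yield ps[pid]
        except KeyError:
            # The process went away between listing and lookup; skip it.
            continue


def whitelisted(ps, pattern):
    """Return the entries whose command name matches the regular expression."""
    return [entry for entry in iter_process_entries(ps)
            if re.search(pattern, entry["stat"]["comm"]) is not None]


# Made-up sample data, shaped like the entries used in the traceback.
table = {
    1: {"stat": {"comm": "systemd"}},
    42: {"stat": {"comm": "ksoftirqd/0"}},
}
matches = whitelisted(FakePidstats(table), r"^ksoftirqd")
print([entry["stat"]["comm"] for entry in matches])
```

The same helper accepts a plain dict, so whitelisted(table, r".") returns both sample entries.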

Comment 3 Luiz Capitulino 2017-04-12 19:43:54 UTC
I think the solution for this BZ is the plan you outlined earlier: revert the changes from cpu-partitioning but keep commit ac78f90c773.

Comment 13 Luiz Capitulino 2017-04-18 14:37:06 UTC
I've implemented the test-case below. It triggers the issue in seconds with tuned-2.8.0-1.el7. I'm unable to trigger it with tuned-2.8.0-2.el7, even when running the test-case for several minutes. So, I confirm this is fixed.

1. Download a kernel from kernel.org
2. Unpack it
3. Run:

$ make allyesconfig && make -j NR-CPUS (where NR-CPUS is twice the number of CPUs in your system)

Then in a separate terminal:

# while true; do
  tuned-adm profile balanced
  sleep 5s
  tuned-adm profile cpu-partitioning
  sleep 5s
done

Then in another terminal:

# watch -n1 systemctl status tuned

And watch for backtraces in the log or tuned-adm hanging.
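
The profile-switching and log-watching steps above could also be automated. The following is a minimal sketch, assuming Python 3 and that tuned-adm and systemctl are on the PATH. The function names, the 60-second hang timeout and the marker strings are assumptions chosen for illustration, not part of tuned. The kernel build that generates load still has to be started separately.

```python
import subprocess
import time

PROFILES = ("balanced", "cpu-partitioning")


def find_backtrace_lines(status_text):
    """Return log lines that look like part of a Python exception."""
    markers = ("Traceback (most recent call last)", "AttributeError:")
    return [line for line in status_text.splitlines()
            if any(marker in line for marker in markers)]


def switch_profile(name, timeout_s=60):
    """Run `tuned-adm profile NAME`; return False if it appears to hang."""
    try:
        subprocess.run(["tuned-adm", "profile", name], timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False


def tuned_status():
    """Capture the output of `systemctl status tuned` as text."""
    result = subprocess.run(["systemctl", "status", "tuned"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


def stress(cycles, pause_s=5):
    """Alternate profiles; report the first hang or logged exception."""
    for i in range(cycles):
        name = PROFILES[i % len(PROFILES)]
        if not switch_profile(name):
            return "hang while switching to " + name
        hits = find_backtrace_lines(tuned_status())
        if hits:
            return "backtrace logged: " + hits[-1]
        time.sleep(pause_s)
    return None
```

Calling stress(100) as root would run 100 switches and return None if nothing suspicious was seen. Note that systemctl status shows only the most recent log lines, so an exception left over from an earlier run could be reported again.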

Comment 14 Tereza Cerna 2017-04-20 12:13:08 UTC
@Luiz: Thank you for your testing.

Switching to VERIFIED (SanityOnly) status based on successful test by lcapitulino.

Comment 15 errata-xmlrpc 2017-08-01 12:35:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2102