Bug 1247184

Summary: scheduler plugin causes nohz_full to be de-activated
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Tereza Cerna <tcerna>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.2 CC: acme, jeder, jskarvad, psklenar, tcerna
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tuned-2.5.1-3.el7 Doc Type: Bug Fix
Doc Text:
Cause: Previously, the new perf code for the runtime tuning functionality of plugin_scheduler could prevent nohz_full from working correctly.
Consequence: This could degrade performance for real-time or HPC workloads.
Fix: The perf code was improved to work correctly with nohz_full.
Result: The runtime functionality of plugin_scheduler now works correctly with nohz_full.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 12:21:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1240765    

Description Luiz Capitulino 2015-07-27 14:03:10 UTC
Description of problem:

During KVM-RT testing, we've found that tuned's scheduler plugin causes nohz_full to be de-activated. This happens because the scheduler plugin sets up perf event watching on all cores during startup and nohz_full is currently incompatible with perf event watching (see reproducer below).

Version-Release number of selected component (if applicable): tuned-2.4.1-1.20150705git1da9f3cc.el7.noarch


How reproducible:


Steps to Reproduce:
1. Set up nohz_full on a core. E.g., add the following options to the kernel command line:

isolcpus=X nohz_full=X

2. Install tuned and activate a profile that uses the scheduler plugin. I'm using the realtime-virtual-host profile.

3. Run the following script, where Y is the CPU mask corresponding to the nohz_full core (CPU 15 in this example) and "stress" is any application capable of taking 100% of the CPU:

"""
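#!/bin/bash

# Trace tick_stop events on the nohz_full CPU only (Y is its CPU mask)
cd /sys/kernel/debug/tracing
echo Y > tracing_cpumask
echo tick_stop > set_event
echo nop > current_tracer
echo > set_ftrace_filter
echo > trace
echo 1 > tracing_on

# Run a CPU hog pinned to the nohz_full core (CPU 15 here) as SCHED_FIFO, priority 1
cd
chrt -f 1 taskset -c 15 ./stress --cpu 1 &
sleep 2

grep tick_stop /sys/kernel/debug/tracing/trace
pkill -9 stress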
"""

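For steps 1 and 3, a small helper sketch may be handy. One function checks that a parameter such as nohz_full is really on the running kernel's command line, the other derives the hexadecimal mask Y from a CPU number. Both function names (cmdline_param, cpu_to_mask) are hypothetical and not part of tuned or any kernel tool. The mask helper assumes CPU numbers 0-31, so that the mask fits in a single 32-bit hex word.

```shell
#!/bin/bash

# Hypothetical helper: print the value of a kernel command-line parameter.
# Usage: cmdline_param NAME [CMDLINE_STRING]  (defaults to /proc/cmdline)
cmdline_param() {
    local name=$1
    local cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    local word
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: print the hex CPU mask for a single CPU number.
# Assumption: only CPUs 0-31 are handled, so the mask is one 32-bit word.
cpu_to_mask() {
    local cpu=$1
    case $cpu in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$cpu" -le 31 ] || return 1
    printf '%x\n' $((1 << 10#$cpu))
}

# Example: the mask for CPU 15, as used in the reproducer
cpu_to_mask 15    # prints 8000
```

With these loaded, `cmdline_param nohz_full` should print the CPU list passed at boot, and its output for isolcpus should match.
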
Actual results:

Lots of:

tick_stop: success=no msg=perf events running


Expected results:

Only one of the following:

stress-3236  [015] d...1..   292.142704: tick_stop: success=yes msg=
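
To compare the actual and expected results at a glance, the trace can be summarized instead of read line by line. The following is a minimal sketch, assuming the tick_stop line format shown in this report (success=yes / success=no); summarize_tick_stop is a hypothetical name, not an existing tool.

```shell
#!/bin/bash

# Hypothetical helper: count successful and failed tick_stop events.
# Reads trace text from the given files, or from stdin if none are given.
summarize_tick_stop() {
    awk '/tick_stop:/ {
             if ($0 ~ /success=yes/) yes++
             else if ($0 ~ /success=no/) no++
         }
         END { printf "success=yes %d\nsuccess=no %d\n", yes, no }' "$@"
}
```

Running `summarize_tick_stop /sys/kernel/debug/tracing/trace` after the reproducer should, on an affected system, show a large success=no count; on a fixed system the expectation is a single success=yes and no failures.
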


Additional info:

Comment 1 Luiz Capitulino 2015-07-27 14:07:26 UTC
I forgot to mention that it's possible to disable the scheduler plugin runtime feature as a workaround for this issue. Simply edit the tuned.conf file for the profile and add runtime=0 to the scheduler section, like:

[scheduler]
runtime=0
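
To confirm that the workaround is in place for a given profile, the [scheduler] section can be checked with a short script. This is only a sketch: scheduler_runtime is a hypothetical helper, and the path in the usage example is an assumption about where the profile's tuned.conf lives on the system.

```shell
#!/bin/bash

# Hypothetical helper: print the value of "runtime" from the [scheduler]
# section of a tuned.conf file; returns non-zero if it is not set there.
scheduler_runtime() {
    awk -F= '
        /^[[:space:]]*\[/ {
            in_sched = ($0 ~ /^[[:space:]]*\[scheduler\]/)
            next
        }
        in_sched && /^[[:space:]]*runtime[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)
            print $2
            found = 1
            exit
        }
        END { exit !found }
    ' "$1"
}

# Usage example; the path below is an assumption, adjust it to your profile.
conf=/etc/tuned/realtime-virtual-host/tuned.conf
if [ -r "$conf" ]; then
    if value=$(scheduler_runtime "$conf"); then
        echo "scheduler runtime=$value"
    else
        echo "runtime is not set in [scheduler]"
    fi
fi
```

A printed value of 0 means the runtime feature is disabled for that profile. tuned most likely has to be restarted before an edited profile takes effect.
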

Comment 2 Jaroslav Škarvada 2015-07-31 14:32:06 UTC
Upstream commit fixing this problem:
https://git.fedorahosted.org/cgit/tuned.git/commit/?id=5024f00ef59de394c31668696eb2f311f2aebc85

Comment 4 Fedora Update System 2015-09-03 18:49:50 UTC
tuned-2.5.1-2.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

Comment 5 Fedora Update System 2015-09-03 18:50:46 UTC
tuned-2.5.1-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 7 errata-xmlrpc 2015-11-19 12:21:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2375.html