Bug 1332593

Summary: rt: Use IPI to trigger RT task push migration instead of pulling
Product: Red Hat Enterprise Linux 7 Reporter: Clark Williams <williams>
Component: kernel-rt Assignee: Clark Williams <williams>
kernel-rt sub component: Process management QA Contact: Jiri Kastner <jkastner>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: bhu, lgoncalv, srostedt, williams
Version: 7.3 Keywords: ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
To avoid thundering-herd access to run-queue locks during task migration, a three-commit series was backported from upstream. An underutilized core now sends an IPI to an overloaded core, requesting that the overloaded core push a task to the requesting core, rather than having all idle cores try to pull tasks themselves.
Story Points: ---
Clone Of:
: 1334459 Environment:
Last Closed: 2016-11-03 19:49:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1274397, 1334459    
Attachments:
Description Flags
sched/rt: Use IPI to trigger RT task push migration
none
sched/rt: Hide the push_irq_work_func() declaration
none
sched/rt: Have the schedule IPI irq_work run in hard irq
none

Description Clark Williams 2016-05-03 14:29:26 UTC
Avoid thundering herd access to run-queue locks when performing task migration by sending an IPI from an underutilized core to an overloaded core, requesting that the overloaded core push the task to the requesting core. 

Backported from upstream commit b6366f048e0caff28af5335b7af2031266e1b06b

Comment 1 Clark Williams 2016-05-03 18:07:34 UTC
To reproduce: run the kernel on a system with >= 64 cores (usually a four-socket DL58x) and, after tuning the BIOS for low latency, run rteval for 1-2 hours. If latency spikes of 200-300 microseconds appear, rerun with tracing:

In one login session run 'rteval --onlyload --duration=2h'

In a separate window:
1. trace-cmd start -e all  -p function -l '*rt_spin*' -l '_raw_spin*'
2. cyclictest --numa -p95 -i100 -d0 -qmu -b 200 --tracemark --notrace

Wait for cyclictest to hit the breaktrace threshold, then run:
3. trace-cmd extract
4. trace-cmd report -l 

Looking through the report you will see stretches where the system goes idle and tries to migrate RT workloads to idle cores, with lots of calls to spin_lock/spin_unlock of the run-queue (rq) locks.
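
As a rough aid for reading a long report, the lock calls can be tallied with a short script. This is only a sketch: it assumes each function-tracer line in the 'trace-cmd report' output contains "function:" followed by the traced function name, and count_lock_calls() is a hypothetical helper, not part of trace-cmd.

```python
import re
from collections import Counter

# Assumption: function-tracer lines look like "... function:   <name>".
FUNC_RE = re.compile(r"function:\s+(\S+)")
# Same lock families that the trace-cmd filters in step 1 select.
LOCK_RE = re.compile(r"rt_spin|_raw_spin")


def count_lock_calls(report_lines):
    """Return a Counter of traced lock-function names to call counts."""
    counts = Counter()
    for line in report_lines:
        match = FUNC_RE.search(line)
        if match and LOCK_RE.search(match.group(1)):
            counts[match.group(1)] += 1
    return counts
```

For example, after saving the report to a file (the name report.txt is arbitrary), something like count_lock_calls(open("report.txt")).most_common(10) lists the most frequently traced lock functions.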

Comment 2 Clark Williams 2016-05-04 13:25:55 UTC
Created attachment 1153839
sched/rt: Use IPI to trigger RT task push migration

Rather than have all idle CPUs try to migrate tasks from an overloaded CPU, have the idle CPU send an IPI to the overloaded CPU, which then pushes tasks to the requesting idle CPU.
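
The difference in lock traffic can be illustrated with a toy user-space model. This is not the kernel code from the patch: RunQueue, pull(), ipi_push() and simulate() are hypothetical names, the IPIs are represented by a plain list of requests, and the model only counts how often another CPU takes the overloaded CPU's run-queue lock.

```python
from dataclasses import dataclass, field


@dataclass
class RunQueue:
    cpu: int
    tasks: list = field(default_factory=list)
    remote_locks: int = 0  # times another CPU took this run-queue lock

    def lock(self, by_cpu: int) -> None:
        if by_cpu != self.cpu:
            self.remote_locks += 1


def pull(overloaded: RunQueue, idle: list) -> None:
    """Pull model: every idle CPU locks the overloaded rq to look for work."""
    for rq in idle:
        overloaded.lock(by_cpu=rq.cpu)
        if len(overloaded.tasks) > 1:  # keep one task on the source CPU
            rq.tasks.append(overloaded.tasks.pop())


def ipi_push(overloaded: RunQueue, idle: list) -> None:
    """IPI model: idle CPUs only post a request; the overloaded CPU pushes."""
    requests = list(idle)  # stands in for the IPIs; no rq lock is taken here
    for rq in requests:
        if len(overloaded.tasks) <= 1:
            break
        overloaded.lock(by_cpu=overloaded.cpu)  # owner takes its own lock
        rq.lock(by_cpu=overloaded.cpu)          # and the target's, to enqueue
        rq.tasks.append(overloaded.tasks.pop())


def simulate(model, n_idle: int = 63, n_tasks: int = 3) -> int:
    """Return how many times a remote CPU locked the overloaded rq."""
    overloaded = RunQueue(cpu=0, tasks=[f"rt{i}" for i in range(n_tasks)])
    idle = [RunQueue(cpu=c) for c in range(1, n_idle + 1)]
    model(overloaded, idle)
    return overloaded.remote_locks
```

With the defaults, simulate(pull) returns 63 and simulate(ipi_push) returns 0: both models migrate the same two tasks, but only the pull model has every idle CPU take the overloaded run-queue lock.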

Comment 3 Clark Williams 2016-05-04 13:26:54 UTC
Created attachment 1153841
sched/rt: Hide the push_irq_work_func() declaration

Get rid of a compiler warning introduced by the previous IPI patch.

Comment 4 Clark Williams 2016-05-04 13:27:53 UTC
Created attachment 1153842
sched/rt: Have the schedule IPI irq_work run in hard irq

As the sched rt pull work has moved to using an irq_work IPI, having it
deferred to a thread pretty much defeats the purpose. The handler also
expects interrupts to be disabled when it is called, as it takes the rq locks.

Set the HARD_IRQ flag on the rt push IPI irq_work handler.

Comment 11 errata-xmlrpc 2016-11-03 19:49:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2584.html