Bug 1332593 - rt: Use IPI to trigger RT task push migration instead of pulling
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Assigned To: Clark Williams
QA Contact: Jiri Kastner
Keywords: ZStream
Blocks: 1274397 1334459

Reported: 2016-05-03 10:29 EDT by Clark Williams
Modified: 2016-11-03 15:49 EDT
CC List: 4 users

Doc Type: Bug Fix
Doc Text:
To avoid thundering-herd contention on run-queue locks during task migration, a three-commit series was backported from upstream. Instead of having underutilized cores pull tasks from an overloaded core, an underutilized core now sends an IPI to the overloaded core, requesting that it push a task to the requesting core.
Clones: 1334459
Last Closed: 2016-11-03 15:49:35 EDT
Type: Bug


Attachments
sched/rt: Use IPI to trigger RT task push migration (11.74 KB, patch)
2016-05-04 09:25 EDT, Clark Williams
sched/rt: Hide the push_irq_work_func() declaration (2.09 KB, patch)
2016-05-04 09:26 EDT, Clark Williams
sched/rt: Have the schedule IPI irq_work run in hard irq (1.68 KB, patch)
2016-05-04 09:27 EDT, Clark Williams


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2016:2584
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel-rt security, bug fix, and enhancement update
Last Updated: 2016-11-03 08:08:49 EDT

Description Clark Williams 2016-05-03 10:29:26 EDT
Avoid thundering-herd contention on run-queue locks during task migration: instead of pulling, an underutilized core sends an IPI to an overloaded core, requesting that the overloaded core push a task to the requesting core.

Backported from upstream commit b6366f048e0caff28af5335b7af2031266e1b06b
Comment 1 Clark Williams 2016-05-03 14:07:34 EDT
To reproduce: run the kernel on a system with 64 or more cores (usually a four-socket DL58x). After tuning the BIOS for low latency, run rteval for 1-2 hours. If latency spikes of 200-300 microseconds appear, rerun with tracing:

In one login session, run 'rteval --onlyload --duration=2h'.

In a separate window:
1. trace-cmd start -e all  -p function -l '*rt_spin*' -l '_raw_spin*'
2. cyclictest --numa -p95 -i100 -d0 -qmu -b 200 --tracemark --notrace

Wait for cyclictest to exceed the breaktrace threshold (200 microseconds, set by -b 200), then run:
3. trace-cmd extract
4. trace-cmd report -l 

Looking through the report, you will see stretches where cores go idle and try to migrate RT workloads onto themselves, with many spin_lock/spin_unlock calls on the run-queue (rq) locks.
Comment 2 Clark Williams 2016-05-04 09:25 EDT
Created attachment 1153839
sched/rt: Use IPI to trigger RT task push migration

Rather than having every idle CPU try to migrate tasks from an overloaded CPU, have the idle CPU send an IPI to the overloaded CPU, which then pushes tasks to the requesting idle CPU.
Comment 3 Clark Williams 2016-05-04 09:26 EDT
Created attachment 1153841
sched/rt: Hide the push_irq_work_func() declaration

Fix a compiler warning introduced by the previous IPI patch.
Comment 4 Clark Williams 2016-05-04 09:27 EDT
Created attachment 1153842
sched/rt: Have the schedule IPI irq_work run in hard irq

Now that the sched RT pull work uses an irq_work IPI, deferring that work to a thread largely defeats the purpose. The handler also expects interrupts to be disabled when it is called, because it takes the rq locks.

Set the HARD_IRQ flag on the RT push IPI irq_work handler.
Comment 11 errata-xmlrpc 2016-11-03 15:49:35 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2584.html
