Bug 1331562

Summary: rt: fix idle_balance iterating over all CPUs if a runnable task shows up partway through
Product: Red Hat Enterprise Linux 7
Reporter: Clark Williams <williams>
Component: kernel-rt
Assignee: Steven Rostedt <srostedt>
kernel-rt sub component: Process management
QA Contact: Jiri Kastner <jkastner>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: bhu, bperkins, lgoncalv, riel, srostedt, williams
Version: 7.3
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 19:48:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1274397
Attachments:
Description    Flags
break out of idle_balance if an RT task is ready to run    none
Enable irqs in idle_balance() routine    none
Move call to idle_balance to post-schedule    none

Description Clark Williams 2016-04-28 19:40:23 UTC
The idle_balance() kernel function is responsible for balancing SCHED_OTHER tasks when a core goes idle. It is called with interrupts disabled, so on systems with a large number of cores (>=64) hundreds of microseconds can be spent in the balance function with no opportunity for a high-priority task to preempt it and run.

This behavior can be seen when running rteval on a 72-core HP DL580gen9 and observing the lock contention on the run-queue locks when idle_balance() is called. The cyclictest thread on the core that is running idle_balance() cannot run: its timer fires, but the interrupt is held off while idle_balance() runs, so the cyclictest thread misses its deadline by hundreds of microseconds.
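
For readers who want to see what "misses its deadline" means in practice, here is a rough user-space sketch of the kind of measurement cyclictest makes: sleep until an absolute deadline, then check how late the wakeup was. This is not cyclictest's actual code; the function names wakeup_latency_ns() and max_wakeup_latency_ns() are made up for illustration, and a real test would also run under SCHED_FIFO with memory locked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Hypothetical helper (not from cyclictest): sleep until an absolute
 * deadline interval_ns from now and return how many nanoseconds late
 * the thread actually woke up.
 */
int64_t wakeup_latency_ns(int64_t interval_ns)
{
    struct timespec now, deadline;
    int64_t target;
    int rc;

    assert(interval_ns > 0);

    clock_gettime(CLOCK_MONOTONIC, &now);
    target = ts_to_ns(&now) + interval_ns;
    deadline.tv_sec = (time_t)(target / NSEC_PER_SEC);
    deadline.tv_nsec = (long)(target % NSEC_PER_SEC);

    /* clock_nanosleep() returns the error number directly; retry on EINTR. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);

    clock_gettime(CLOCK_MONOTONIC, &now);
    return ts_to_ns(&now) - target;
}

/* Hypothetical helper: worst wakeup latency seen over `loops` iterations. */
int64_t max_wakeup_latency_ns(int64_t interval_ns, int loops)
{
    int64_t worst = 0;

    for (int i = 0; i < loops; i++) {
        int64_t lat = wakeup_latency_ns(interval_ns);
        if (lat > worst)
            worst = lat;
    }
    return worst;
}
```

Calling max_wakeup_latency_ns(1000000, 1000) from a thread pinned to one core gives a crude worst-case figure for that core. If the timer interrupt is held off by a long interrupts-disabled section, as described above, the delay shows up directly in the returned value.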

Comment 1 Rik van Riel 2016-04-28 19:46:57 UTC
There are a few separate issues here:
1) idle_balance is currently called with irqs disabled; Steven Rostedt has a patch to fix that
2) idle_balance continues to iterate over all CPUs even if a runnable task shows up during balancing; I have a patch to fix that

We need both of these fixes together to get the system to behave better.
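
To make issue 2 concrete, below is a toy user-space model of the idea, not the attached patch and not kernel code. struct toy_rq, toy_idle_balance(), and the wakeup_at parameter are all invented for this sketch; the only point is to show how checking the local run queue on each iteration lets the scan stop as soon as something becomes runnable, rather than visiting every remaining CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU run queue; not the kernel's struct rq. */
struct toy_rq {
    int nr_running;  /* tasks currently runnable on this CPU */
    int nr_movable;  /* runnable tasks that another CPU could pull */
};

/*
 * Toy model of an idle CPU (`self`) scanning remote run queues for work.
 * To simulate a wakeup partway through, one task becomes runnable on
 * `self` once `wakeup_at` remote queues have been examined.
 * With early_exit set, the scan stops as soon as `self` has work.
 * Returns the number of remote run queues examined.
 */
size_t toy_idle_balance(struct toy_rq *rqs, size_t ncpus, size_t self,
                        size_t wakeup_at, bool early_exit)
{
    size_t visited = 0;

    assert(self < ncpus);

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == self)
            continue;

        /* Simulated wakeup of a task on the balancing CPU. */
        if (visited == wakeup_at)
            rqs[self].nr_running++;

        /* The proposed behavior: stop balancing once we have work. */
        if (early_exit && rqs[self].nr_running > 0)
            break;

        visited++;

        /* Pull one task if this remote queue has something movable. */
        if (rqs[cpu].nr_movable > 0) {
            rqs[cpu].nr_movable--;
            rqs[cpu].nr_running--;
            rqs[self].nr_running++;
            break;
        }
    }
    return visited;
}
```

In this model, with 72 CPUs, nothing to pull, and a wakeup after 5 remote queues have been examined, the early-exit variant examines 5 queues while the original behavior examines all 71. On its own this only shortens the scan; the wakeup still has to be noticed, which is why the irq fix in issue 1 is needed as well.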

Comment 2 Clark Williams 2016-04-28 19:49:13 UTC
Created attachment 1152052
break out of idle_balance if an RT task is ready to run

Comment 3 Clark Williams 2016-04-28 19:49:46 UTC
Created attachment 1152053
Enable irqs in idle_balance() routine

Comment 4 Clark Williams 2016-04-28 19:50:21 UTC
Created attachment 1152054
Move call to idle_balance to post-schedule

Comment 5 Clark Williams 2016-04-28 19:51:10 UTC
The above three patches have been applied to a scratch build based on kernel-rt-3.10.0-382.rt56.263.el7 and are now under testing.

Comment 7 Jiri Kastner 2016-10-04 10:09:31 UTC
see bug 1209987 comment 20

Comment 9 errata-xmlrpc 2016-11-03 19:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2584.html