A race condition that prevented tasks from being scheduled properly has been fixed
Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.
Created attachment 1470606 [details]
output of crash tool
Description of problem:
I found my RT system hung, so I triggered a panic via sysrq and then analyzed the vmcore.
The RT process (PID 4, state RU) had been woken up to take the boot_tvec_bases lock, but nr_running of the rq on CPU 0 is wrong (there is one runnable RT process, yet it reads 0), so the pick_next_task() function always chooses the idle process.
There are also many processes waiting to take the boot_tvec_bases lock.
Additionally, nr_running of the rq on CPU 24 is wrong (there is one RT process, but it reads 2).
crash> runq
CPU 0 RUNQUEUE: ffff881ffca19080
CURRENT: PID: 0 TASK: ffffffff81a02480 COMMAND: "swapper/0"
RT PRIO_ARRAY: ffff881ffca19208
[ 98] PID: 4 TASK: ffff88022bb030f0 COMMAND: "ktimersoftd/0"
CFS RB_ROOT: ffff881ffca19120
[no tasks queued]
...
CPU 24 RUNQUEUE: ffff881ffcd19080
CURRENT: PID: 450 TASK: ffff881ffe02d190 COMMAND: "irq/46-xhci_hcd"
RT PRIO_ARRAY: ffff881ffcd19208
[ 49] PID: 450 TASK: ffff881ffe02d190 COMMAND: "irq/46-xhci_hcd"
CFS RB_ROOT: ffff881ffcd19120
[no tasks queued]
...
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
...
4 2 0 ffff88022bb030f0 RU 0.0 0 0 [ktimersoftd/0]
...
crash> task 4
...
exec_start = 129364170626886,
...
last_arrival = 129364170620770,
...
crash> struct rq ffff881ffca19080
...
nr_running = 0,
...
idle_stamp = 269573437263498,
...
rt = {
...
rt_nr_running = 1,
...
}
...
crash> struct rq ffff881ffcd19080
...
nr_running = 2,
...
rt = {
...
rt_nr_running = 1,
...
}
...
Version-Release number of selected component (if applicable):
kernel-rt-3.10.0-693.11.1.rt56.632.el7
How reproducible:
No obvious regular pattern.
Additional info:
Please refer to attachments.
Hi swood:
The preempt_enable() in the schedule_tail() function is not effective, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
So preemption is not enabled; I am confused.
How did you fix it? Can you describe it in more detail, or paste the code?
asmlinkage void schedule_tail(struct task_struct *prev)
__releases(rq->lock)
{
struct rq *rq = this_rq();
finish_task_switch(rq, prev);
/*
* FIXME: do we need to worry about rq being invalidated by the
* task_switch?
*/
post_schedule(rq);
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
/* In this case, finish_task_switch does not reenable preemption */
preempt_enable();
#endif
if (current->set_child_tid)
put_user(task_pid_vnr(current), current->set_child_tid);
}
In addition, preemption is also enabled in the latest kernel code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8
(In reply to tianyongjiang from comment #6)
> Hi swood:
> The preempt_enable() in the schedule_tail() function is not effective,
> because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched
> the kernel code), and my kernel-rt is x86_64.
> So preemption is not enabled; I am confused.
>
> How did you fix it? Can you describe it in more detail, or paste the code?
That is not where preemption is getting enabled. finish_task_switch() calls finish_lock_switch() which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().
The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").
> In addition, the preemption is also enabled in latest kernel code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/
> kernel/sched/core.c?h=v4.18-rc8
Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.
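For reference, the upstream fix named above restructures schedule_tail() so that preemption stays disabled across post_schedule(). The following is a sketch from the upstream commit, not the exact RHEL-7 backport, so the backported code may differ in detail:

```c
/*
 * Sketch of schedule_tail() after upstream commit 1a43a14a5bd9c32
 * ("sched: Fix schedule_tail() to disable preemption"). Preemption is
 * held disabled across post_schedule(), so a newly forked task cannot
 * be migrated to another CPU between finish_task_switch() and the
 * post-schedule balancing, avoiding the CPU/runqueue mismatch.
 */
asmlinkage __visible void schedule_tail(struct task_struct *prev)
	__releases(rq->lock)
{
	struct rq *rq;

	/*
	 * finish_task_switch() drops rq->lock, and releasing a raw
	 * spinlock contains a preempt_enable(); taking an extra
	 * preempt_disable() here keeps us pinned to this CPU until
	 * post_schedule() has run.
	 */
	preempt_disable();
	rq = this_rq();
	finish_task_switch(rq, prev);
	post_schedule(rq);
	preempt_enable();

	if (current->set_child_tid)
		put_user(task_pid_vnr(current), current->set_child_tid);
}
```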
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3096