Bug 1608672
| Summary: | RT system hang due to wrong of rq's nr_running | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | tianyongjiang <tian.yongjiang> | ||||
| Component: | kernel-rt | Assignee: | Crystal Wood <crwood> | ||||
| kernel-rt sub component: | Memory Management | QA Contact: | Jiri Kastner <jkastner> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | Marie Hornickova <mdolezel> | ||||
| Severity: | urgent | ||||||
| Priority: | high | CC: | bhu, crwood, dhoward, jiang.biao2, jkastner, lgoncalv, mkolaja, pezhang, stalexan | ||||
| Version: | 7.4 | Keywords: | ZStream | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | kernel-rt-3.10.0-931.rt56.881.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: |
A race condition that prevented tasks from being scheduled properly has been fixed
Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1617941 1618466 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-10-30 09:43:34 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1175461, 1532680, 1541534, 1617941, 1618466 | ||||||
Hi swood,
The preempt_enable() in schedule_tail() has no effect here, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
So preemption is not enabled at that point, and I am confused.
How did you fix it? Can you describe the fix in more detail, or paste the code?
asmlinkage void schedule_tail(struct task_struct *prev)
	__releases(rq->lock)
{
	struct rq *rq = this_rq();

	finish_task_switch(rq, prev);

	/*
	 * FIXME: do we need to worry about rq being invalidated by the
	 * task_switch?
	 */
	post_schedule(rq);

#ifdef __ARCH_WANT_UNLOCKED_CTXSW
	/* In this case, finish_task_switch does not reenable preemption */
	preempt_enable();
#endif
	if (current->set_child_tid)
		put_user(task_pid_vnr(current), current->set_child_tid);
}
In addition, preemption is also enabled in the latest kernel code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8
Hi swood : If you need vmcore, I can provide it.

(In reply to tianyongjiang from comment #6)
> Hi swood :
> The preempt_enable() in schedule_tail() function is not effective,because only ia64 and mips have __ARCH_WANT_UNLOCKED_CTXSW(I seach kernel code), my kernel-rt is x86_64.
> So Preemption is not enabled, I am confused.
>
> How did you fix? Can you describe in more detail,or paste the code?

That is not where preemption is getting enabled. finish_task_switch() calls finish_lock_switch() which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().

The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").

> In addition, the preemption is also enabled in latest kernel code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.

*** Bug 1590222 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3096
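The reply above says that preemption is re-enabled when finish_task_switch() releases rq->lock, and that in 3.10 a newly forked thread starts with a preempt count of 1. That argument can be illustrated with a small user-space model in C. This is only a sketch under stated assumptions: `task_model`, the `model_*` helpers and `preemptible_during_post_schedule()` are hypothetical names invented for illustration, not kernel code, and the placement of the extra disable/enable pair around post_schedule() is my reading of the referenced commit, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a single preempt counter, standing in for the task's preempt count. */
struct task_model {
	int preempt_count;
};

static void model_preempt_disable(struct task_model *t)
{
	t->preempt_count++;
}

static void model_preempt_enable(struct task_model *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

static bool model_preemptible(const struct task_model *t)
{
	return t->preempt_count == 0;
}

/*
 * Returns true if the task is preemptible (and could therefore be
 * migrated to another CPU) at the point where post_schedule(rq)
 * would run in schedule_tail().
 */
static bool preemptible_during_post_schedule(bool with_fix)
{
	/* 3.10: a newly forked thread has a preempt count of 1. */
	struct task_model t = { .preempt_count = 1 };
	bool window;

	if (with_fix)
		model_preempt_disable(&t);	/* assumed: added at the top of schedule_tail() */

	/* finish_task_switch() releases rq->lock, which contains a preempt_enable(). */
	model_preempt_enable(&t);

	/* post_schedule(rq) would run here. */
	window = model_preemptible(&t);

	if (with_fix)
		model_preempt_enable(&t);	/* assumed: added after post_schedule() */

	return window;
}
```

In this model, `preemptible_during_post_schedule(false)` reports an open preemption window and `preemptible_during_post_schedule(true)` does not, which matches the description that the update disables preemption in schedule_tail() so that no CPU migration happens during post-schedule processing.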
Created attachment 1470606 [details]
output of crash tool

Description of problem:
My RT system hung, so I triggered a panic via sysrq and analyzed the vmcore.
The RT process (PID 4, state RU) has been woken up to take the boot_tvec_bases lock, but nr_running of the CPU 0 runqueue is wrong: there is one RT process queued, but nr_running is 0. As a result, pick_next_task() always chooses the idle process. Many other processes are also waiting to take the boot_tvec_bases lock.
In addition, nr_running of the CPU 24 runqueue is wrong: there is one RT process, but nr_running is 2.

crash> runq
CPU 0 RUNQUEUE: ffff881ffca19080
  CURRENT: PID: 0  TASK: ffffffff81a02480  COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff881ffca19208
     [ 98] PID: 4  TASK: ffff88022bb030f0  COMMAND: "ktimersoftd/0"
  CFS RB_ROOT: ffff881ffca19120
     [no tasks queued]
...
CPU 24 RUNQUEUE: ffff881ffcd19080
  CURRENT: PID: 450  TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
  RT PRIO_ARRAY: ffff881ffcd19208
     [ 49] PID: 450  TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
  CFS RB_ROOT: ffff881ffcd19120
     [no tasks queued]
...

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
...
     4      2    0  ffff88022bb030f0  RU   0.0       0      0  [ktimersoftd/0]
...

crash> task 4
...
  exec_start = 129364170626886,
...
  last_arrival = 129364170620770,
...

crash> struct rq ffff881ffca19080
...
  nr_running = 0,
...
  idle_stamp = 269573437263498,
...
  rt = {
...
    rt_nr_running = 1,
...
  }
...

crash> struct rq ffff881ffcd19080
...
  nr_running = 2,
...
  rt = {
...
    rt_nr_running = 1,
...
  }
...

Version-Release number of selected component (if applicable):
kernel-rt-3.10.0-693.11.1.rt56.632.el7

How reproducible:
No obvious regular pattern.

Additional info:
Please refer to attachments.
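The counter mismatch described in the report can be written down as simple arithmetic. The C sketch below plugs in the nr_running and rt_nr_running values from the crash output and compares them. It assumes that the CFS count on both CPUs is 0 (the runq output shows no CFS tasks queued, but the dump excerpt does not show the CFS counter itself) and that rq->nr_running should equal the sum of the per-class counts. `rq_counts`, `nr_running_delta()`, `dump_cpu0` and `dump_cpu24` are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Counters copied from the crash output in the report. */
struct rq_counts {
	int cpu;
	int nr_running;		/* rq->nr_running */
	int rt_nr_running;	/* rq->rt.rt_nr_running */
	int cfs_nr_running;	/* ASSUMED 0: not shown in the dump excerpt */
};

/*
 * Difference between the runqueue-wide counter and the sum of the
 * per-class counters. 0 means consistent under the assumption above.
 */
static int nr_running_delta(const struct rq_counts *rq)
{
	assert(rq->nr_running >= 0);
	return rq->nr_running - (rq->rt_nr_running + rq->cfs_nr_running);
}

static const struct rq_counts dump_cpu0 = {
	.cpu = 0, .nr_running = 0, .rt_nr_running = 1, .cfs_nr_running = 0,
};

static const struct rq_counts dump_cpu24 = {
	.cpu = 24, .nr_running = 2, .rt_nr_running = 1, .cfs_nr_running = 0,
};
```

Under these assumptions CPU 0 is one short and CPU 24 is one over, so the two deltas cancel. That would be consistent with a single counter update having been applied to the wrong runqueue, although the dump excerpt alone does not prove it.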