Bug 1608672
Summary: | RT system hang due to wrong of rq's nr_running
---|---
Product: | Red Hat Enterprise Linux 7
Component: | kernel-rt
kernel-rt sub component: | Memory Management
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | high
Version: | 7.4
Keywords: | ZStream
Target Milestone: | rc
Hardware: | x86_64
OS: | Linux
Reporter: | tianyongjiang <tian.yongjiang>
Assignee: | Crystal Wood <crwood>
QA Contact: | Jiri Kastner <jkastner>
Docs Contact: | Marie Hornickova <mdolezel>
CC: | bhu, crwood, dhoward, jiang.biao2, jkastner, lgoncalv, mkolaja, pezhang, stalexan
Fixed In Version: | kernel-rt-3.10.0-931.rt56.881.el7
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2018-10-30 09:43:34 UTC
Clone Of: |
: | 1617941 1618466
Bug Blocks: | 1175461, 1532680, 1541534, 1617941, 1618466

Doc Text:
A race condition that prevented tasks from being scheduled properly has been fixed.
Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.

Hi swood:
The preempt_enable() in the schedule_tail() function has no effect, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64. So preemption is not enabled there, and I am confused.

How did you fix it? Can you describe the fix in more detail, or paste the code?

    asmlinkage void schedule_tail(struct task_struct *prev)
        __releases(rq->lock)
    {
        struct rq *rq = this_rq();

        finish_task_switch(rq, prev);

        /*
         * FIXME: do we need to worry about rq being invalidated by the
         * task_switch?
         */
        post_schedule(rq);

    #ifdef __ARCH_WANT_UNLOCKED_CTXSW
        /* In this case, finish_task_switch does not reenable preemption */
        preempt_enable();
    #endif
        if (current->set_child_tid)
            put_user(task_pid_vnr(current), current->set_child_tid);
    }

In addition, preemption is also enabled in the latest kernel code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Hi swood:
If you need the vmcore, I can provide it.

(In reply to tianyongjiang from comment #6)
> Hi swood:
> The preempt_enable() in the schedule_tail() function has no effect, because
> only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel
> code), and my kernel-rt is x86_64.
> So preemption is not enabled there, and I am confused.
>
> How did you fix it? Can you describe the fix in more detail, or paste the code?

That is not where preemption is getting enabled. finish_task_switch() calls finish_lock_switch(), which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().

The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").

> In addition, preemption is also enabled in the latest kernel code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.

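The preempt-count reasoning in the reply above can be illustrated with a toy user-space model. This is only a sketch, not kernel code: `preempt_count`, `raw_spin_unlock_model()` and `schedule_tail_model()` are hypothetical stand-ins, and the placement of the extra `preempt_disable()` before the lock release is an assumption based on the Doc Text ("disables preemption in the schedule_tail() function"), not a copy of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: 0 means the current task may be preempted (and migrated). */
static int preempt_count;

static void preempt_disable(void) { preempt_count++; }

static void preempt_enable(void)
{
    assert(preempt_count > 0);  /* the count must never go negative */
    preempt_count--;
}

/* Releasing a raw spinlock contains a preempt_enable(). */
static void raw_spin_unlock_model(void) { preempt_enable(); }

/*
 * Models schedule_tail() for a newly forked thread on 3.10.
 * Returns true if the task was preemptible while post_schedule() would run.
 */
static bool schedule_tail_model(bool with_fix)
{
    bool preemptible_in_post_schedule;

    preempt_count = 1;          /* 3.10: a new forked thread starts at 1 */

    if (with_fix)
        preempt_disable();      /* assumed placement of the fix */

    /* finish_task_switch() -> finish_lock_switch() releases rq->lock */
    raw_spin_unlock_model();

    /* post_schedule(rq) would run here, using the rq looked up earlier */
    preemptible_in_post_schedule = (preempt_count == 0);

    if (with_fix)
        preempt_enable();       /* balance the extra disable */

    return preemptible_in_post_schedule;
}
```

Calling `schedule_tail_model(false)` reports that the task is preemptible during post-schedule processing, which is the window in which it could migrate away from the runqueue it looked up. With `schedule_tail_model(true)` the count stays above zero until post-schedule processing is done.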
*** Bug 1590222 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3096

Created attachment 1470606
output of crash tool

Description of problem:
My RT system hung, so I triggered a panic via sysrq and then analyzed the vmcore.
The RT process (PID 4, state RU) has been woken up to hold the boot_tvec_bases lock, but the nr_running of the rq for CPU 0 is wrong: there is one RT process queued, but nr_running is 0. As a result, the pick_next_task() function always chooses the idle process. Many other processes are also waiting to hold the boot_tvec_bases lock.
In addition, the nr_running of the rq for CPU 24 is wrong: there is one RT process, but nr_running is 2.

    crash> runq
    CPU 0 RUNQUEUE: ffff881ffca19080
      CURRENT: PID: 0      TASK: ffffffff81a02480  COMMAND: "swapper/0"
      RT PRIO_ARRAY: ffff881ffca19208
         [ 98] PID: 4      TASK: ffff88022bb030f0  COMMAND: "ktimersoftd/0"
      CFS RB_ROOT: ffff881ffca19120
         [no tasks queued]
    ...
    CPU 24 RUNQUEUE: ffff881ffcd19080
      CURRENT: PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
      RT PRIO_ARRAY: ffff881ffcd19208
         [ 49] PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
      CFS RB_ROOT: ffff881ffcd19120
         [no tasks queued]
    ...

    crash> ps
       PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
    ...
         4      2    0  ffff88022bb030f0  RU   0.0       0      0  [ktimersoftd/0]
    ...

    crash> task 4
    ...
      exec_start = 129364170626886,
    ...
      last_arrival = 129364170620770,
    ...

    crash> struct rq ffff881ffca19080
    ...
      nr_running = 0,
    ...
      idle_stamp = 269573437263498,
    ...
      rt = {
    ...
        rt_nr_running = 1,
    ...
      }
    ...

    crash> struct rq ffff881ffcd19080
    ...
      nr_running = 2,
    ...
      rt = {
    ...
        rt_nr_running = 1,
    ...
      }
    ...

Version-Release number of selected component (if applicable):
kernel-rt-3.10.0-693.11.1.rt56.632.el7

How reproducible:
No obvious regular pattern.

Additional info:
Please refer to the attachments.
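
The accounting mismatch in the crash output above can be expressed as a small consistency check. This is a minimal sketch under a simplifying assumption: that a runqueue's nr_running should equal rt_nr_running plus the number of queued CFS tasks, ignoring other scheduling classes and throttling. `struct rq_snapshot` and the helper names are hypothetical; the numbers are the ones shown for CPU 0 and CPU 24, with "[no tasks queued]" read as zero CFS tasks.

```c
#include <stddef.h>

/* Hypothetical snapshot of the fields read from "struct rq" in crash. */
struct rq_snapshot {
    int cpu;
    int nr_running;     /* rq->nr_running */
    int rt_nr_running;  /* rq->rt.rt_nr_running */
    int cfs_queued;     /* CFS tasks queued, 0 for "[no tasks queued]" */
};

/* Difference between the rq-wide counter and the per-class counts. */
static int rq_delta(const struct rq_snapshot *s)
{
    return s->nr_running - (s->rt_nr_running + s->cfs_queued);
}

/* Sum of the per-runqueue differences over a set of snapshots. */
static int total_delta(const struct rq_snapshot *s, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += rq_delta(&s[i]);
    return sum;
}

/* Values taken from the crash output for the two affected CPUs. */
static const struct rq_snapshot vmcore_rqs[] = {
    { .cpu = 0,  .nr_running = 0, .rt_nr_running = 1, .cfs_queued = 0 },
    { .cpu = 24, .nr_running = 2, .rt_nr_running = 1, .cfs_queued = 0 },
};
```

For the two runqueues shown, `rq_delta()` gives -1 for CPU 0 and +1 for CPU 24, and `total_delta(vmcore_rqs, 2)` is 0. That is consistent with, though it does not prove, one nr_running update having been applied to the wrong runqueue.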