Bug 1608672 - RT system hang due to wrong of rq's nr_running
Summary: RT system hang due to wrong of rq's nr_running
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Scott Wood
QA Contact: Jiri Kastner
Marie Dolezelova
URL:
Whiteboard:
Keywords: ZStream
Depends On:
Blocks: 1175461 1532680 1541534 1617941 1618466
 
Reported: 2018-07-26 06:36 UTC by tianyongjiang
Modified: 2018-12-10 21:25 UTC

Fixed In Version: kernel-rt-3.10.0-931.rt56.881.el7
Doc Type: Bug Fix
Doc Text:
A race condition that prevented tasks from being scheduled properly has been fixed. Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.
Story Points: ---
Clone Of:
: 1617941 1618466
Environment:
Last Closed: 2018-10-30 09:43:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
output of crash tool (882.81 KB, application/x-bzip)
2018-07-26 06:36 UTC, tianyongjiang

Description tianyongjiang 2018-07-26 06:36:53 UTC
Created attachment 1470606
output of crash tool

Description of problem:
  My RT system hung, so I triggered a panic via sysrq and analyzed the vmcore.

  The RT process (PID 4, state RU) has been woken up to take the boot_tvec_bases lock, but nr_running of the CPU 0 runqueue is wrong: one RT process is queued, yet nr_running is 0. As a result, pick_next_task() always chooses the idle process.
  Many other processes are also waiting for the boot_tvec_bases lock.
  In addition, nr_running of the CPU 24 runqueue is wrong: one RT process is queued, yet nr_running is 2.

  crash> runq
  CPU 0 RUNQUEUE: ffff881ffca19080
    CURRENT: PID: 0      TASK: ffffffff81a02480  COMMAND: "swapper/0"
    RT PRIO_ARRAY: ffff881ffca19208
        [ 98] PID: 4      TASK: ffff88022bb030f0  COMMAND: "ktimersoftd/0"
    CFS RB_ROOT: ffff881ffca19120
        [no tasks queued]
  ...
  CPU 24 RUNQUEUE: ffff881ffcd19080
    CURRENT: PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    RT PRIO_ARRAY: ffff881ffcd19208
        [ 49] PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    CFS RB_ROOT: ffff881ffcd19120
        [no tasks queued]
  ...

  crash> ps
    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
    ...
      4      2   0  ffff88022bb030f0  RU   0.0       0      0  [ktimersoftd/0]
    ...

  crash> task 4
    ...
    exec_start = 129364170626886,
    ...
    last_arrival = 129364170620770,
    ...

  crash> struct rq ffff881ffca19080
    ...
    nr_running = 0,
    ...
    idle_stamp = 269573437263498,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...

  crash> struct rq ffff881ffcd19080
    ...
    nr_running = 2,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...
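
The counters in the two struct rq dumps above can be compared mechanically. The following is a minimal sketch in plain userspace C, not kernel code: struct rq_snapshot, rq_count_skew() and reported are hypothetical names introduced here, and the CFS count of 0 is taken from the "[no tasks queued]" lines in the runq output. It assumes nr_running should equal the number of queued RT plus CFS tasks, which ignores any other scheduling class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the counters read from one `struct rq` dump. */
struct rq_snapshot {
    int cpu;
    unsigned long nr_running;    /* rq.nr_running */
    unsigned long rt_nr_running; /* rq.rt.rt_nr_running */
    unsigned long cfs_queued;    /* tasks listed under CFS RB_ROOT by runq */
};

/* Signed difference: what nr_running claims minus what is actually queued. */
long rq_count_skew(const struct rq_snapshot *s)
{
    assert(s != NULL);
    return (long)s->nr_running - (long)(s->rt_nr_running + s->cfs_queued);
}

/* Values copied from the crash output in this report. */
const struct rq_snapshot reported[] = {
    { .cpu = 0,  .nr_running = 0, .rt_nr_running = 1, .cfs_queued = 0 },
    { .cpu = 24, .nr_running = 2, .rt_nr_running = 1, .cfs_queued = 0 },
};
```

For the values in this report, the skew is -1 on CPU 0 and +1 on CPU 24, so the two errors cancel out across the pair of runqueues.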


Version-Release number of selected component (if applicable):
  kernel-rt-3.10.0-693.11.1.rt56.632.el7

How reproducible:
  No obvious pattern.

Additional info:
  Please refer to attachments.

Comment 6 tianyongjiang 2018-08-09 08:44:29 UTC
Hi swood@redhat.com:
  The preempt_enable() in schedule_tail() has no effect here, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
  So preemption is not enabled there, and I am confused.

  How did you fix it? Can you describe the fix in more detail, or paste the code?

asmlinkage void schedule_tail(struct task_struct *prev)
        __releases(rq->lock)
{
        struct rq *rq = this_rq();

        finish_task_switch(rq, prev);

        /*
         * FIXME: do we need to worry about rq being invalidated by the
         * task_switch?
         */
        post_schedule(rq);

#ifdef __ARCH_WANT_UNLOCKED_CTXSW
        /* In this case, finish_task_switch does not reenable preemption */
        preempt_enable();
#endif
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
}


  In addition, preemption is also enabled in the latest kernel code:
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Comment 7 tianyongjiang 2018-08-09 10:49:59 UTC
Hi swood@redhat.com:
  If you need the vmcore, I can provide it.

Comment 10 Scott Wood 2018-08-09 17:00:54 UTC
(In reply to tianyongjiang from comment #6)
> Hi swood@redhat.com:
>   The preempt_enable() in schedule_tail() has no effect here, because
> only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the
> kernel code), and my kernel-rt is x86_64.
>   So preemption is not enabled there, and I am confused.
> 
>   How did you fix it? Can you describe the fix in more detail, or paste the code?

That is not where preemption is getting enabled.  finish_task_switch() calls finish_lock_switch() which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().

The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").

>   In addition, the preemption is also enabled in latest kernel code: 
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.
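
The preempt-count reasoning above can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not the kernel implementation or the actual patch: every toy_* name is hypothetical, the only behavior modelled is that releasing rq->lock decrements the count, and the fix is assumed to wrap the post-switch work in a preempt_disable()/preempt_enable() pair, as the commit title suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a task's preempt counter; not a kernel structure. */
struct toy_task {
    int preempt_count;
};

void toy_preempt_disable(struct toy_task *t)
{
    t->preempt_count++;
}

void toy_preempt_enable(struct toy_task *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
}

/* A task can be preempted (and so migrated) only when the count is zero. */
bool toy_preemptible(const struct toy_task *t)
{
    return t->preempt_count == 0;
}

/* Models releasing rq->lock: the unlock itself drops the count by one. */
void toy_raw_spin_unlock(struct toy_task *t)
{
    toy_preempt_enable(t);
}

/*
 * Walks the schedule_tail() sequence for a task that starts with
 * `initial_count` and returns true if the task is preemptible while the
 * post-schedule step would be running.
 */
bool toy_schedule_tail(int initial_count, bool with_fix)
{
    struct toy_task t = { .preempt_count = initial_count };
    bool window;

    if (with_fix)
        toy_preempt_disable(&t);  /* assumed shape of the fix */
    toy_raw_spin_unlock(&t);      /* finish_task_switch() -> finish_lock_switch() */
    window = toy_preemptible(&t); /* state during post_schedule() */
    if (with_fix)
        toy_preempt_enable(&t);
    return window;
}
```

In this model, toy_schedule_tail(1, false) reports an open preemption window (the 3.10 newly forked thread case), while toy_schedule_tail(1, true) and toy_schedule_tail(2, false) do not.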

Comment 15 Beth Uptagrafft 2018-09-17 16:04:31 UTC
*** Bug 1590222 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2018-10-30 09:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3096

