Bug 1608672

Summary:

RT system hangs due to incorrect nr_running in rq

Product:

Red Hat Enterprise Linux 7

Reporter:

tianyongjiang <tian.yongjiang>

Component:

kernel-rt

Assignee:

Crystal Wood <crwood>

kernel-rt sub component:

Memory Management

QA Contact:

Jiri Kastner <jkastner>

Status:

CLOSED ERRATA

Docs Contact:

Marie Hornickova <mdolezel>

Severity:

urgent

Priority:

high

CC:

bhu, crwood, dhoward, jiang.biao2, jkastner, lgoncalv, mkolaja, pezhang, stalexan

Version:

7.4

Keywords:

ZStream

Hardware:

x86_64

OS:

Linux

Fixed In Version:

kernel-rt-3.10.0-931.rt56.881.el7

Doc Type:

Bug Fix

Doc Text:

A race condition that prevented tasks from being scheduled properly has been fixed. Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.

Clones:

1617941, 1618466

Last Closed:

2018-10-30 09:43:34 UTC

Type:

Bug

Bug Blocks:

1175461, 1532680, 1541534, 1617941, 1618466

Attachments:

Description	Flags
output of crash tool	none

Description tianyongjiang 2018-07-26 06:36:53 UTC

Created attachment 1470606
output of crash tool

Description of problem:
  My RT system hung, so I triggered a panic via sysrq and analyzed the vmcore.

  The RT process (PID 4, state RU) has been woken up to take the boot_tvec_bases lock, but nr_running of the rq for CPU 0 is wrong: one RT process is queued, yet nr_running is 0. As a result, pick_next_task() always chooses the idle process.
  Many other processes are also waiting for the boot_tvec_bases lock.
  In addition, nr_running of the rq for CPU 24 is wrong: one RT process is queued, yet nr_running is 2.

  crash> runq
  CPU 0 RUNQUEUE: ffff881ffca19080
    CURRENT: PID: 0      TASK: ffffffff81a02480  COMMAND: "swapper/0"
    RT PRIO_ARRAY: ffff881ffca19208
        [ 98] PID: 4      TASK: ffff88022bb030f0  COMMAND: "ktimersoftd/0"
    CFS RB_ROOT: ffff881ffca19120
        [no tasks queued]
  ...
  CPU 24 RUNQUEUE: ffff881ffcd19080
    CURRENT: PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    RT PRIO_ARRAY: ffff881ffcd19208
        [ 49] PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    CFS RB_ROOT: ffff881ffcd19120
        [no tasks queued]
  ...

  crash> ps
    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
    ...
      4      2   0  ffff88022bb030f0  RU   0.0       0      0  [ktimersoftd/0]
    ...

  crash> task 4
    ...
    exec_start = 129364170626886,
    ...
    last_arrival = 129364170620770,
    ...

  crash> struct rq ffff881ffca19080
    ...
    nr_running = 0,
    ...
    idle_stamp = 269573437263498,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...

  crash> struct rq ffff881ffcd19080
    ...
    nr_running = 2,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...
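
The mismatch in the counters above can be checked mechanically. The following is a minimal user-space sketch, not kernel code: `struct rq_snapshot`, `rq_counter_skew()`, `rq_counters_consistent()` and `rq_report()` are hypothetical names, and the check assumes that nr_running should equal the number of queued RT tasks plus the number of queued CFS tasks (it ignores other scheduling classes and throttled groups).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of counters copied by hand from "struct rq" in crash. */
struct rq_snapshot {
    int cpu;
    unsigned int nr_running;     /* rq->nr_running */
    unsigned int rt_nr_running;  /* rq->rt.rt_nr_running */
    unsigned int cfs_nr_running; /* tasks listed under CFS RB_ROOT in runq */
};

/* Signed difference between the reported total and the per-class sum. */
static int rq_counter_skew(const struct rq_snapshot *s)
{
    return (int)s->nr_running - (int)(s->rt_nr_running + s->cfs_nr_running);
}

/* Assumption: a healthy runqueue has a skew of zero. */
static bool rq_counters_consistent(const struct rq_snapshot *s)
{
    return rq_counter_skew(s) == 0;
}

/* Prints one line per runqueue so the skew can be compared across CPUs. */
static void rq_report(const struct rq_snapshot *s)
{
    printf("CPU %d: nr_running=%u per-class sum=%u skew=%+d%s\n",
           s->cpu, s->nr_running,
           s->rt_nr_running + s->cfs_nr_running,
           rq_counter_skew(s),
           rq_counters_consistent(s) ? "" : " (inconsistent)");
}
```

Calling `rq_report()` from a small `main()` with the values above (CPU 0: nr_running 0, one RT task, no CFS tasks; CPU 24: nr_running 2, one RT task, no CFS tasks) reports a skew of -1 on CPU 0 and +1 on CPU 24.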


Version-Release number of selected component (if applicable):
  kernel-rt-3.10.0-693.11.1.rt56.632.el7

How reproducible:
  No obvious regular pattern. 

Additional info:
  Please refer to attachments.

Comment 6 tianyongjiang 2018-08-09 08:44:29 UTC

Hi swood,
  The preempt_enable() in schedule_tail() has no effect here, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
  So preemption is not enabled there, and I am confused.

  How did you fix it? Can you describe the fix in more detail, or paste the code?

asmlinkage void schedule_tail(struct task_struct *prev)
        __releases(rq->lock)
{
        struct rq *rq = this_rq();

        finish_task_switch(rq, prev);

        /*
         * FIXME: do we need to worry about rq being invalidated by the
         * task_switch?
         */
        post_schedule(rq);

#ifdef __ARCH_WANT_UNLOCKED_CTXSW
        /* In this case, finish_task_switch does not reenable preemption */
        preempt_enable();
#endif
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
}


  In addition, preemption is also enabled in the latest kernel code:
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Comment 7 tianyongjiang 2018-08-09 10:49:59 UTC

Hi swood,
 If you need the vmcore, I can provide it.

Comment 10 Crystal Wood 2018-08-09 17:00:54 UTC

(In reply to tianyongjiang from comment #6)
> Hi swood,
>   The preempt_enable() in schedule_tail() has no effect here, because only
> ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code),
> and my kernel-rt is x86_64.
>   So preemption is not enabled there, and I am confused.
> 
>   How did you fix it? Can you describe the fix in more detail, or paste the code?

That is not where preemption is getting enabled.  finish_task_switch() calls finish_lock_switch() which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().

The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").

>   In addition, preemption is also enabled in the latest kernel code:
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.
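
The preempt-count reasoning above can be walked through with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch: the counter, `rq_unlock()` and `preemptible_in_post_schedule()` are hypothetical stand-ins, and placing the extra preempt_disable() at the start of the sequence is an assumption about the fix rather than a quote from the commit.

```c
#include <stdbool.h>

/* User-space stand-in for the task's preempt count; not kernel code. */
static int preempt_count;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }
static bool preemptible(void)     { return preempt_count == 0; }

/* Models releasing rq->lock: a raw spinlock unlock ends in preempt_enable(). */
static void rq_unlock(void) { preempt_enable(); }

/*
 * Walks the schedule_tail() sequence for a task that enters with
 * initial_count and returns whether it is preemptible at the point
 * where post_schedule(rq) would run.
 */
static bool preemptible_in_post_schedule(int initial_count, bool disable_first)
{
    bool result;

    preempt_count = initial_count;
    if (disable_first)
        preempt_disable();   /* assumed fix: keep the task on this CPU */

    rq_unlock();             /* finish_task_switch() -> finish_lock_switch() */
    result = preemptible();  /* state while post_schedule(rq) would run */

    if (disable_first)
        preempt_enable();    /* preemption comes back only after post_schedule() */
    return result;
}
```

In this model, `preemptible_in_post_schedule(1, false)` returns true, which is the 3.10 case of a newly forked thread that can be preempted and migrated while it still uses the old rq pointer. `preemptible_in_post_schedule(1, true)` and `preemptible_in_post_schedule(2, false)` both return false, matching the fixed 3.10 behavior and the upstream preempt count of 2 respectively.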

Comment 15 Beth Uptagrafft 2018-09-17 16:04:31 UTC

*** Bug 1590222 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2018-10-30 09:43:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3096