A race condition that prevented tasks from being scheduled properly has been fixed
Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between CPU and runqueue during load balancing sometimes occurred. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration during post-schedule processing no longer occurs, which prevents the above mismatch. The operating system no longer hangs due to this bug.
Created attachment 1470606 [details]
output of crash tool
Description of problem:
I found my RT system hung, so I triggered a panic via sysrq and then analyzed the vmcore.
The RT process (PID 4, state RU) had been woken up to take the boot_tvec_bases lock, but nr_running of the rq on CPU 0 is wrong (there is one runnable RT process, yet it reads 0), so the pick_next_task() function always chooses the idle process.
There are also many processes waiting to take the boot_tvec_bases lock.
Additionally, nr_running of the rq on CPU 24 is wrong (there is one RT process, but it reads 2).
crash> runq
CPU 0 RUNQUEUE: ffff881ffca19080
CURRENT: PID: 0 TASK: ffffffff81a02480 COMMAND: "swapper/0"
RT PRIO_ARRAY: ffff881ffca19208
[ 98] PID: 4 TASK: ffff88022bb030f0 COMMAND: "ktimersoftd/0"
CFS RB_ROOT: ffff881ffca19120
[no tasks queued]
...
CPU 24 RUNQUEUE: ffff881ffcd19080
CURRENT: PID: 450 TASK: ffff881ffe02d190 COMMAND: "irq/46-xhci_hcd"
RT PRIO_ARRAY: ffff881ffcd19208
[ 49] PID: 450 TASK: ffff881ffe02d190 COMMAND: "irq/46-xhci_hcd"
CFS RB_ROOT: ffff881ffcd19120
[no tasks queued]
...
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
...
4 2 0 ffff88022bb030f0 RU 0.0 0 0 [ktimersoftd/0]
...
crash> task 4
...
exec_start = 129364170626886,
...
last_arrival = 129364170620770,
...
crash> struct rq ffff881ffca19080
...
nr_running = 0,
...
idle_stamp = 269573437263498,
...
rt = {
...
rt_nr_running = 1,
...
}
...
crash> struct rq ffff881ffcd19080
...
nr_running = 2,
...
rt = {
...
rt_nr_running = 1,
...
}
...
Version-Release number of selected component (if applicable):
kernel-rt-3.10.0-693.11.1.rt56.632.el7
How reproducible:
No obvious regular pattern.
Additional info:
Please refer to attachments.
Hi swood:
The preempt_enable() in the schedule_tail() function is not effective, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
So preemption is not enabled; I am confused.
How did you fix it? Can you describe it in more detail, or paste the code?
asmlinkage void schedule_tail(struct task_struct *prev)
__releases(rq->lock)
{
struct rq *rq = this_rq();
finish_task_switch(rq, prev);
/*
* FIXME: do we need to worry about rq being invalidated by the
* task_switch?
*/
post_schedule(rq);
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
/* In this case, finish_task_switch does not reenable preemption */
preempt_enable();
#endif
if (current->set_child_tid)
put_user(task_pid_vnr(current), current->set_child_tid);
}
In addition, preemption is also enabled in the latest kernel code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8
(In reply to tianyongjiang from comment #6)
> Hi swood:
> The preempt_enable() in the schedule_tail() function is not effective,
> because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched
> the kernel code), and my kernel-rt is x86_64.
> So preemption is not enabled; I am confused.
>
> How did you fix it? Can you describe it in more detail, or paste the code?
That is not where preemption is getting enabled. finish_task_switch() calls finish_lock_switch() which releases rq->lock, and releasing a raw spinlock contains a preempt_enable().
The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").
> In addition, the preemption is also enabled in latest kernel code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/
> kernel/sched/core.c?h=v4.18-rc8
Yes, but not until after balance_callback() -- and upstream has made the preempt count always be 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.
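For reference, the upstream fix named above restructures schedule_tail() so that preemption stays disabled across post_schedule(). The following is a sketch from the upstream commit, not the exact RHEL-7 backport, so the backported code may differ in detail:

```c
/*
 * Sketch of schedule_tail() after upstream commit 1a43a14a5bd9c32
 * ("sched: Fix schedule_tail() to disable preemption"). Preemption is
 * held disabled across post_schedule(), so a newly forked task cannot
 * be migrated to another CPU between finish_task_switch() and the
 * post-schedule balancing, avoiding the CPU/runqueue mismatch.
 */
asmlinkage __visible void schedule_tail(struct task_struct *prev)
	__releases(rq->lock)
{
	struct rq *rq;

	/*
	 * finish_task_switch() drops rq->lock, and releasing a raw
	 * spinlock contains a preempt_enable(); taking an extra
	 * preempt_disable() here keeps us pinned to this CPU until
	 * post_schedule() has run.
	 */
	preempt_disable();
	rq = this_rq();
	finish_task_switch(rq, prev);
	post_schedule(rq);
	preempt_enable();

	if (current->set_child_tid)
		put_user(task_pid_vnr(current), current->set_child_tid);
}
```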
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3096