Bug 1608672 - RT system hang due to wrong rq->nr_running
Summary: RT system hang due to wrong rq->nr_running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Crystal Wood
QA Contact: Jiri Kastner
Docs Contact: Marie Hornickova
URL:
Whiteboard:
Depends On:
Blocks: 1175461 1532680 1541534 1617941 1618466
Reported: 2018-07-26 06:36 UTC by tianyongjiang
Modified: 2022-03-13 15:18 UTC
CC List: 9 users

Fixed In Version: kernel-rt-3.10.0-931.rt56.881.el7
Doc Type: Bug Fix
Doc Text:
A race condition that prevented tasks from being scheduled properly has been fixed. Previously, preemption was enabled too early after a context switch. If a task was migrated to another CPU after a context switch, a mismatch between the CPU and the runqueue sometimes occurred during load balancing. Consequently, a runnable task on an idle CPU failed to run, and the operating system became unresponsive. This update disables preemption in the schedule_tail() function. As a result, CPU migration no longer occurs during post-schedule processing, which prevents the mismatch. The operating system no longer hangs due to this bug.
Clone Of:
Clones: 1617941 1618466
Environment:
Last Closed: 2018-10-30 09:43:34 UTC
Target Upstream Version:
Embargoed:


Attachments
output of crash tool (882.81 KB, application/x-bzip)
2018-07-26 06:36 UTC, tianyongjiang

Description tianyongjiang 2018-07-26 06:36:53 UTC
Created attachment 1470606
output of crash tool

Description of problem:
  My RT system hung, so I triggered a panic via SysRq and then analyzed the vmcore.

  The RT process (PID 4, state RU) has been woken up to acquire the boot_tvec_bases lock, but the nr_running of the rq for CPU 0 is wrong (one RT process is queued, but nr_running is 0), so pick_next_task() always chooses the idle process.
  Many other processes are also waiting to acquire the boot_tvec_bases lock.
  Additionally, the nr_running of the rq for CPU 24 is wrong (one RT process is queued, but nr_running is 2).

  crash> runq
  CPU 0 RUNQUEUE: ffff881ffca19080
    CURRENT: PID: 0      TASK: ffffffff81a02480  COMMAND: "swapper/0"
    RT PRIO_ARRAY: ffff881ffca19208
        [ 98] PID: 4      TASK: ffff88022bb030f0  COMMAND: "ktimersoftd/0"
    CFS RB_ROOT: ffff881ffca19120
        [no tasks queued]
  ...
  CPU 24 RUNQUEUE: ffff881ffcd19080
    CURRENT: PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    RT PRIO_ARRAY: ffff881ffcd19208
        [ 49] PID: 450    TASK: ffff881ffe02d190  COMMAND: "irq/46-xhci_hcd"
    CFS RB_ROOT: ffff881ffcd19120
        [no tasks queued]
  ...

  crash> ps
    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
    ...
      4      2   0  ffff88022bb030f0  RU   0.0       0      0  [ktimersoftd/0]
    ...

  crash> task 4
    ...
    exec_start = 129364170626886,
    ...
    last_arrival = 129364170620770,
    ...

  crash> struct rq ffff881ffca19080
    ...
    nr_running = 0,
    ...
    idle_stamp = 269573437263498,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...

  crash> struct rq ffff881ffcd19080
    ...
    nr_running = 2,
    ...
    rt = {
      ...
      rt_nr_running = 1,
      ...
    }
    ...


Version-Release number of selected component (if applicable):
  kernel-rt-3.10.0-693.11.1.rt56.632.el7

How reproducible:
  No obvious pattern.

Additional info:
  Please refer to the attachment.

Comment 6 tianyongjiang 2018-08-09 08:44:29 UTC
Hi swood,
  The preempt_enable() in the schedule_tail() function has no effect, because only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel code), and my kernel-rt is x86_64.
  So preemption is not enabled there; I am confused.

  How did you fix it? Can you describe it in more detail, or paste the code?

asmlinkage void schedule_tail(struct task_struct *prev)
        __releases(rq->lock)
{
        struct rq *rq = this_rq();

        finish_task_switch(rq, prev);

        /*
         * FIXME: do we need to worry about rq being invalidated by the
         * task_switch?
         */
        post_schedule(rq);

#ifdef __ARCH_WANT_UNLOCKED_CTXSW
        /* In this case, finish_task_switch does not reenable preemption */
        preempt_enable();
#endif
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
}


  In addition, preemption is also enabled there in the latest kernel code:
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Comment 7 tianyongjiang 2018-08-09 10:49:59 UTC
Hi swood,
 If you need the vmcore, I can provide it.

Comment 10 Crystal Wood 2018-08-09 17:00:54 UTC
(In reply to tianyongjiang from comment #6)
> Hi swood,
>   The preempt_enable() in the schedule_tail() function has no effect, because
> only ia64 and mips define __ARCH_WANT_UNLOCKED_CTXSW (I searched the kernel
> code), and my kernel-rt is x86_64.
>   So preemption is not enabled there; I am confused.
> 
>   How did you fix it? Can you describe it in more detail, or paste the code?

That is not where preemption gets enabled. finish_task_switch() calls finish_lock_switch(), which releases rq->lock, and releasing a raw spinlock includes a preempt_enable().

The fix is upstream commit 1a43a14a5bd9c32 ("sched: Fix schedule_tail() to disable preemption").

>   In addition, preemption is also enabled there in the latest kernel code:
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v4.18-rc8

Yes, but not until after balance_callback(). Also, upstream has made the preempt count always 2 on context switch, whereas in 3.10 a newly forked thread has a preempt count of 1.

Comment 15 Beth Uptagrafft 2018-09-17 16:04:31 UTC
*** Bug 1590222 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2018-10-30 09:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3096


Note You need to log in before you can comment on or make changes to this bug.