Bug 855840 - kernel may soft-lockup while stopping some of the CPUs
Summary: kernel may soft-lockup while stopping some of the CPUs
Keywords:
Status: CLOSED DUPLICATE of bug 843541
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-09-10 11:23 UTC by Roman Kagan
Modified: 2012-09-10 12:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-10 12:41:57 UTC
Target Upstream Version:
Embargoed:


Attachments
serial console log (2.63 MB, application/octet-stream)
2012-09-10 11:23 UTC, Roman Kagan

Description Roman Kagan 2012-09-10 11:23:24 UTC
Created attachment 611402
serial console log

Description of problem:

During system shutdown, the kernel gets stuck after printing

ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...

and then, roughly once a minute and without end, reports a soft lockup in one of the migration (aka cpu_stopper) threads inside stop_machine_cpu_stop.
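
To confirm which CPU and thread the repeated reports point at, the attached serial console log can be scanned for soft-lockup lines. The sketch below is only an illustration: the message pattern is an assumption based on the usual wording of the 2.6.32-era watchdog, and count_soft_lockups is a hypothetical helper, not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# Assumed message shape (not copied from the attached log), for example:
#   BUG: soft lockup - CPU#1 stuck for 67s! [migration/1:9]
# Adjust the pattern if the console log uses different wording.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)


def count_soft_lockups(lines):
    """Return a Counter mapping (cpu, task name) to the number of reports."""
    hits = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            hits[(int(match.group("cpu")), match.group("comm"))] += 1
    return hits
```

Passing an open file object works as well as a list, since the function only iterates over lines. If the hang matches this report, nearly all hits should name a single migration/N thread.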


Version-Release number of selected component (if applicable):
Detected on 2.6.32-220.23.1.el6.x86_64; appears to affect the whole RHEL 6 series.


How reproducible:
under one percent


Steps to Reproduce:
1. reboot or shut down the system
  
Actual results:
system is stuck

Expected results:
system proceeds to reboot/halt


Additional info:
The problem was detected while rebooting several RHEL 6.2 virtual machines in a loop under a test version of Parallels Cloud Server.
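
Because the hang shows up in under one percent of attempts, a reboot loop has to run for a while before it is likely to hit it. The following is a rough estimate only, assuming each reboot is an independent trial with the same hang probability; reboots_needed is a hypothetical helper and the 1% figure is taken as an upper bound from the "How reproducible" field.

```python
import math


def reboots_needed(p_hang, confidence=0.95):
    """Smallest n such that 1 - (1 - p_hang)**n >= confidence.

    Assumes independent reboots with a constant per-reboot hang probability.
    """
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


# With a hang rate of exactly 1%, about 299 reboots give a 95% chance
# of seeing at least one hang; a lower real rate needs more reboots.
n = reboots_needed(0.01)
```

Since the real rate is below one percent, this number is a lower bound on the loop length needed, which is consistent with testing across several virtual machines in parallel.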

The issue was tracked down to the following situation: on one of the CPUs, the realtime (rt) runqueue exhausted its runtime quota and was throttled while no tasks remained in the regular runqueue.  The cpu_stopper thread was queued on that throttled rt runqueue, so it never got scheduled on the CPU, and no regular task was available to run and unthrottle the rt runqueue.
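
The stall described above can be pictured with a toy model of per-CPU task selection. This is a simplified sketch and not kernel code: RunQueue and pick_next_task are hypothetical Python stand-ins, unthrottling is not modelled at all, and the separate "stop" list only illustrates the idea of a class ranked above the rt class.

```python
from dataclasses import dataclass, field


@dataclass
class RunQueue:
    stop: list = field(default_factory=list)  # dedicated stop class, above rt
    rt: list = field(default_factory=list)    # SCHED_FIFO / SCHED_RR tasks
    fair: list = field(default_factory=list)  # regular tasks
    rt_throttled: bool = False                # rt quota used up for this period


def pick_next_task(rq):
    """Walk the scheduling classes from highest to lowest priority."""
    if rq.stop:
        return rq.stop[0]
    if rq.rt and not rq.rt_throttled:
        return rq.rt[0]
    if rq.fair:
        return rq.fair[0]
    return "idle"


# Stopper thread queued as an ordinary rt task on a throttled runqueue:
# nothing is runnable, so the CPU idles and the stopper never runs.
before = RunQueue(rt=["migration/1"], rt_throttled=True)

# Stopper thread in its own class above rt: rt throttling no longer applies.
after = RunQueue(stop=["migration/1"], rt_throttled=True)
```

In this model pick_next_task(before) yields "idle" while pick_next_task(after) yields "migration/1", which is the behavioural difference a separate stop class is meant to provide.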

The issue was addressed by the mainline Linux commit

commit 34f971f6f7988be4d014eec3e3526bee6d007ffa
Author: Peter Zijlstra <a.p.zijlstra>
Date:   Wed Sep 22 13:53:15 2010 +0200

    sched: Create special class for stop/migrate work
    
    In order to separate the stop/migrate work thread from the SCHED_FIFO
    implementation, create a special class for it that is of higher priority than
    SCHED_FIFO itself.
    
    This currently solves a problem where cpu-hotplug consumes so much cpu-time
    that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
    timer pending on the now dead cpu.
    
    It is also required for when we add the planned deadline scheduling class above
    SCHED_FIFO, as the stop/migrate thread still needs to transcent those tasks.
    
    Tested-by: Heiko Carstens <heiko.carstens.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra>
    LKML-Reference: <1285165776.2275.1022.camel@laptop>
    Signed-off-by: Ingo Molnar <mingo>

which appeared in v2.6.37-rc1.

Comment 2 Igor Mammedov 2012-09-10 12:41:57 UTC
Fix is targeted for RHEL6.4

*** This bug has been marked as a duplicate of bug 843541 ***

