Bug 855840
| Field | Value |
|---|---|
| Summary | kernel may soft-lockup while stopping some of the CPUs |
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Roman Kagan <rvkagan> |
| Component | kernel |
| Assignee | Red Hat Kernel Manager <kernel-mgr> |
| Status | CLOSED DUPLICATE |
| QA Contact | Red Hat Kernel QE team <kernel-qe> |
| Severity | medium |
| Priority | unspecified |
| Version | 6.2 |
| CC | imammedo |
| Target Milestone | rc |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2012-09-10 12:41:57 UTC |
| Type | Bug |
Fix is targeted for RHEL 6.4.

*** This bug has been marked as a duplicate of bug 843541 ***
Created attachment 611402 [details]
serial console log

Description of problem:
During system shutdown, the kernel gets stuck after printing

    ACPI: Preparing to enter system sleep state S5
    Disabling non-boot CPUs ...

and then reports a soft lockup in one of the migration (aka cpu_stopper) threads, in stop_machine_cpu_stop, roughly once a minute, indefinitely.

Version-Release number of selected component (if applicable):
Detected on 2.6.32-220.23.1.el6.x86_64; appears to affect the entire RHEL 6 series.

How reproducible:
In under one percent of attempts.

Steps to Reproduce:
1. Reboot or shut down the system.

Actual results:
The system hangs.

Expected results:
The system proceeds to reboot/halt.

Additional info:
The problem was detected while repeatedly rebooting several RHEL 6.2 virtual machines in a loop on a test version of Parallels Cloud Server.

The issue was traced to a situation in which the realtime (rt) runqueue on one of the CPUs had used up its quota, and was therefore throttled, while no tasks remained on the regular runqueue. As a result, the cpu_stopper thread, which is queued on the rt runqueue, was never scheduled on that CPU, and no regular task was available to run and unthrottle the rt runqueue.

The issue was addressed by the following mainline Linux commit:

    commit 34f971f6f7988be4d014eec3e3526bee6d007ffa
    Author: Peter Zijlstra <a.p.zijlstra>
    Date:   Wed Sep 22 13:53:15 2010 +0200

        sched: Create special class for stop/migrate work

        In order to separate the stop/migrate work thread from the SCHED_FIFO
        implementation, create a special class for it that is of higher priority
        than SCHED_FIFO itself.

        This currently solves a problem where cpu-hotplug consumes so much cpu-time
        that the SCHED_FIFO class gets throttled, but has the bandwidth replenishment
        timer pending on the now dead cpu.

        It is also required for when we add the planned deadline scheduling class
        above SCHED_FIFO, as the stop/migrate thread still needs to transcent those
        tasks.

        Tested-by: Heiko Carstens <heiko.carstens.com>
        Signed-off-by: Peter Zijlstra <a.p.zijlstra>
        LKML-Reference: <1285165776.2275.1022.camel@laptop>
        Signed-off-by: Ingo Molnar <mingo>

The commit first appeared in v2.6.37-rc1.