| Summary: | NMI Watchdog detected LOCKUP on CPU 14 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Liang Zheng <lzheng> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED WONTFIX | QA Contact: | Zhang Kexin <kzhang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 5.7 | CC: | aquini, ccui, kzhang, lzheng |
| Target Milestone: | rc | Flags: | pm-rhel: needinfo? (lzheng) |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-06-02 13:21:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Liang Zheng
2011-07-04 04:25:28 UTC
There is a similar bug in RHEL4: https://bugzilla.redhat.com/show_bug.cgi?id=460935

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Sorry about the delay here. Can anyone reproduce this and get a crash dump so I can see what all the CPUs are doing and which one holds the spinlock that is being taken with spin_lock_irq()?

Thanks, Larry

Sorry for this late update as well, but I believe this issue is due to a known cache_alloc_refill() infinite-loop condition, which usually results from slab corruption that struck elsewhere during code execution. There is nothing wrong with cache_alloc_refill() itself; unfortunately, the real offender is hidden among all the other slab users.

As a matter of fact, upstream included the following change to catch that exceptional condition and break the loop (by crashing the box) when it strikes:
----
commit 714b8171af9c930a59a0da8f6fe50518e70ab035
Author: Pekka Enberg <penberg.fi>
Date: Sun May 6 14:49:03 2007 -0700

    slab: ensure cache_alloc_refill terminates

    If slab->inuse is corrupted, cache_alloc_refill can enter an infinite
    loop as detailed by Michael Richardson in the following post:
    <http://lkml.org/lkml/2007/2/16/292>. This adds a BUG_ON to catch
    those cases.

    Cc: Michael Richardson <mcr>
    Acked-by: Christoph Lameter <clameter>
    Signed-off-by: Pekka Enberg <penberg.fi>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

diff --git a/mm/slab.c b/mm/slab.c
index 8b71a9c..21b2aef 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2990,6 +2990,14 @@ retry:
 		slabp = list_entry(entry, struct slab, list);
 		check_slabp(cachep, slabp);
 		check_spinlock_acquired(cachep);
+
+		/*
+		 * The slab was either on partial or free list so
+		 * there must be at least one object available for
+		 * allocation.
+		 */
+		BUG_ON(slabp->inuse < 0 || slabp->inuse >= cachep->num);
+
 		while (slabp->inuse < cachep->num && batchcount--) {
 			STATS_INC_ALLOCED(cachep);
 			STATS_INC_ACTIVE(cachep);
----
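To illustrate why a corrupted inuse count can keep the refill loop spinning, here is a much-reduced userspace sketch. It is not the kernel code: struct toy_slab, toy_refill(), TOY_NUM, TOY_SPIN_LIMIT and the free_left field are hypothetical names invented for this illustration, and the spin limit exists only so the unchecked case terminates in a demo. The sketch assumes a slab stays on the partial list for as long as its free chain is non-empty, so a slab whose inuse is already at or above the per-slab object count is picked again and again without ever yielding an object.

```c
/* Hypothetical stand-in for the kernel's struct slab; illustration only. */
struct toy_slab {
    int inuse;      /* objects already handed out from this slab */
    int free_left;  /* entries still on the slab's free chain */
};

enum {
    TOY_NUM = 8,           /* objects per slab (plays the role of cachep->num) */
    TOY_SPIN_LIMIT = 1000  /* demo-only guard; the real loop has no such limit */
};

/*
 * Toy model of the refill loop over a single slab.
 * Returns the number of objects obtained,
 *   -1 if the sanity check (userspace analogue of the BUG_ON) fired,
 *   -2 if the loop made no progress and hit the demo spin limit.
 */
static int toy_refill(struct toy_slab *slabp, int batchcount, int checked)
{
    int got = 0;
    int spins = 0;

    while (batchcount > 0) {
        /* An empty free chain means the slab is full: nothing to take. */
        if (slabp->free_left == 0)
            break;

        /* Same condition as the BUG_ON in the patch above. */
        if (checked && (slabp->inuse < 0 || slabp->inuse >= TOY_NUM))
            return -1;

        /* Same shape as the kernel's inner loop. */
        while (slabp->inuse < TOY_NUM && batchcount--) {
            slabp->inuse++;
            slabp->free_left--;
            got++;
        }

        /*
         * The slab goes back on the partial list because free_left != 0.
         * With inuse >= TOY_NUM the inner loop never ran, batchcount is
         * unchanged, and the outer loop picks the same slab again.
         */
        if (++spins >= TOY_SPIN_LIMIT)
            return -2;
    }
    return got;
}
```

In this model, a consistent slab (inuse = 2, free_left = 6) asked for 4 objects returns 4. A corrupted one (inuse = 9, free_left = 3) returns -2 without the check and -1 with it. The early -1 return corresponds to the point where the patched kernel would crash instead of spinning until the NMI watchdog fires.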
If this sort of condition (hang/crash) is observed often on the system, we might want to grab a vmcore while running the -debug kernel, since it has VM/SLAB instrumentation that can help us identify who is causing the slab corruption that leads to this undesirable cache_alloc_refill() infinite loop.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the last planned RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX. To request that Red Hat re-consider this request, please re-open the bugzilla via appropriate support channels and provide additional business and/or technical details about its importance to you.

Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support). |