Bug 202205
Summary: | system freeze under limpack stress
---|---
Product: | Red Hat Enterprise Linux 4
Component: | kernel
Version: | 4.2
Hardware: | x86_64
OS: | Linux
Status: | CLOSED WONTFIX
Severity: | high
Priority: | medium
Reporter: | Benedikt Schaefer <bschaefer>
Assignee: | Larry Woodman <lwoodman>
QA Contact: | Brian Brock <bbrock>
CC: | efocht, jbaron
Doc Type: | Bug Fix
Last Closed: | 2012-06-20 13:17:01 UTC
Description
Benedikt Schaefer
2006-08-11 15:31:51 UTC
Created attachment 134028 [details]
Call trace from crashed machine
Created attachment 134135 [details]
Another error log from another machine with the same symptoms
Here is another error log from a different machine (same type) which also crashed.
Looks like the patch submitted in the LKML thread "[PATCH] dm: Fix deadlock under high i/o load in raid1 setup." is addressing exactly this issue.
http://opensubscriber.com/message/linux-kernel%40vger.kernel.org/4640513.html
Any chance this goes into the RHEL4 kernel?
regards,
Erich

Did you get a crash dump from this machine? The reason I ask is that it looks like the machine took an NMI watchdog timeout panic because this CPU was stuck in a spinlock with interrupts disabled. Evidently someone else holds the zone->lock, so this CPU starved without taking timer interrupts long enough to incur the NMI watchdog crash.

    static struct page *
    buffered_rmqueue(struct zone *zone, int order, int gfp_flags)
    {
        ...
        if (page == NULL) {
            spin_lock_irqsave(&zone->lock, flags);
            page = __rmqueue(zone, order);
            spin_unlock_irqrestore(&zone->lock, flags);
        }

No, we have no crash dump for this machine.

I have looked at the patch posted by Erich, but I'm not sure it will help us, because we are not using dm-raid; we are using mdadm. Could this effect also happen with mdadm, or is it a bug only in the dm-raid package?

Created attachment 134221 [details]
raid1_mempool_race.patch
In theory this patch should solve the issue in drivers/md/raid1.c similarly to
what was posted to LKML. My attempt to reproduce the bug led straight into
another lockup (ext3 related). I will check Bugzilla for something similar and
possibly post the report in another ticket...
It looks like another bug report exists with the same problem (bug #149088).

Erich, did you verify that the patch in comment #7 fixed this problem? The NMI watchdog panic attached in comment #2 is certainly a different problem, but this patch might be the fix for the memory allocation failure attached in comment #1, and that might very well cause the system to hang.
Larry Woodman

Hi Larry,
I'm trying to reproduce the first (kswapd related) freeze but haven't succeeded yet. It's a pretty rare event. I'm still trying with the original kernel. Actually this should occur faster on single core machines (IMHO), so we switched testing to single core nodes. Once the reproducer works, I'll try with the patch. And keep you updated, of course.
Regards,
Erich

Erich or Benedikt, can you try increasing /proc/sys/vm/min_free_kbytes to 4 times its default value and see if this prevents the hang from happening? This is what was done in the upstream kernel, and it does prevent the system from totally exhausting RAM.
Thanks, Larry Woodman

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
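
For reference, Larry's suggestion earlier in this thread (raising /proc/sys/vm/min_free_kbytes to 4 times its default) could be scripted roughly as below. This is only a sketch and not something posted in the bug: the helper names `read_kbytes`, `write_kbytes` and `scale_min_free_kbytes` are hypothetical, and the code multiplies whatever value is currently in the file, which equals the default only if nobody has tuned it since boot.

```c
#include <stdio.h>

/* Read one non-negative integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be read or parsed. */
long read_kbytes(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1 || value < 0)
        value = -1;
    fclose(f);
    return value;
}

/* Write an integer followed by a newline to the file.
 * Returns 0 on success, -1 on failure. */
int write_kbytes(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Multiply the value stored at path by factor (hypothetical helper).
 * Assumes the current value is still the boot-time default.
 * Returns the new value, or -1 on any error. */
long scale_min_free_kbytes(const char *path, long factor)
{
    long current = read_kbytes(path);
    long target;

    if (current < 0 || factor <= 0)
        return -1;
    target = current * factor;
    if (write_kbytes(path, target) != 0)
        return -1;
    return target;
}
```

Calling `scale_min_free_kbytes("/proc/sys/vm/min_free_kbytes", 4)` as root would apply the change until the next reboot; the path is a parameter so the helper can be tried on an ordinary file first. The same setting is usually changed with `sysctl -w vm.min_free_kbytes=<value>`.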