Bug 151920
| Summary: | 8GB SMP servers appear to hang in VM subsystem under stress | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | David Knierim <new_galoot> |
| Component: | kernel | Assignee: | Larry Woodman <lwoodman> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Priority: | medium |
| Version: | 3.0 | CC: | james.oden, jbaron, jburke, peterm, petrides, poelstra, riel, tao, tburke |
| Hardware: | i686 | OS: | Linux |
| Fixed In Version: | RHSA-2005-663 | Doc Type: | Bug Fix |
| Last Closed: | 2005-09-28 14:51:38 UTC | Bug Blocks: | 156320 |
Description
David Knierim
2005-03-23 15:56:51 UTC
Created attachment 112268 [details]
sysrq output from i7520 box with 8GB DRAM, 2GB swap
Created attachment 112269 [details]
sysrq from i7520 server with 8GB DRAM and 16GB swap
Note: beyond these 2 sysrq logs we have 19 other logs from various RHEL 3
derivative kernels. I have examined all of these logs and found the
following similarities:
- Of the 4 processors shown in Show CPUs, one is always running
__free_pages_ok() for kswapd, and another is running
__free_pages_ok() for do_group_exit().
- The memtst process is always in do_group_exit().
memtst is a C program that allocates memory and does
various reads and writes across this memory with various
bit patterns.
- All processors have called .text.lock.swap.
- kswapd is always calling launder_page() in its call chain (the
call chain always looks the same, except that try_to_free_buffers() is
sometimes in between launder_page() and __free_pages_ok()).
Also, though the box appears hung from a human perspective (you cannot
log in, and no new output seems to come from userland
processes), interrupts are still serviced (as can be seen in some of the sysrq
logs) and printk's from the kernel still make their way to the console.
Cheers...james
Created attachment 112273 [details]
another sysrq output from i7520 server with 8GB DRAM, 2GB swap
Could you please get sysrq-W output (like sysrq-P, but from all CPUs) of a hung server? It would be useful to know what the other CPUs are doing, if this is a locking deadlock.
Btw, the reasons I suspect this is a locking deadlock:
1) there is lots of memory free, so it's not a low memory deadlock
2) the currently running task (on one CPU) is trying to grab a lock
3) there are many runnable processes that have schedule_timeout() as the top function in the stack - which means they all got woken up at some point in time, but never actually got to run
Oh n/m - sysrq-w is attached near the bottom of the tracebacks.
Furthermore, it appears that all 4 CPUs are spinning on the same spinlock, but I don't see any non-running task holding the lock. The good news is that it should be relatively simple to reproduce in-house and take a crashdump...
I see the problem here: launder_page() has an unlikely race in which it calls
page_cache_release() with the zone->lru_lock held. If the other CPUs have
already released their references to this page, that call drops the last
reference and can call __free_pages_ok(), which tries to take the
zone->lru_lock again. This patch fixes the problem:
-----------------------------------------------------------------------------
--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -315,7 +315,9 @@ int launder_page(zone_t * zone, int gfp_
         if (cache_ratio(zone) > cache_limits.max && page_anon(page) &&
                         free_min(zone) < 0) {
                 add_page_to_active_list(page, INITIAL_AGE);
+                lru_unlock(zone);
                 page_cache_release(page);
+                lru_lock(zone);
                 return 0;
         }
------------------------------------------------------------------------------
Larry
Sweet! I have two 8GB servers running with this patch; so far they have run for 16 hours and 14.5 hours respectively. The box that has run for 14.5 hours normally failed in less than 2, so I feel pretty confident that this takes care of the problem. Thank you! I'll run these servers for a while longer to make sure the problem is indeed solved.
David
I have a similar problem with an HP DL580 -- 4x 3.0GHz Xeon, 8GB RAM, 2x MSA30-DB
external SCSI storage. The system crawls under heavy I/O load.
[root@drtsut10 mm]# uname -a
Linux drtsut10.corp.acxiom.net 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT
2004 i686 i686 i386 GNU/Linux
[root@drtsut10 mm]# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 8165167104 8142745600 22421504 0 448823296 7425323008
Swap: 4186226688 4096 4186222592
MemTotal: 7973796 kB
MemFree: 21896 kB
MemShared: 0 kB
Buffers: 438304 kB
Cached: 7251288 kB
SwapCached: 4 kB
Active: 645964 kB
ActiveAnon: 20980 kB
ActiveCache: 624984 kB
Inact_dirty: 5622108 kB
Inact_laundry: 936376 kB
Inact_clean: 507424 kB
Inact_target: 1542372 kB
HighTotal: 7208924 kB
HighFree: 1020 kB
LowTotal: 764872 kB
LowFree: 20876 kB
SwapTotal: 4088112 kB
SwapFree: 4088108 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
See attachments for SysRQ-T, SysRQ-M, SysRQ-W, and process listing (ps)
Created attachment 112962 [details]
SysRq Show-CPUs.txt
Created attachment 112963 [details]
SysRq Show-Memory.txt
Created attachment 112964 [details]
SysRq Show-State.txt
Created attachment 112965 [details]
Process listing (various outputs of ps)
The patch above does not affect my case:
        /*
         * The page is in active use or really unfreeable. Move to
         * the active list and adjust the page age if needed.
         */
        pte_chain_lock(page);
        if (page_referenced(page, &over_rsslimit) && !over_rsslimit &&
                        page_mapping_inuse(page)) {
                del_page_from_inactive_laundry_list(page);
                add_page_to_active_list(page, INITIAL_AGE);
                pte_chain_unlock(page);
                UnlockPage(page);
                lru_unlock(zone);
                page_cache_release(page);
                lru_lock(zone);
                return 0;
        }
The code is different. It seems that I already have the lru_lock/lru_unlock
statements.
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.2.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2005-663.html
*** Bug 170144 has been marked as a duplicate of this bug. ***