Bug 151920 - 8GB SMP servers appear to hang in VM subsystem under stress
Summary: 8GB SMP servers appear to hang in VM subsystem under stress
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Duplicates: 170144
Depends On:
Blocks: 156320
Reported: 2005-03-23 15:56 UTC by David Knierim
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2005-09-28 14:51:38 UTC

Attachments
sysrq output from i7520 box with 8GB DRAM, 2GB swap (65.24 KB, text/plain)
2005-03-23 16:02 UTC, David Knierim
sysrq from i7520 server with 8GB DRAM and 16GB swap (75.04 KB, text/plain)
2005-03-23 16:04 UTC, David Knierim
another sysrq output from i7520 server with 8GB DRAM, 2GB swap (70.00 KB, text/plain)
2005-03-23 18:26 UTC, David Knierim
SysRq Show-CPUs.txt (68.03 KB, text/plain)
2005-04-11 17:42 UTC, Georgi Hristov
SysRq Show-Memory.txt (43.20 KB, text/plain)
2005-04-11 17:43 UTC, Georgi Hristov
SysRq Show-State.txt (80.82 KB, text/plain)
2005-04-11 17:44 UTC, Georgi Hristov
Process listing (various outputs of ps) (17.41 KB, text/plain)
2005-04-11 17:45 UTC, Georgi Hristov

System: Red Hat Product Errata
ID: RHSA-2005:663
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6
Last Updated: 2005-09-28 04:00:00 UTC

Description David Knierim 2005-03-23 15:56:51 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
When running a customized version of ctcs (http://sourceforge.net/projects/va-ctcs/), we have been able to reproduce this problem reliably.  The problem has been observed on both hardware platforms we have tested: Intel i7501 with two 2.4 GHz Xeon processors and Intel i7520 with two 3.2 GHz Xeon processors.  The problem only occurs on these boxes if they have 8GB of DRAM.  The same boxes with 6GB or less RAM do not exhibit the problem.

It appears that having more Ethernet interfaces makes the problem more likely to occur.  The interfaces do not have to be cabled or configured; the cards just need to be in the box.

I have tried running with the hugemem kernel, and the problem is much less likely to occur, but I have still seen one failure with the hugemem kernel.  I had to compile the hugemem kernel for the i7520 boxes because they have 3ware controllers in them and the hugemem kernel does not include drivers for these controllers.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure box with 8GB of DRAM
2. Run our custom "burnin" profile of ctcs
3. Wait for failure

Actual Results:  Eventually, it appears the box is hung.  However, magic-sysrq is operational.

Expected Results:  The tests should run without error.

Additional info:

Comment 1 David Knierim 2005-03-23 16:02:08 UTC
Created attachment 112268
sysrq output from i7520 box with 8GB DRAM, 2GB swap

Comment 2 David Knierim 2005-03-23 16:04:13 UTC
Created attachment 112269
sysrq from i7520 server with 8GB DRAM and 16GB swap

Comment 3 James Olin Oden 2005-03-23 16:53:11 UTC
Note: beyond these two sysrq logs, we have 19 other logs from various RHEL 3
derivative kernels.  I have examined all of these logs and found the
following similarities:

  - Of the 4 processors in the Show CPUs output, one is always running
    __free_pages_ok() for kswapd, and another is running
    __free_pages_ok() for do_group_exit().
  - The memtst process is always in do_group_exit().
    memtst is a C program that allocates memory and performs
    various reads and writes across this memory with various
    bit patterns.
  - All processors have called .text.lock.swap.
  - kswapd always calls launder_page() in its call chain.  The
    call chain always looks the same, except that try_to_free_buffers()
    sometimes appears between launder_page() and __free_pages_ok().

Also, although the box appears hung from a human perspective (you cannot
log in at this point, and no new output seems to come from userland
processes), interrupts are still serviced (as can be seen in some of the sysrq
logs) and printk output from the kernel still reaches the console.


Comment 4 David Knierim 2005-03-23 18:26:20 UTC
Created attachment 112273
another sysrq output from i7520 server with 8GB DRAM, 2GB swap

Comment 5 Rik van Riel 2005-03-25 03:05:43 UTC
Could you please get sysrq-W output (like sysrq-P, but from all CPUs) from a hung
server?

It would be useful to know what the other CPUs are doing, if this is a locking
deadlock.

Comment 6 Rik van Riel 2005-03-25 03:08:38 UTC
Btw, the reasons I suspect this is a locking deadlock:
1) there is lots of memory free, so it's not a low memory deadlock
2) the currently running task (on one CPU) is trying to grab a lock
3) there are many runnable processes that have schedule_timeout() as the top
function in the stack - which means they all got woken up at some point in time,
but never actually got to run

Comment 7 Rik van Riel 2005-03-25 03:22:12 UTC
Oh n/m - sysrq-w is attached near the bottom of the tracebacks.

Furthermore, it appears that all 4 CPUs are spinning on the same spinlock, but I
don't see any non-running task holding the lock.

The good news is that it should be relatively simple to reproduce in-house and
take a crashdump...

Comment 8 Larry Woodman 2005-04-06 19:26:41 UTC
I see the problem here: launder_page() has an unlikely race in which it calls
page_cache_release() with the zone->lru_lock held.  If the other CPUs have
dropped their references to the page, page_cache_release() can end up calling
__free_pages_ok(), which tries to take the zone->lru_lock while it is already
held.  This patch fixes the problem:

--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -315,7 +315,9 @@ int launder_page(zone_t * zone, int gfp_
        if (cache_ratio(zone) > cache_limits.max && page_anon(page) &&
                        free_min(zone) < 0) {
                add_page_to_active_list(page, INITIAL_AGE);
+               lru_unlock(zone);
+               lru_lock(zone);
                return 0;
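
The locking pattern described in Comment 8 can be illustrated in user space. The following is a minimal sketch, not the kernel code: a pthread mutex stands in for zone->lru_lock, and struct page, free_page_ok(), page_release(), launder_buggy() and launder_fixed() are hypothetical names invented for this illustration. It assumes, per the description above, that the reference drop is what has to happen between the added lru_unlock() and lru_lock() calls. pthread_mutex_trylock() is used so that the self-deadlock is recorded in a flag rather than actually hanging the program.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins; none of this is the RHEL3 kernel code. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER; /* plays zone->lru_lock */

struct page {
    int refcount;        /* outstanding references to the page */
    int freed;           /* set once the page has been freed */
    int would_deadlock;  /* set if freeing found lru_lock already held */
};

/* Plays __free_pages_ok(): it needs lru_lock to free the page.
 * trylock lets the demo record the self-deadlock instead of hanging;
 * a kernel spinlock would simply spin forever at this point. */
static void free_page_ok(struct page *p)
{
    assert(p->refcount == 0);
    if (pthread_mutex_trylock(&lru_lock) != 0) {
        p->would_deadlock = 1;
        return;
    }
    p->freed = 1;
    pthread_mutex_unlock(&lru_lock);
}

/* Plays page_cache_release(): frees the page when the last reference goes. */
static void page_release(struct page *p)
{
    if (--p->refcount == 0)
        free_page_ok(p);
}

/* Buggy shape: drops the reference while still holding lru_lock. */
static void launder_buggy(struct page *p)
{
    pthread_mutex_lock(&lru_lock);
    page_release(p);
    pthread_mutex_unlock(&lru_lock);
}

/* Fixed shape: unlock, drop the reference, then relock for the caller. */
static void launder_fixed(struct page *p)
{
    pthread_mutex_lock(&lru_lock);   /* lock held on entry, as in the scan loop */
    pthread_mutex_unlock(&lru_lock); /* release before dropping the reference */
    page_release(p);
    pthread_mutex_lock(&lru_lock);   /* retake so the caller's state is unchanged */
    pthread_mutex_unlock(&lru_lock); /* the caller eventually unlocks */
}
```

Calling launder_buggy() on a page whose refcount is 1 sets would_deadlock and leaves the page unfreed, while launder_fixed() on the same kind of page frees it. The race is rare in practice because it only triggers when the scanning CPU happens to hold the last reference.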


Comment 9 David Knierim 2005-04-07 13:22:31 UTC
Sweet!  I have two 8GB servers running with this patch; so far they have run
for 16 hours and 14.5 hours respectively.  The box that has run for 14.5 hours
normally failed in less than 2 hours, so I feel pretty confident that this
takes care of the problem.  Thank you!

I'll run these servers for a while longer to make sure the problem is indeed solved.


Comment 10 Georgi Hristov 2005-04-11 17:38:28 UTC
I have a similar problem with an HP DL580: 4x 3.0 GHz Xeon, 8GB RAM, 2x MSA30-DB
external SCSI storage. The system crawls under heavy I/O load.

[root@drtsut10 mm]# uname -a
Linux drtsut10.corp.acxiom.net 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux

[root@drtsut10 mm]# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  8165167104 8142745600 22421504        0 448823296 7425323008
Swap: 4186226688     4096 4186222592
MemTotal:      7973796 kB
MemFree:         21896 kB
MemShared:           0 kB
Buffers:        438304 kB
Cached:        7251288 kB
SwapCached:          4 kB
Active:         645964 kB
ActiveAnon:      20980 kB
ActiveCache:    624984 kB
Inact_dirty:   5622108 kB
Inact_laundry:  936376 kB
Inact_clean:    507424 kB
Inact_target:  1542372 kB
HighTotal:     7208924 kB
HighFree:         1020 kB
LowTotal:       764872 kB
LowFree:         20876 kB
SwapTotal:     4088112 kB
SwapFree:      4088108 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

See attachments for SysRQ-T, SysRQ-M, SysRQ-W, and process listing (ps)

Comment 11 Georgi Hristov 2005-04-11 17:42:28 UTC
Created attachment 112962
SysRq Show-CPUs.txt

Comment 12 Georgi Hristov 2005-04-11 17:43:45 UTC
Created attachment 112963
SysRq Show-Memory.txt

Comment 13 Georgi Hristov 2005-04-11 17:44:22 UTC
Created attachment 112964
SysRq Show-State.txt

Comment 14 Georgi Hristov 2005-04-11 17:45:08 UTC
Created attachment 112965
Process listing (various outputs of ps)

Comment 15 Georgi Hristov 2005-04-11 17:48:34 UTC
The patch above does not affect my case:

        /*
         * The page is in active use or really unfreeable. Move to
         * the active list and adjust the page age if needed.
         */
        if (page_referenced(page, &over_rsslimit) && !over_rsslimit &&
                        page_mapping_inuse(page)) {
                add_page_to_active_list(page, INITIAL_AGE);
                return 0;

The code is different. It seems that I already have the lru_lock/lru_unlock

Comment 16 Ernie Petrides 2005-04-23 00:42:18 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.2.EL).

Comment 37 Red Hat Bugzilla 2005-09-28 14:51:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Comment 38 Ernie Petrides 2006-01-09 22:10:35 UTC
*** Bug 170144 has been marked as a duplicate of this bug. ***
