From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
When running a customized version of ctcs (http://sourceforge.net/projects/va-ctcs/), we have been able to reproduce this problem reliably. It has been observed on both hardware platforms we have tested: Intel i7501 with two 2.4 GHz Xeon processors and Intel i7520 with two 3.2 GHz Xeon processors. The problem occurs on these boxes only when they have 8GB of DRAM; the same boxes with 6GB or less RAM do not exhibit it. Having more ethernet interfaces appears to make the problem more likely to happen; the interfaces do not have to be cabled or configured, the cards just need to be in the box.

I have tried running with the hugemem kernel and the problem is much less likely to occur, but I have still seen one instance of a failure using the hugemem kernel. I had to compile the hugemem kernel for the i7520 boxes myself because they have 3ware controllers in them and the stock hugemem kernel does not include a driver for this card.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.0.2.EL

How reproducible:
Always

Steps to Reproduce:
1. Configure a box with 8GB of DRAM
2. Run our custom "burnin" profile of ctcs
3. Wait for failure

Actual Results:
Eventually the box appears to be hung. However, magic-sysrq is still operational.

Expected Results:
The tests should run without error.

Additional info:
Created attachment 112268 [details] sysrq output from i7520 box with 8GB DRAM, 2GB swap
Created attachment 112269 [details] sysrq from i7520 server with 8GB DRAM and 16GB swap
Note: beyond these 2 sysrq logs we have 19 other logs from runs of various RHEL 3 derivative kernels. I have examined all these logs and found the following similarities:

- Of the 4 processors in Show CPU's, one processor is always running __free_pages_ok() for kswapd, and another processor is running __free_pages_ok() for do_group_exit().
- The memtst process is always in do_group_exit(). memtst is a C program that allocates memory and does various reads and writes across this memory with various bit patterns.
- All processors have called .text.lock.swap.
- kswapd always has launder_page() in its call chain. (The call chain always looks the same, except that try_to_free_buffers() is sometimes in between launder_page() and __free_pages_ok().)

Also, though the box appears hung from a human perspective (you will not be able to log in, and no new output seems to come from user-land processes), interrupts are still serviced (as can be seen in some of the sysrq logs) and printk's from the kernel still seem to make their way to the console.

Cheers...james
Created attachment 112273 [details] another sysrq output from i7520 server with 8GB DRAM, 2GB swap
Could you please get sysrq-W output (like sysrq-P, but from all CPUs) of a hung server? It would be useful to know what the other CPUs are doing, if this is a locking deadlock.
Btw, the reasons I suspect this is a locking deadlock:

1) There is lots of memory free, so it's not a low-memory deadlock.
2) The currently running task (on one CPU) is trying to grab a lock.
3) There are many runnable processes that have schedule_timeout() as the top function in the stack, which means they all got woken up at some point in time but never actually got to run.
Oh n/m - sysrq-w is attached near the bottom of the tracebacks. Furthermore, it appears that all 4 CPUs are spinning on the same spinlock, but I don't see any non-running task holding the lock. The good news is that it should be relatively simple to reproduce in-house and take a crashdump...
I see the problem here: launder_page() has an unlikely race in which it calls page_cache_release() with the zone->lru_lock held. If the other CPUs release the last page references to this page, it can call __free_pages_ok(), which tries to take the zone->lru_lock. This patch fixes the problem:

-----------------------------------------------------------------------------
--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -315,7 +315,9 @@ int launder_page(zone_t * zone, int gfp_
 	if (cache_ratio(zone) > cache_limits.max && page_anon(page) &&
 	    free_min(zone) < 0) {
 		add_page_to_active_list(page, INITIAL_AGE);
+		lru_unlock(zone);
 		page_cache_release(page);
+		lru_lock(zone);
 		return 0;
 	}
------------------------------------------------------------------------------

Larry
Sweet! I have this patch running on two servers with 8GB; so far they have run for 16 hours and 14.5 hours respectively. The box that has run for 14.5 hours normally failed in less than 2, so I feel pretty confident that this takes care of the problem. Thank you! I'll run these servers for a while longer to make sure the problem is indeed solved. David
I have a similar problem with an HP DL580: 4x 3.0 GHz Xeon, 8GB RAM, 2x MSA30-DB external SCSI storage. The system crawls under heavy I/O load.

[root@drtsut10 mm]# uname -a
Linux drtsut10.corp.acxiom.net 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux

[root@drtsut10 mm]# cat /proc/meminfo
        total:     used:     free:   shared:  buffers:   cached:
Mem:  8165167104 8142745600  22421504        0 448823296 7425323008
Swap: 4186226688      4096 4186222592
MemTotal:        7973796 kB
MemFree:           21896 kB
MemShared:             0 kB
Buffers:          438304 kB
Cached:          7251288 kB
SwapCached:            4 kB
Active:           645964 kB
ActiveAnon:        20980 kB
ActiveCache:      624984 kB
Inact_dirty:     5622108 kB
Inact_laundry:    936376 kB
Inact_clean:      507424 kB
Inact_target:    1542372 kB
HighTotal:       7208924 kB
HighFree:           1020 kB
LowTotal:         764872 kB
LowFree:           20876 kB
SwapTotal:       4088112 kB
SwapFree:        4088108 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       2048 kB

See attachments for SysRq-T, SysRq-M, SysRq-W, and a process listing (ps).
Created attachment 112962 [details] SysRq Show-CPUs.txt
Created attachment 112963 [details] SysRq Show-Memory.txt
Created attachment 112964 [details] SysRq Show-State.txt
Created attachment 112965 [details] Process listing (various outputs of ps)
The patch above does not affect my case:

	/*
	 * The page is in active use or really unfreeable. Move to
	 * the active list and adjust the page age if needed.
	 */
	pte_chain_lock(page);
	if (page_referenced(page, &over_rsslimit) && !over_rsslimit &&
	    page_mapping_inuse(page)) {
		del_page_from_inactive_laundry_list(page);
		add_page_to_active_list(page, INITIAL_AGE);
		pte_chain_unlock(page);
		UnlockPage(page);
		lru_unlock(zone);
		page_cache_release(page);
		lru_lock(zone);
		return 0;
	}

The code is different. It seems that I already have the lru_unlock/lru_lock statements around page_cache_release().
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.2.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html
*** Bug 170144 has been marked as a duplicate of this bug. ***