From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
* This is not a regression. I have seen it before but could not reproduce it; it happens very infrequently.
* While running the stress testing suite, the system gets into a state from which it cannot recover after OOM kills. The filesystem is remounted read-only and the system becomes unresponsive. Once the system is in this state, the only option is to power it off. On power-up, the system boots into single-user mode and forces the user to run a manual fsck.

journal_get_undo_access: No memory for committed data
ext3_try_to_allocate_with_rsv: aborting transaction: Out of memory in __ext3_journal_get_undo_access
EXT3-fs error (device md1) in ext3_new_block: Out of memory
Aborting journal on device md1.
ext3_abort called.
EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

Version-Release number of selected component (if applicable):
kernel-2.6.9-29.EL.smp

How reproducible:
Sometimes

Steps to Reproduce:
1. Using pe2850he, run the stress kernel rpm test suite.
2. After a period of time, this may or may not happen.

Actual Results:
* See attached log *

Expected Results:
The system _should_ be able to recover.

Additional info:
I have several systems in about the same configuration, and I have never seen this issue on the other two. The big difference on this system is that it uses software RAID level 1; the other systems do not use RAID.
Created attachment 123973: /var/log/messages file for the reported issue
Strange, but when this happens it appears that kswapd and callers of try_to_free_pages() do not run. No progress appears to be made in reclaiming memory. Larry
We've seen this several times as well, but with the U2 kernel (2.6.9-22.0.2smp). The machine configuration is similar to the initial report, but we're using a PERC4/Di hardware RAID controller. The problem showed up during very heavy filesystem activity.
Created attachment 124804: /var/log/messages for my report
Jeff and Tom, is either of you still seeing this problem on RHEL4? Larry Woodman
Larry, I have not seen this in quite some time. Jeff
Fixes for the old "kswapd0: page allocation failure. order:0, mode:0x0" problem were committed to RHEL4 across U3, U4 and U5. Since these changes were committed, I don't think we've seen this problem again. Larry Woodman
If I remember correctly, the problem went away when we turned off dir_index on the affected filesystem. This also gave us a large performance gain for our test case, which consisted of millions of small files managed by the Fedora Object Management system.