From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7
Description of problem:
* This is not a regression. I have seen it before but could not reproduce it; it happens very infrequently. *
While running the stress testing suite, the system gets into a state where it cannot recover from OOM kills. The file system gets remounted read-only and the system becomes unresponsive.
Once the system gets into this state, the only thing I could do was power it off.
When powered back on, the system goes into single-user mode and forces the user to run a manual fsck.
journal_get_undo_access: No memory for committed data
ext3_try_to_allocate_with_rsv: aborting transaction: Out of memory in __ext3_journal_get_undo_access
EXT3-fs error (device md1) in ext3_new_block: Out of memory
Aborting journal on device md1.
EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
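A minimal sketch, in Python, of how a log such as /var/log/messages could be scanned for the failure sequence quoted above. The patterns are copied from the messages in this report; the function names (`scan_messages`, `journal_aborted_after_oom`) are hypothetical and exist only for illustration.

```python
import re

# Signatures copied from the kernel messages quoted in this report.
SIGNATURES = {
    "undo_access_oom": re.compile(
        r"journal_get_undo_access: No memory for committed data"
    ),
    "journal_abort": re.compile(r"Aborting journal on device (\w+)"),
    "remount_ro": re.compile(r"Remounting filesystem read-only"),
}


def scan_messages(lines):
    """Return a dict mapping each signature name to the lines that match it."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        text = line.rstrip("\n")
        for name, pattern in SIGNATURES.items():
            if pattern.search(text):
                hits[name].append(text)
    return hits


def journal_aborted_after_oom(lines):
    """True if every signature of the reported failure appears at least once."""
    hits = scan_messages(lines)
    return all(hits[name] for name in SIGNATURES)
```

To use it, pass an open log file (or any iterable of lines) to `journal_aborted_after_oom`. The check only reports that all three messages are present; it does not verify their order or that they refer to the same device.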
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Using pe2850he, run the stress kernel rpm test suite.
2. After a period of time the failure may or may not occur.
Actual Results: * See attached log *
Expected Results: system _should_ be able to recover.
I have several systems in about the same configuration. I have never seen this issue on the other two systems. The big difference on this system is that we are using software RAID level 1.
The other systems are not using RAID.
Created attachment 123973 [details]
/var/log/messages file for issue reported
Strange, but when this happens it appears that kswapd and callers to
try_to_free_pages() do not run. No progress reclaiming memory appears to be made.
We've seen this several times as well, but with the U2 kernel (2.6.9-22.0.2smp).
The machine config is similar to the initial report, but we're using a PERC4/Di
hardware RAID controller.
The problem showed itself during some very heavy filesystem activity.
Created attachment 124804 [details]
/var/log/messages for my report
Jeff and Tom, are either of you two seeing this problem anymore on RHEL4?
I have not seen this in quite some time.
Fixes for the old "kswapd0: page allocation failure. order:0, mode:0x0" were
committed to RHEL4 between U3, U4 and U5. Since these changes were committed, I
don't think we've seen this problem again.
If I remember correctly the problem went away when we turned off dir_index on
the filesystem that caused the problem.
This also gave us a vast performance gain for our testcase which consisted of
millions of small files managed by the Fedora Object Management system.
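A small sketch of how one might confirm whether dir_index is enabled on a filesystem before and after such a change. It parses the "Filesystem features:" line from the text printed by `tune2fs -l <device>`. The helper names are hypothetical. The comment above does not say which command was used to turn the feature off; `tune2fs -O ^dir_index <device>` is the usual way to clear it, and is mentioned here only as an assumption.

```python
def filesystem_features(tune2fs_output):
    """Extract the feature flags from the text printed by `tune2fs -l <device>`."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return value.split()
    # No features line found in the given text.
    return []


def has_dir_index(tune2fs_output):
    """True if dir_index is listed among the filesystem features."""
    return "dir_index" in filesystem_features(tune2fs_output)
```

The functions take the command output as a string, so the output can be captured separately (for example with `subprocess.run`) and checked without this code needing root access itself.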