Bug 179605 - journal_get_undo_access: No memory for committed data
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Assigned To: Larry Woodman
QA Contact: Brian Brock
 
Reported: 2006-02-01 11:55 EST by Jeff Burke
Modified: 2007-11-30 17:07 EST

Fixed In Version: RHEL4-U5
Doc Type: Bug Fix
Last Closed: 2007-07-10 14:55:06 EDT


Attachments
/var/log/messages file for issue reported (21.02 KB, text/plain)
2006-02-01 11:57 EST, Jeff Burke
/var/log/messages (38.24 KB, application/octet-stream)
2006-02-17 05:18 EST, Tom G. Christensen

Description Jeff Burke 2006-02-01 11:55:23 EST

Description of problem:
* This is not a regression. I have seen it before but could not reproduce it; it happens very infrequently. *

While running the stress testing suite, the system gets into a state where it cannot recover from OOM kills. The filesystem gets remounted read-only and the system becomes unresponsive.

Once the system is in this state, the only thing I can do is power it off.
When powered back on, the system goes into single-user mode and forces a manual fsck.

journal_get_undo_access: No memory for committed data
ext3_try_to_allocate_with_rsv: aborting transaction: Out of memory in __ext3_journal_get_undo_access
EXT3-fs error (device md1) in ext3_new_block: Out of memory
Aborting journal on device md1.
ext3_abort called.
EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
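The kernel messages above follow a fixed chain: a memory-allocation failure inside the ext3 journal code, then the journal abort on md1, then the read-only remount. A minimal Python sketch for spotting that chain when scanning /var/log/messages is below. The function name `find_abort_chain` and the regex patterns are hypothetical, written only against the message text quoted in this report; other kernel versions may word these messages differently.

```python
import re

# Hypothetical patterns, matched against the message text quoted in this report.
# Stages must appear in this order: journal OOM -> journal abort -> read-only remount.
STAGES = [
    ("oom", re.compile(r"journal_get_undo_access: No memory|Out of memory in __ext3_journal")),
    ("abort", re.compile(r"Aborting journal on device (\S+)\.")),
    ("readonly", re.compile(r"Remounting filesystem read-only")),
]


def find_abort_chain(lines):
    """Return the aborted device name if all stages occur in order, else None."""
    stage = 0
    device = None
    for line in lines:
        name, pattern = STAGES[stage]
        match = pattern.search(line)  # search() tolerates syslog timestamp prefixes
        if match:
            if name == "abort":
                device = match.group(1)
            stage += 1
            if stage == len(STAGES):
                return device
    return None
```

Because `search` is used rather than `match`, the sketch also works on raw syslog lines that carry a timestamp and hostname prefix before the kernel message.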


Version-Release number of selected component (if applicable):
kernel-2.6.9-29.EL.smp

How reproducible:
Sometimes

Steps to Reproduce:
1. On pe2850he, run the stress kernel RPM test suite.
2. After a period of time, the problem may or may not occur.

  

Actual Results:  * See attached log *

Expected Results:  The system _should_ be able to recover.

Additional info:

I have several systems in about the same configuration and have never seen this issue on the other two. The big difference on this system is that it uses software RAID level 1.

The other systems are not using RAID.
Comment 1 Jeff Burke 2006-02-01 11:57:36 EST
Created attachment 123973
/var/log/messages file for issue reported
Comment 2 Larry Woodman 2006-02-01 13:16:24 EST
Strange, but when this happens it appears that kswapd and callers to
try_to_free_pages() do not run.  No progress reclaiming memory appears to be made.

Larry
Comment 3 Tom G. Christensen 2006-02-17 05:15:34 EST
We've seen this several times as well, but with the U2 kernel (2.6.9-22.0.2smp).
The machine config is similar to the initial report but we're using a PERC4/Di
hardware RAID controller.
The problem showed itself during some very heavy filesystem activity.
Comment 4 Tom G. Christensen 2006-02-17 05:18:13 EST
Created attachment 124804
/var/log/messages

/var/log/messages for my report
Comment 5 Larry Woodman 2007-07-10 13:47:10 EDT
Jeff and Tom, is either of you still seeing this problem on RHEL4?

Larry Woodman
Comment 6 Jeff Burke 2007-07-10 14:43:55 EDT
Larry,
   I have not seen this in quite some time.
Jeff
Comment 7 Larry Woodman 2007-07-10 14:55:06 EDT
Fixes for the old "kswapd0: page allocation failure. order:0, mode:0x0" were
committed to RHEL4 between U3, U4 and U5. Since these changes were committed, I
don't think we've seen this problem again.

Larry Woodman
Comment 8 Tom G. Christensen 2007-07-11 02:18:48 EDT
If I remember correctly, the problem went away when we turned off dir_index on
the filesystem that caused the problem.
This also gave us a vast performance gain for our test case, which consisted of
millions of small files managed by the Fedora Object Management system.
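To confirm whether a filesystem still has the dir_index feature mentioned above, one can check the "Filesystem features:" line printed by `dumpe2fs -h`. A minimal sketch, assuming that output has already been captured as a string; `has_feature` is a hypothetical helper, not part of e2fsprogs.

```python
def has_feature(dumpe2fs_header: str, feature: str) -> bool:
    """Return True if `feature` is listed on the 'Filesystem features:' line.

    Assumes `dumpe2fs_header` holds text in the "Key:   value" layout
    that `dumpe2fs -h` prints for ext2/ext3 filesystems.
    """
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            # Features are whitespace-separated tokens, e.g. "has_journal dir_index ..."
            return feature in value.split()
    return False
```

Splitting on whitespace avoids false positives from substring matches between similarly named features.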
