Bug 179605 - journal_get_undo_access: No memory for committed data
Summary: journal_get_undo_access: No memory for committed data
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel   
Version: 4.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2006-02-01 16:55 UTC by Jeff Burke
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHEL4-U5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-07-10 18:55:06 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
/var/log/messages file for issue reported (21.02 KB, text/plain)
2006-02-01 16:57 UTC, Jeff Burke
/var/log/messages (38.24 KB, application/octet-stream)
2006-02-17 10:18 UTC, Tom G. Christensen

Description Jeff Burke 2006-02-01 16:55:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
* This is not a regression. I have seen it before but could not reproduce it; it happens very infrequently. *

While running the stress testing suite, the system gets into a state where it cannot recover from OOM kills. The file system is remounted read-only and the system becomes unresponsive.

Once the system gets into this state, the only thing I could do was power it off.
When powered back on, the system goes into single-user mode and forces the user to run a manual fsck.

journal_get_undo_access: No memory for committed data
ext3_try_to_allocate_with_rsv: aborting transaction: Out of memory in __ext3_journal_get_undo_access
EXT3-fs error (device md1) in ext3_new_block: Out of memory
Aborting journal on device md1.
ext3_abort called.
EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
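
A minimal sketch, not part of the original report, of how the failure signature above could be located in a saved /var/log/messages file. It assumes the three message fragments below always appear in this order; the function name find_journal_abort is hypothetical.

```python
import re

# Message fragments copied from the excerpt above, in the order they appeared.
ABORT_SEQUENCE = [
    "journal_get_undo_access: No memory for committed data",
    "Aborting journal on device",
    "Remounting filesystem read-only",
]


def find_journal_abort(lines):
    """Return (index, device) of the line that completes the sequence, or None.

    `index` is the position of the "Remounting filesystem read-only" line and
    `device` is the name taken from the "Aborting journal on device" line.
    """
    step = 0
    device = None
    for index, line in enumerate(lines):
        if ABORT_SEQUENCE[step] in line:
            if step == 1:
                # Device name, without the trailing period the kernel prints.
                match = re.search(r"Aborting journal on device (\S+?)\.?$", line.strip())
                device = match.group(1) if match else None
            step += 1
            if step == len(ABORT_SEQUENCE):
                return index, device
    return None
```

Calling find_journal_abort(open(path).read().splitlines()) on a log file would report where the remount happened and on which device, which helps when comparing the attached logs from different machines.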


Version-Release number of selected component (if applicable):
kernel-2.6.9-29.EL.smp

How reproducible:
Sometimes

Steps to Reproduce:
1. Using pe2850he, run the stress kernel rpm test suite.
2. After a period of time this may or may not happen.

  

Actual Results:  * See attached log *

Expected Results:  system _should_ be able to recover.

Additional info:

I have several systems in about the same configuration. I have never seen this issue on the other two systems. The big difference on this system is that it uses software RAID level 1.

The other systems are not using raid.

Comment 1 Jeff Burke 2006-02-01 16:57:36 UTC
Created attachment 123973
/var/log/messages file for issue reported

Comment 2 Larry Woodman 2006-02-01 18:16:24 UTC
Strange, but when this happens it appears that kswapd and callers to
try_to_free_pages() do not run.  No progress reclaiming memory appears to be made.

Larry
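
One way to check the "no progress reclaiming memory" observation from user space is to compare two snapshots of /proc/vmstat and see whether the page-scan and page-steal counters advance. The sketch below is illustrative only and was not part of the original discussion. It assumes the reclaim counters start with "pgscan" or "pgsteal"; the exact counter names vary between kernel versions, and the helper names are hypothetical.

```python
import time


def parse_vmstat(text):
    """Parse 'name value' lines in the /proc/vmstat format into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_delta(before, after, prefixes=("pgscan", "pgsteal")):
    """Return how far each reclaim-related counter advanced between snapshots.

    The prefix filter is an assumption about counter naming, not a kernel API.
    """
    return {
        name: after[name] - before.get(name, 0)
        for name in sorted(after)
        if name.startswith(prefixes)
    }


def sample_reclaim(interval=5.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart, and return the deltas."""
    with open(path) as handle:
        before = parse_vmstat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_vmstat(handle.read())
    return reclaim_delta(before, after)
```

Deltas that all stay at zero while the machine is under memory pressure would be consistent with kswapd and direct reclaim not running. On an idle system the counters also stay flat, so the result only means something during the stress run.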


Comment 3 Tom G. Christensen 2006-02-17 10:15:34 UTC
We've seen this several times as well, but with the U2 kernel (2.6.9-22.0.2smp).
The machine config is similar to the initial report, but we're using a PERC4/Di
hardware RAID controller.
The problem showed itself during some very heavy filesystem activity.

Comment 4 Tom G. Christensen 2006-02-17 10:18:13 UTC
Created attachment 124804
/var/log/messages

/var/log/messages for my report

Comment 5 Larry Woodman 2007-07-10 17:47:10 UTC
Jeff and Tom, is either of you still seeing this problem on RHEL4?

Larry Woodman


Comment 6 Jeff Burke 2007-07-10 18:43:55 UTC
Larry,
   I have not seen this in quite some time.
Jeff

Comment 7 Larry Woodman 2007-07-10 18:55:06 UTC
Fixes for the old "kswapd0: page allocation failure. order:0, mode:0x0" were
committed to RHEL4 between U3, U4 and U5.  Since these changes were committed I
don't think we've seen this problem again.

Larry Woodman


Comment 8 Tom G. Christensen 2007-07-11 06:18:48 UTC
If I remember correctly, the problem went away when we turned off dir_index on
the filesystem that caused the problem.
This also gave us a vast performance gain for our test case, which consisted of
millions of small files managed by the Fedora Object Management system.
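
To confirm whether dir_index is still enabled on a given ext3 filesystem, the feature list printed by "tune2fs -l <device>" can be inspected. The sketch below parses that output; it is an illustration and not something from the original comment. The function names are hypothetical, and it assumes the output contains a "Filesystem features:" line. The comment does not say how the feature was turned off; one common way is "tune2fs -O ^dir_index <device>", typically run while the filesystem is not mounted.

```python
def ext_features(tune2fs_output):
    """Return the names on the 'Filesystem features:' line of `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return value.split()
    return []


def has_dir_index(tune2fs_output):
    """True if the hashed-directory feature (dir_index) is listed as enabled."""
    return "dir_index" in ext_features(tune2fs_output)
```

The text to parse could be captured with subprocess.run(["tune2fs", "-l", device], capture_output=True, text=True).stdout, which normally requires root privileges.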

