Bug 202955 - bogus out-of-memory resulting in ext3 file system corruption
Summary: bogus out-of-memory resulting in ext3 file system corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-08-17 14:18 UTC by Aleksandar Milivojevic
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:14:54 UTC
Target Upstream Version:
Embargoed:


Attachments
log file (90.85 KB, text/plain)
2006-08-17 14:18 UTC, Aleksandar Milivojevic


Links
System: Red Hat Product Errata
ID: RHBA-2007:0791
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6
Last Updated: 2007-11-14 18:25:55 UTC

Description Aleksandar Milivojevic 2006-08-17 14:18:48 UTC
Description of problem:
I have a desktop system with 1GB of RAM and 2GB of swap space.  Typical
day-to-day utilization, as displayed by top, looks something like:

Mem:   1025040k total,   980140k used,    44900k free,    77224k buffers
Swap:  2097144k total,   473180k used,  1623964k free,   278928k cached

I started a Perl script (imapsync, for syncing two IMAP accounts) that
allocated about 600MB of memory for a hash (in small chunks).  The
application was mostly network bound, working relatively slowly through that
600MB hash.

At one point I got an OOM, the ext3 module was denied memory, and one of my ext3
file systems ended up corrupted as a result.  I was able to unmount it and run
fsck on it, which fixed some things.  However, every time I try to mount it now,
I get a warning that I should fsck it first (which I already did).

At the time of the OOM, there was more than enough free swap space to accommodate
all the applications on the system, even if all of them had to be swapped out.
It looked like a clear failure of the VM to utilize the resources (physical
memory and swap space) it had.
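
The headroom claim above can be checked with rough arithmetic. The following is a minimal Python sketch that parses the two top summary lines quoted earlier and compares free RAM plus free swap against the script's allocation. It rests on several assumptions: the figures are the reporter's typical-day snapshot rather than values captured at the moment of the OOM, "600MB" is taken as 600 * 1024 KiB, and the function names are illustrative. A simple sum like this says nothing about whether an individual kernel allocation can be satisfied at a given instant.

```python
import re

# Summary lines as quoted in the report (typical-day values, not OOM-time values).
TOP_SAMPLE = """\
Mem:   1025040k total,   980140k used,    44900k free,    77224k buffers
Swap:  2097144k total,   473180k used,  1623964k free,   278928k cached
"""

SCRIPT_ALLOC_KIB = 600 * 1024  # assumption: "about 600MB" read as 600 MiB


def parse_top_summary(text):
    """Return {'Mem': {...}, 'Swap': {...}} with all values in KiB."""
    summary = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Label: ..." rows
        summary[label.strip()] = {
            name: int(value) for value, name in re.findall(r"(\d+)k\s+(\w+)", rest)
        }
    return summary


def headroom_kib(summary):
    """Free RAM plus free swap, ignoring reclaimable buffers and cache."""
    return summary["Mem"]["free"] + summary["Swap"]["free"]


if __name__ == "__main__":
    available = headroom_kib(parse_top_summary(TOP_SAMPLE))
    print("free RAM + free swap (KiB):", available)
    print("surplus after 600 MiB (KiB):", available - SCRIPT_ALLOC_KIB)
```

On the quoted figures the sum comfortably exceeds the allocation, which is consistent with the reporter's reading that the system as a whole was not short of memory plus swap.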

The kernel logged a bunch of messages.  I'll put them in an attachment.

All the VM kernel parameters were at default values.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-34.0.1.EL

How reproducible:
Not sure if I want to attempt reproducing it, I love my data.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Aleksandar Milivojevic 2006-08-17 14:18:48 UTC
Created attachment 134383
log file

Comment 2 Larry Woodman 2006-11-16 18:49:03 UTC
Does anyone know if this is reproducible?  The system appears to be in a very
weird state and I would really like to figure out exactly how it got there!

Any help reproducing this problem would be appreciated.

Thanks, Larry Woodman


Comment 3 Eric Sandeen 2006-11-16 20:09:21 UTC
Aleksandar, is the filesystem still telling you that you should fsck it when you
mount?  What are the exact messages?  Perhaps an e2image of the filesystem would
help me find out why e2fsck doesn't seem to be able to clear this state.

Comment 4 Aleksandar Milivojevic 2006-11-16 21:11:24 UTC
Well, I filed the bug report a long time ago.  In the meantime, I simply
tarred everything from that file system to tape, ran mkfs.ext3 on it, and
restored the data.  Luckily it was just a few GB of data.  That solved the
"need to fsck" warning message problem (but killed all debugging info too, sorry).

I also added 1GB of memory to the system (it mostly sits there unused) and set
vm.min_free_kbytes to 8192.  It seems that the latter is doing a good job of
preventing this kind of thing from repeating itself.
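
For readers who want to check the same setting on their own system, here is a minimal Python sketch that reads the current value from /proc/sys/vm/min_free_kbytes and prints the line one might add to /etc/sysctl.conf. The target of 8192 is the value reported in this thread; the helper names are illustrative, and the script deliberately only reads and reports, since changing the value requires root and the right number depends on the machine.

```python
from pathlib import Path

PROC_PATH = Path("/proc/sys/vm/min_free_kbytes")
TARGET_KIB = 8192  # value reported in this thread; not a universal recommendation


def read_min_free_kbytes(path=PROC_PATH):
    """Return the current setting in KiB, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sysctl_conf_line(kib=TARGET_KIB):
    """Format the setting as a line suitable for /etc/sysctl.conf."""
    return f"vm.min_free_kbytes = {kib}"


def needs_raise(current, target=TARGET_KIB):
    """True only when a value was read and it is below the target."""
    return current is not None and current < target


if __name__ == "__main__":
    current = read_min_free_kbytes()
    print("current vm.min_free_kbytes:", current)
    if needs_raise(current):
        print("candidate /etc/sysctl.conf entry:", sysctl_conf_line())
```

After adding such an entry, running `sysctl -p` as root reloads /etc/sysctl.conf so the value takes effect without a reboot.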


Comment 5 Aleksandar Milivojevic 2006-11-16 21:26:31 UTC
Eric, as I wrote earlier, the problem occurred while running a Perl script that
was allocating memory in small chunks (around 600MB in total), and then working
on that data.  So it could also have been a very bad case of memory
fragmentation.  Other than allocating 600MB of memory, the system was doing some
relatively heavy network I/O (that Perl script was responsible for that too).
At the time, I was logged in on the console, doing some work in a terminal window
(so the system also had to cope with some light desktop load).

Comment 6 Larry Woodman 2006-12-08 18:30:15 UTC
Setting min_free_kbytes to 8192 is the correct way to resolve this issue.  We
are considering increasing that default in RHEL4-U5.

Larry Woodman


Comment 7 Larry Woodman 2007-07-10 15:45:29 UTC
This change was made to RHEL4-U6.

Larry Woodman


Comment 12 errata-xmlrpc 2007-11-15 16:14:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


