Red Hat Bugzilla – Bug 202955
bogus out-of-memory resulting in ext3 file system corruption
Last modified: 2007-11-30 17:07:27 EST
Description of problem:
I have a desktop system with 1GB of RAM and 2GB of swap space. Normal average-day
utilization as displayed by top looks something like:
Mem: 1025040k total, 980140k used, 44900k free, 77224k buffers
Swap: 2097144k total, 473180k used, 1623964k free, 278928k cached
I've started a Perl script (imapsync, for syncing two IMAP accounts), that
allocated about 600MB of memory for some hash (in small chunks). The
application was mostly network bound, working relatively slowly through that data.
At one point I got an OOM, the ext3 module was denied memory, and one of my ext3
file systems ended up corrupted as a result. I was able to unmount it and run fsck on
it, which fixed some things. However, now every time I try to mount it, I get a
warning that I should fsck it first (well, I did).
At the time I got the OOM, there was more than enough free swap space to accommodate
all the applications on the system, even if all of them had to be swapped out.
It looked like a clear failure of the VM to utilize the resources (physical memory and
swap space) it had.
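The claim about spare capacity can be checked with rough arithmetic. The sketch below uses only the figures quoted from top above, which describe a normal day rather than the moment of the OOM, so it is an estimate under stated assumptions and not a reconstruction of the failure. It assumes that buffers and cache are reclaimable and that "about 600MB" means 600 * 1024 kB.

```python
# Figures from the top output quoted in the report, all in kB.
MEM_TOTAL, MEM_USED, BUFFERS = 1025040, 980140, 77224
SWAP_TOTAL, SWAP_USED, CACHED = 2097144, 473180, 278928

# Assumption: the script's "about 600MB" is taken as 600 * 1024 kB.
SCRIPT_KB = 600 * 1024


def headroom_kb():
    """Return (demand, capacity, headroom) in kB under the assumptions above."""
    # Assumption: buffers and page cache can be reclaimed, so they are
    # not counted as application demand.
    resident_apps = MEM_USED - BUFFERS - CACHED
    demand = resident_apps + SWAP_USED + SCRIPT_KB
    capacity = MEM_TOTAL + SWAP_TOTAL
    return demand, capacity, capacity - demand


if __name__ == "__main__":
    demand, capacity, headroom = headroom_kb()
    print(f"demand:   {demand} kB")
    print(f"capacity: {capacity} kB")
    print(f"headroom: {headroom} kB")
```

With these inputs the estimated demand is 1711568 kB against 3122184 kB of RAM plus swap, leaving 1410616 kB of headroom. A total like this says nothing about whether a particular kernel allocation can be satisfied at the moment it is made.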
There was a bunch of messages logged by the kernel. I'll place them in an attachment.
All the VM kernel parameters were at default values.
Version-Release number of selected component (if applicable):
Not sure if I want to attempt reproducing it, I love my data.
Steps to Reproduce:
Created attachment 134383
Does anyone know if this is reproducible? The system appears to be in a very
weird state and I would really like to figure out exactly how it got there!
Any help reproducing this problem would be appreciated.
Thanks, Larry Woodman
Aleksandar, is the filesystem still telling you that you should fsck it when you
mount? What are the exact messages? Perhaps an e2image of the filesystem would
help me find out why e2fsck doesn't seem to be able to clear this state.
Well, I filed the bug report a long time ago. In the meantime, I've simply
tarred everything from that file system to tape, ran mkfs.ext3 on it, and restored
the data. Luckily it was just a few GB of data. That solved the "need to fsck"
warning message problem (but killed all debugging info too, sorry).
I also added 1GB of memory to the system (it mostly sits there unused) and set
vm.min_free_kbytes to 8192. It seems that the latter is doing a good job of
preventing this kind of thing from repeating itself.
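For anyone wanting to check the same setting, here is a minimal sketch that reads the current value from procfs and prints the line that would make 8192 persistent. The path /proc/sys/vm/min_free_kbytes is the standard location for this tunable. The helper names are made up for illustration, and the script only reads, it never writes.

```python
from pathlib import Path

# Standard procfs location of the tunable named in the report.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")
TARGET_KB = 8192  # the value used in the report


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current setting in kB, read from the given file."""
    return int(Path(path).read_text().strip())


def needs_raise(current_kb, target_kb=TARGET_KB):
    """True if the current value is below the target."""
    return current_kb < target_kb


def sysctl_conf_line(value_kb=TARGET_KB):
    """Line to add to /etc/sysctl.conf to keep the value across reboots."""
    return f"vm.min_free_kbytes = {value_kb}"


if __name__ == "__main__":
    try:
        current = read_min_free_kbytes()
    except OSError as exc:
        print(f"could not read {MIN_FREE_PATH}: {exc}")
    else:
        print(f"current vm.min_free_kbytes: {current}")
        if needs_raise(current):
            print("suggested sysctl.conf line:", sysctl_conf_line())
```

Changing the value at runtime needs root, for example with sysctl -w vm.min_free_kbytes=8192.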
Eric, as I wrote earlier, the problem occurred while running a Perl script that
was allocating memory in small chunks (a total of around 600MB), and then working
on that data. So it could also be that it was a very bad case of memory
fragmentation. Other than allocating 600MB of memory, the system was doing some
relatively heavy network I/O (the Perl script was responsible for that too).
At the time, I was logged in on the console, doing some work in a terminal window
(so the system also had to cope with some light desktop load).
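One way to look into the fragmentation theory on a live system is /proc/buddyinfo, which lists, per zone, how many free blocks exist at each allocation order. The sketch below parses that format and totals the free memory available in blocks of a given order or larger. The sample text is invented for illustration and is not from the affected machine, and the 4 kB page size is an assumption.

```python
import os

PAGE_KB = 4  # assumption: 4 kB pages

# Invented sample in /proc/buddyinfo format, for illustration only.
SAMPLE = """\
Node 0, zone      DMA      3      2      1      0      0      0      0      0      0      0      0
Node 0, zone   Normal    812     40      5      1      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Map (node, zone) to a list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def free_kb_at_or_above(counts, min_order):
    """Free kB held in blocks of at least 2**min_order contiguous pages."""
    return sum(n * (2 ** order) * PAGE_KB
               for order, n in enumerate(counts) if order >= min_order)


if __name__ == "__main__":
    text = SAMPLE
    if os.path.exists("/proc/buddyinfo"):
        try:
            with open("/proc/buddyinfo") as fh:
                text = fh.read()
        except OSError:
            pass  # fall back to the invented sample
    for (node, zone), counts in parse_buddyinfo(text).items():
        print(f"node {node} zone {zone}: "
              f"{free_kb_at_or_above(counts, 0)} kB free in total, "
              f"{free_kb_at_or_above(counts, 2)} kB in blocks of order 2 or higher")
```

In the invented sample, the Normal zone has 3680 kB free in total but only 112 kB in blocks of order 2 or higher, which is the kind of pattern a fragmented zone would show.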
Setting min_free_kbytes to 8192 is the correct way to resolve this issue. We
are considering increasing that default in RHEL4-U5.
This change was made to RHEL4-U6.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.