Description of problem:
I have a desktop system with 1GB of RAM and 2GB of swap space. Normal average-day utilization as displayed by top looks something like:

Mem:  1025040k total,  980140k used,    44900k free,  77224k buffers
Swap: 2097144k total,  473180k used,  1623964k free, 278928k cached

I started a Perl script (imapsync, for syncing two IMAP accounts) that allocated about 600MB of memory for a hash (in small chunks). The application was mostly network bound, working relatively slowly through that 600MB hash. At one point I got an OOM, the ext3 module was denied memory, and one of my ext3 file systems ended up corrupted. I was able to unmount it and run fsck on it, which fixed some things. However, every time I try to mount it now, I get a warning that I should fsck it first (well, I did).

At the time of the OOM, there was more than enough free swap space to accommodate all the applications on the system, even if all of them had to be swapped out. It looked like a clear failure of the VM to utilize the resources (physical memory and swap space) it had.

There were a bunch of messages logged by the kernel. I'll put them in an attachment. All the VM kernel parameters were at their default values.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-34.0.1.EL

How reproducible:
Not sure if I want to attempt reproducing it, I love my data.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
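A rough check of the capacity claim in the description, using only the figures quoted from top. This is a back-of-the-envelope sketch, not a model of the VM: it assumes "about 600MB" means 600 MiB, treats buffers and page cache as reclaimable, and uses the "average-day" snapshot rather than the (unknown) state at the moment of the OOM. Note that the "cached" figure on top's Swap line is the page cache, not cached swap.

```python
# Figures copied from the top output in the report, all in kB.
MEM_TOTAL, MEM_USED, BUFFERS = 1025040, 980140, 77224
SWAP_TOTAL, SWAP_USED, CACHED = 2097144, 473180, 278928

# Assumption: "about 600MB" is taken as 600 MiB.
SCRIPT_KB = 600 * 1024


def headroom_kb():
    """Return (baseline, demand, capacity, headroom) in kB."""
    # Memory held by processes: used RAM minus reclaimable buffers and
    # page cache, plus what has already been pushed out to swap.
    baseline = (MEM_USED - BUFFERS - CACHED) + SWAP_USED
    demand = baseline + SCRIPT_KB
    capacity = MEM_TOTAL + SWAP_TOTAL
    return baseline, demand, capacity, capacity - demand


if __name__ == "__main__":
    baseline, demand, capacity, headroom = headroom_kb()
    print(f"baseline {baseline} kB, with script {demand} kB")
    print(f"RAM + swap {capacity} kB, headroom {headroom} kB")
```

Under these assumptions the total demand is roughly 1.7 GB against about 3.1 GB of RAM plus swap. That supports the statement that capacity was not the limiting factor, but it says nothing about whether a particular kernel allocation could be satisfied at that instant.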
Created attachment 134383 [details] log file
Does anyone know if this is reproducible? The system appears to be in a very weird state and I would really like to figure out exactly how it got there !!! Any help reproducing this problem would be appreciated. Thanks, Larry Woodman
Aleksandar, is the filesystem still telling you that you should fsck it when you mount? What are the exact messages? Perhaps an e2image of the filesystem would help me find out why e2fsck doesn't seem to be able to clear this state.
Well, I filed the bug report a long time ago. In the meantime, I simply tarred everything from that file system to tape, ran mkfs.ext3 on it, and restored the data. Luckily it was just a few GB. That solved the "need to fsck" warning message problem (but killed all debugging info too, sorry). I also added 1GB of memory to the system (it mostly sits there unused) and set vm.min_free_kbytes to 8192. It seems that the latter is doing a good job of preventing this kind of thing from repeating itself.
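For anyone who wants to check where their own system stands, here is a minimal sketch that reads the current vm.min_free_kbytes and compares it with the 8192 value mentioned above. The function names and the threshold constant are illustrative only; the one real interface used is the standard /proc/sys/vm/min_free_kbytes file.

```python
from pathlib import Path

# Standard Linux location of the vm.min_free_kbytes sysctl.
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"

# Value quoted in this bug report, not a general recommendation.
SUGGESTED_KB = 8192


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the value stored in the sysctl file as an integer (kB)."""
    return int(Path(path).read_text().strip())


def below_suggested(current_kb, suggested_kb=SUGGESTED_KB):
    """True if the current reserve is smaller than the suggested one."""
    return current_kb < suggested_kb


if __name__ == "__main__":
    current = read_min_free_kbytes()
    if below_suggested(current):
        print(f"vm.min_free_kbytes is {current}, below {SUGGESTED_KB}")
    else:
        print(f"vm.min_free_kbytes is {current}")
```

Changing the value needs root, for example `sysctl -w vm.min_free_kbytes=8192` for the running system, plus a `vm.min_free_kbytes = 8192` line in /etc/sysctl.conf to keep it across reboots.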
Eric, as I wrote earlier, the problem occurred while running a Perl script that was allocating memory in small chunks (a total of around 600MB) and then working on that data. So it could also be that it was a very bad case of memory fragmentation. Other than allocating 600MB of memory, the system was doing some relatively heavy network I/O (the Perl script was responsible for that too). At the time, I was logged in on the console, doing some work in a terminal window (so the system also had to cope with some light desktop load).
Setting min_free_kbytes to 8192 is the correct way to resolve this issue. We are considering increasing that default in RHEL4-U5. Larry Woodman
This change was made to RHEL4-U6. Larry Woodman
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html