202955 – bogus out-of-memory resulting in ext3 file system corruption

Summary: bogus out-of-memory resulting in ext3 file system corruption

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-08-17 14:18 UTC by Aleksandar Milivojevic
Modified:	2007-11-30 22:07 UTC
CC List:	3 users
Fixed In Version:	RHBA-2007-0791
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-15 16:14:54 UTC
Target Upstream Version:
Embargoed:

Attachments
log file (90.85 KB, text/plain) 2006-08-17 14:18 UTC, Aleksandar Milivojevic

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0791	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6	2007-11-14 18:25:55 UTC

Description Aleksandar Milivojevic 2006-08-17 14:18:48 UTC

Description of problem:
I have a desktop system with 1GB of RAM and 2GB of swap space.  Normal average-day
utilization as displayed by top looks something like:

Mem:   1025040k total,   980140k used,    44900k free,    77224k buffers
Swap:  2097144k total,   473180k used,  1623964k free,   278928k cached

I started a Perl script (imapsync, for syncing two IMAP accounts) that
allocated about 600MB of memory for a hash (in small chunks).  The
application was mostly network bound, working relatively slowly through that
600MB hash.

At one point, I got an OOM, the ext3 module was denied memory, and one of my ext3
file systems ended up corrupted as a result.  I was able to unmount it and run fsck on
it, which fixed some things.  However, now every time I try to mount it, I get
a warning that I should fsck it first (well, I did).

At the time I got the OOM, there was more than enough free swap space to accommodate
all the applications on the system, even if all of them had to be swapped out.
It looked like a clear failure of the VM to utilize the resources (physical memory and
swap space) it had.
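
For illustration only, the swap headroom claim can be checked with a little arithmetic on the top output quoted above. This is a rough sketch under two assumptions: the snapshot is the normal-day one from this report (not a reading taken at the moment of the OOM), and "about 600MB" is treated as 600 MiB. The helper name parse_top is made up for this example.

```python
import re

# Normal-day snapshot quoted in the description (values in KiB).
TOP_SNAPSHOT = """\
Mem:   1025040k total,   980140k used,    44900k free,    77224k buffers
Swap:  2097144k total,   473180k used,  1623964k free,   278928k cached
"""


def parse_top(text):
    """Parse top's Mem:/Swap: summary lines into {label: {field: KiB}}."""
    result = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        fields = re.findall(r"(\d+)k\s+(\w+)", rest)
        if fields:
            result[label.strip()] = {name: int(value) for value, name in fields}
    return result


stats = parse_top(TOP_SNAPSHOT)

# Assumption: the script's "about 600MB" hash is taken as 600 MiB.
script_kib = 600 * 1024

# Free swap left over if the whole hash had to be swapped out.
headroom_kib = stats["Swap"]["free"] - script_kib

print("free swap (KiB):", stats["Swap"]["free"])
print("script size (KiB):", script_kib)
print("headroom (KiB):", headroom_kib)
```

Even on these rough numbers, free swap alone is well above the size of the script's hash, which is why the OOM looked bogus.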

There was a bunch of messages logged by the kernel.  I'll put them in an attachment.

All the VM kernel parameters were at default values.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-34.0.1.EL

How reproducible:
Not sure if I want to attempt reproducing it, I love my data.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Aleksandar Milivojevic 2006-08-17 14:18:48 UTC

Created attachment 134383 [details]
log file

Comment 2 Larry Woodman 2006-11-16 18:49:03 UTC

Does anyone know if this is reproducible?  The system appears to be in a very
weird state and I would really like to figure out exactly how it got there!

Any help reproducing this problem would be appreciated.

Thanks, Larry Woodman

Comment 3 Eric Sandeen 2006-11-16 20:09:21 UTC

Aleksandar, is the filesystem still telling you that you should fsck it when you
mount?  What are the exact messages?  Perhaps an e2image of the filesystem would
help me find out why e2fsck doesn't seem to be able to clear this state.

Comment 4 Aleksandar Milivojevic 2006-11-16 21:11:24 UTC

Well, I filed the bug report a long time ago.  In the meantime, I simply
tarred everything from that file system to tape, re-created it with mkfs.ext3, and
restored the data.  Luckily it was just a few GB of data.  That solved the "need to fsck"
warning message problem (but killed all debugging info too, sorry).

I also added 1GB of memory to the system (it mostly sits there unused) and set
vm.min_free_kbytes to 8192.  It seems that the latter is doing a good job of
preventing this kind of thing from repeating itself.
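
For anyone wanting to check their own system, here is a minimal sketch that reads the current value and compares it with the 8192 used above. It assumes a Linux system exposing the setting at /proc/sys/vm/min_free_kbytes; the function names are made up for this example, and the script only reads the value, it does not change it.

```python
from pathlib import Path

# Standard procfs location of the vm.min_free_kbytes sysctl on Linux.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")

# Value used in this report; not a universal recommendation.
TARGET_KIB = 8192


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the value stored at the given path as an integer (KiB)."""
    return int(Path(path).read_text().strip())


def needs_raise(current_kib, target_kib=TARGET_KIB):
    """True if the current setting is below the target."""
    return current_kib < target_kib


if __name__ == "__main__":
    if MIN_FREE_PATH.exists():
        current = read_min_free_kbytes()
        print("vm.min_free_kbytes =", current)
        if needs_raise(current):
            print("below", TARGET_KIB, "- consider raising it")
    else:
        print(MIN_FREE_PATH, "not found; is this a Linux system?")
```

Changing the value needs root, for example with "sysctl -w vm.min_free_kbytes=8192", plus a matching line in /etc/sysctl.conf if it should survive a reboot.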

Comment 5 Aleksandar Milivojevic 2006-11-16 21:26:31 UTC

Eric, as I wrote earlier, the problem occurred while running a Perl script that
was allocating memory in small chunks (around 600MB in total), and then working
on that data.  So it could also have been a very bad case of memory
fragmentation.  Other than allocating 600MB of memory, the system was doing some
relatively heavy network I/O (that Perl script was responsible for that too).
At the time, I was logged in on the console, doing some work in a terminal window
(so the system also had to cope with some light desktop load).

Comment 6 Larry Woodman 2006-12-08 18:30:15 UTC

Setting min_free_kbytes to 8192 is the correct way to resolve this issue.  We
are considering increasing that default in RHEL4-U5.

Larry Woodman

Comment 7 Larry Woodman 2007-07-10 15:45:29 UTC

This change was made to RHEL4-U6.

Larry Woodman

Comment 12 errata-xmlrpc 2007-11-15 16:14:54 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
