Bug 663188

Summary:

OOM killer invoked when copying files between ramdisks

Product:

Red Hat Enterprise Linux 6

Reporter:

Jeff Moyer <jmoyer>

Component:

kernel

Assignee:

Red Hat Kernel Manager <kernel-mgr>

Status:

CLOSED NOTABUG

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

medium

Priority:

low

Version:

6.0

CC:

esandeen, lwoodman, rwheeler

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2011-02-03 16:51:17 UTC

Attachments:

Description	Flags
dmesg output	none
/proc/slabinfo	none
/proc/meminfo	none
copyten.csh	none
copyten.main	none

Description Jeff Moyer 2010-12-14 22:14:19 UTC

Created attachment 468712
dmesg output

Description of problem:

On a system with 4GB of memory, we end up with OOM kills, and 443178 active objects in the avtab_node slab cache.


Version-Release number of selected component (if applicable):
kernel-2.6.32-89.el6.x86_64

How reproducible:
Not sure

Steps to Reproduce:

ramdisk settings:

ramdisk_size=11240000 ramdisk_blocksize=4096

(on this 4g box, oops)

(need copyten.csh & copyten.main in /root/copyten)

# Prepare files for copying
mkdir -p /files/tenfiles
for I in 205265992 43076496 33975530 45280566 12828649 8482669 46831855 14404182 22119710 103650696;
  do fallocate -l $I /files/tenfiles/file-$I;
done

mkdir /lvdsk
mkdir /mrdsk

mkfs.ext2 /dev/ram1
mount /dev/ram1 /mnt/test

cd /root/copyten
./copyten.csh -vv /mnt/test

  
Actual results:
OOM killer kills processes, and the system never recovers

Expected results:
No leakage.

Additional info:
I'll attach the shell scripts, /proc/meminfo, /proc/slabinfo and dmesg output.

Comment 1 Jeff Moyer 2010-12-14 22:15:00 UTC

Created attachment 468713
/proc/slabinfo

Comment 2 Jeff Moyer 2010-12-14 22:15:21 UTC

Created attachment 468714
/proc/meminfo

Comment 3 Jeff Moyer 2010-12-14 22:17:04 UTC

Created attachment 468715
copyten.csh

Comment 4 Jeff Moyer 2010-12-14 22:17:20 UTC

Created attachment 468716
copyten.main

Comment 6 Jeff Moyer 2010-12-14 22:30:16 UTC

Larry, would you mind taking a look at this to see if you can figure out where the memory has gone?

Comment 7 Eric Sandeen 2010-12-14 22:55:46 UTC

The test actually copied from an ext2 fs on /dev/ram0 (created by the script) to another one on /dev/ram1.

Comment 8 Larry Woodman 2010-12-15 03:45:48 UTC

I think you need swap space to back up a ramdisk...

Larry

Comment 9 Jeff Moyer 2010-12-15 14:21:37 UTC

OK, so it's a misconfiguration?  I'm just surprised that after the ramdisks were unmounted, the memory was still not reclaimable.  Does that sound right?

Comment 10 RHEL Program Management 2011-01-07 04:20:27 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 11 Suzanne Logcher 2011-01-07 16:05:43 UTC

This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 12 RHEL Program Management 2011-02-01 05:52:16 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 13 Ric Wheeler 2011-02-01 14:53:12 UTC

Should we close this as notabug?

Comment 14 Jeff Moyer 2011-02-01 14:58:33 UTC

(In reply to comment #13)
> Should we close this as notabug?

No, I'd like Larry's opinion on why the memory remained unreclaimable.  See comment number 9.

Comment 15 RHEL Program Management 2011-02-01 15:08:31 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 16 Larry Woodman 2011-02-01 15:13:17 UTC

I couldn't reproduce this late last year, when I had swap space configured.  I'll try again.

Larry

Comment 17 RHEL Program Management 2011-02-01 18:59:58 UTC

This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 18 Larry Woodman 2011-02-01 21:16:14 UTC

OK, I finally did reproduce this using both ramfs and tmpfs with no swap space.  In both cases the pagecache memory for these filesystems is not reclaimable, so if you overcommit memory with them, the reclaim code moves all the pagecache pages from the active file LRU list to the inactive file LRU list and then to the unevictable list, because the memory is essentially wired.  Also, these pages are not visible via /proc/meminfo.  Since there is no memory left, the system OOMkills processes and still makes no progress freeing memory, because it's all unevictable.

<about 3GB of ramfs pages are in pagecache>
active_anon:30852 inactive_anon:532 isolated_anon:0
active_file:23171 inactive_file:626164 isolated_file:0
unevictable:0 dirty:25 writeback:0 unstable:0
free:227598 slab_reclaimable:6504 slab_unreclaimable:22340

<after overcommitting memory, all pages are moved to the unevictable list>
active_anon:3499 inactive_anon:8877 isolated_anon:0
active_file:1216 inactive_file:3829 isolated_file:0
unevictable:524288 dirty:4 writeback:0 unstable:0
free:371127 slab_reclaimable:2859 slab_unreclaimable:22517


When I unmount the filesystem, all the pages are moved from the unevictable list back to the free list, but before that happens the system can kill just about every process in an attempt to reclaim memory.

<after unmounting the ramfs filesystem, pages are removed from the unevictable list & freed>
active_anon:3549 inactive_anon:10305 isolated_anon:0
active_file:1279 inactive_file:4873 isolated_file:0
unevictable:0 dirty:10 writeback:0 unstable:0
free:893840 slab_reclaimable:1648 slab_unreclaimable:22334
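
The counters in these snapshots are page counts. As a rough sketch, assuming 4 KiB pages (the usual x86_64 base page size, which is not stated in this bug), the figures from the overcommitted snapshot can be converted to MiB as follows. The helper names parse_counters and pages_to_mib are made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, not confirmed in the bug report


def parse_counters(text):
    """Parse 'name:value' pairs from an OOM-report style memory dump."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# Snapshot copied from the "after overcommitting" block above.
after_overcommit = """
active_anon:3499 inactive_anon:8877 isolated_anon:0
active_file:1216 inactive_file:3829 isolated_file:0
unevictable:524288 dirty:4 writeback:0 unstable:0
free:371127 slab_reclaimable:2859 slab_unreclaimable:22517
"""

counters = parse_counters(after_overcommit)
for name in ("unevictable", "inactive_file", "free"):
    print(f"{name}: {pages_to_mib(counters[name]):.0f} MiB")
```

Under that page-size assumption, the unevictable count of 524288 pages is 2048 MiB, that is, 2 GiB that stays pinned until the filesystem is unmounted.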

The bottom line is I don't think we can support using either ramfs or tmpfs with no swap space while over-committing memory with those file systems.  In other words, don't use the size=4G mount option on a 4GB system and expect it to work correctly if you use everything.
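
A crude pre-flight check for this can be sketched in Python, assuming the usual "Name: value kB" layout of /proc/meminfo. The helper names and the sample meminfo values are invented for illustration and are not taken from the attachments. The test is only an upper bound: a size below RAM plus swap can still cause trouble, because the kernel and running processes need memory too.

```python
def meminfo_kib(text):
    """Return {field: KiB} for the 'Name:   12345 kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines without a kB unit, e.g. the HugePages_* counters.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def overcommits(requested_kib, meminfo_text):
    """True if RAM-backed storage of requested_kib cannot fit in RAM plus swap."""
    info = meminfo_kib(meminfo_text)
    return requested_kib >= info["MemTotal"] + info.get("SwapTotal", 0)


# Hypothetical values for a 4GB machine without swap.
sample_no_swap = """MemTotal:        3924840 kB
MemFree:         3500000 kB
SwapTotal:             0 kB
HugePages_Total:       0
"""

size_4g_kib = 4 * 1024 * 1024  # what a size=4G mount option asks for, in KiB
print(overcommits(size_4g_kib, sample_no_swap))
```

On a live system the text would come from reading /proc/meminfo instead of the sample string.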


What do you think, Jeff?

Larry

Comment 19 Larry Woodman 2011-02-01 22:01:04 UTC

Actually, after looking at your dmesg output, there is nothing on the unevictable list.  I can't reproduce that behavior; can you?

Larry

Comment 20 Jeff Moyer 2011-02-01 22:25:05 UTC

(In reply to comment #19)
> Actually after looking at your dmesg output there is nothing on the unevictable
> list.  I cant reproduce that behavior, can you?

I'll give it another try tomorrow and update the bug.  Thanks for looking into this, Larry!

Comment 21 Jeff Moyer 2011-02-03 16:51:17 UTC

OK, after talking with Larry, we agree that this is just the result of a misconfiguration.  Don't do that.

Comment 22 Larry Woodman 2011-02-03 16:59:11 UTC

Specifically, the ramdisk driver allocates the pages for the ramdisk using alloc_pages and holds them in a private cache forever (until the system reboots).  Since this example overcommits the RAM in 2 ramdisks, memory is exhausted and the system OOMkills everything until it finally panics because there is nothing else to OOMkill.

Bottom line is you can't overcommit RAM using ramdisk, ramfs or tmpfs with no swap space, or the system will OOMkill everything until it panics or hangs.

Larry
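
For scale, the ramdisk settings in the description can be checked with a little arithmetic. This is a sketch that assumes ramdisk_size is given in KiB, as the kernel's ramdisk documentation describes, and that the machine has exactly 4 GiB of RAM and no swap. The function name is made up for illustration, and the result is a ceiling: how much the two ramdisks could pin if filled completely, not how much the test actually wrote.

```python
KIB = 1024
GIB = 1024 ** 3


def ramdisk_ceiling_bytes(ramdisk_size_kib, disks):
    """Most memory the given number of ramdisks could pin if filled completely."""
    return ramdisk_size_kib * KIB * disks


ramdisk_size_kib = 11240000  # ramdisk_size= from the description (assumed KiB)
disks = 2                    # /dev/ram0 and /dev/ram1, per comment 7
ram_bytes = 4 * GIB          # "4GB of memory"; no swap assumed

ceiling = ramdisk_ceiling_bytes(ramdisk_size_kib, disks)
print(f"per ramdisk: {ramdisk_size_kib * KIB / GIB:.2f} GiB")
print(f"both ramdisks: {ceiling / GIB:.2f} GiB, RAM: {ram_bytes / GIB:.0f} GiB")
print("overcommitted" if ceiling > ram_bytes else "fits")
```

Under these assumptions each ramdisk is allowed to grow well past the installed RAM, which matches the "(on this 4g box, oops)" remark in the description.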