Bug 145795 - Large Shared Memory Allocation + mke2fs == OOM
Summary: Large Shared Memory Allocation + mke2fs == OOM
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-01-21 17:37 UTC by Hrunting Johnson
Modified: 2015-01-04 22:16 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-16 04:29:47 UTC
Type: ---
Embargoed:



Description Hrunting Johnson 2005-01-21 17:37:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
I have an application that allocates about 30 shared memory segments
totaling about 450MB of memory.  This application is running on a box
with 4GB of memory and no swap.  The application has hundreds of
forked children that attach to these shared memory segments and use them.

With this configuration running, if I then run mke2fs on a partition,
mke2fs slows to a stop and the kernel starts killing my application
processes.  While it is killing these processes, top shows anywhere
from 1-2GB of free memory and another 1-2GB of memory listed as
'cached'.  The 'buffers' value sits around 600K.

The only other thing I've changed is vm.vfs_cache_pressure, which is
set to 1000.  Raising or lowering this value doesn't change the
behavior under this scenario.
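
(For reference, this tunable can be set with e.g.
'sysctl -w vm.vfs_cache_pressure=1000' or by writing the value to
/proc/sys/vm/vfs_cache_pressure.)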

The one difference I do note is that under top, each process lists its
'VIRT' around 450M (the size of the shm attached).  Normally, the
'SHR' field will also be this value since all of the memory is shared,
and the 'RES' field will be smaller.  Under the new kernel, the VIRT
field remains 450M, but the SHR field is a few MB at most, and is
close to the RES field.  It almost appears as if either the memory
isn't really being shared anymore, or the meaning of the 'SHR' column
has changed.

mke2fs runs fine without the shared memory allocation.  The
application runs fine without mke2fs running.

This exact same config works fine under 2.6.9-1.6FC2 and all previous
kernels back to RH9.  I'm placing this under 'kernel' since it really
looks like the kernel has some serious memory management issues,
especially when a huge amount of disk writing takes place very quickly.

I'll try to post a dumbed-down test program that can be used to
recreate the behavior.
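
In the meantime, here's a rough sketch of what such a test program
might look like (untested as written; the segment count, segment size
and number of children are placeholders, not the exact values from the
real application):

/* shmtest.c - rough reproducer sketch: create several SysV shared
 * memory segments, fork children that attach them all and touch every
 * page, then just sit there while mke2fs runs elsewhere. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define NSEG    30                      /* ~30 segments             */
#define SEGSIZE (15 * 1024 * 1024)      /* ~15MB each, ~450MB total */
#define NCHILD  200                     /* "hundreds" of children   */

int main(void)
{
    int ids[NSEG];
    int i, c;

    /* Create the segments in the parent. */
    for (i = 0; i < NSEG; i++) {
        ids[i] = shmget(IPC_PRIVATE, SEGSIZE, IPC_CREAT | 0600);
        if (ids[i] < 0) {
            perror("shmget");
            exit(1);
        }
    }

    /* Fork children that attach every segment and fault the pages in. */
    for (c = 0; c < NCHILD; c++) {
        if (fork() == 0) {
            for (i = 0; i < NSEG; i++) {
                char *p = shmat(ids[i], NULL, 0);
                if (p == (char *)-1) {
                    perror("shmat");
                    _exit(1);
                }
                memset(p, c, SEGSIZE);
            }
            pause();        /* stay attached until killed */
            _exit(0);
        }
    }

    /* Parent just waits; kill everything with Ctrl-C when done. */
    for (c = 0; c < NCHILD; c++)
        wait(NULL);
    return 0;
}

The idea is simply to keep a few hundred processes attached to roughly
450MB of shared memory while mke2fs runs; leftover segments can be
cleaned up afterwards with ipcs/ipcrm.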

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.9_FC2

How reproducible:
Always

Steps to Reproduce:
1. create large amount of shared memory
2. attach to it with a large number of processes
3. run mke2fs on a decent-sized partition
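
For example, with a test program along the lines of the sketch in the
description above (file and device names here are placeholders; use a
scratch partition):

    gcc -o shmtest shmtest.c
    ./shmtest &
    mke2fs /dev/hdb1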
    

Actual Results:  Box comes to a halt.  The OOM killer begins killing
processes attached to shared memory, even though free memory exists on
the system.

Expected Results:  System hums along not caring.

Additional info:

Comment 1 Hrunting Johnson 2005-01-21 17:56:23 UTC
I forgot one key part of this.  The box is under constant heavy I/O
load (think news server, but heavier), and that seems to be another
contributing factor.

It's still puzzling why processes are being killed when free memory is
available.

Comment 2 Hrunting Johnson 2005-01-21 20:11:09 UTC
Here's the output from the OOM manager:

oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16

Free pages:     2709004kB (2626176kB HighMem)
Active:139314 inactive:159921 dirty:158039 writeback:105 unstable:0
free:677251 slab:26349 mapped:127907 pagetables:18240
DMA free:1004kB min:68kB low:84kB high:100kB active:0kB
inactive:10272kB present:16384kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Normal free:81648kB min:3756kB low:4692kB high:5632kB active:1908kB
inactive:610696kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2626176kB min:512kB low:640kB high:768kB active:555356kB
inactive:19060kB present:3407872kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 183*4kB 24*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 1004kB
Normal: 14450*4kB 2009*8kB 303*16kB 82*32kB 2*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 81472kB
HighMem: 7676*4kB 4952*8kB 3683*16kB 4271*32kB 3724*64kB 2393*128kB
1091*256kB 344*512kB 86*1024kB 41*2048kB 290*4096kB = 2625856kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap:            0kB
1081344 pages of RAM
819184 pages of HIGHMEM
42660 reserved pages
2375875 pages shared
0 pages swap cached
Out of Memory: Killed process 3130 (xxx).

Comment 3 Dave Jones 2005-04-16 04:29:47 UTC
Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora Legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.


