Bug 402861 - OOM-killer on hp-dl380g5-01 when doing disk IO
Summary: OOM-killer on hp-dl380g5-01 when doing disk IO
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Red Hat Kernel Manager
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-11-28 14:50 UTC by Prarit Bhargava
Modified: 2007-12-18 20:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-18 19:59:56 UTC
Target Upstream Version:
Embargoed:


Description Prarit Bhargava 2007-11-28 14:50:21 UTC
Description of problem:

When compiling the kernel-rt package the system runs out of memory and the
OOM-killer wakes up.

Version-Release number of selected component (if applicable):  2.6.18-53.el5


How reproducible: 100%


Steps to Reproduce:
1.  Compile the kernel-rt package with (make -j oldconfig; make -j; make -j modules)
  
Actual results:

 sh invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800bed05>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f071>] __alloc_pages+0x22b/0x2b4
 [<ffffffff80012720>] __do_page_cache_readahead+0x95/0x1d9
 [<ffffffff800618e1>] __wait_on_bit_lock+0x5b/0x66
 [<ffffffff88101c61>] :dm_mod:dm_any_congested+0x38/0x3f
 [<ffffffff800130ab>] filemap_nopage+0x148/0x322
 [<ffffffff800087ed>] __handle_mm_fault+0x1f8/0xdf4
 [<ffffffff80064a6a>] do_page_fault+0x4b8/0x81d
 [<ffffffff8000df48>] free_pages_and_swap_cache+0x73/0x8f
 [<ffffffff8005bde9>] error_exit+0x0/0x84

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
cpu 6 hot: high 0, batch 1 used:0
cpu 6 cold: high 0, batch 1 used:0
cpu 7 hot: high 0, batch 1 used:0
cpu 7 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:8
cpu 0 cold: high 62, batch 15 used:57
cpu 1 hot: high 186, batch 31 used:18
cpu 1 cold: high 62, batch 15 used:49
cpu 2 hot: high 186, batch 31 used:166
cpu 2 cold: high 62, batch 15 used:60
cpu 3 hot: high 186, batch 31 used:22
cpu 3 cold: high 62, batch 15 used:46
cpu 4 hot: high 186, batch 31 used:35
cpu 4 cold: high 62, batch 15 used:49
cpu 5 hot: high 186, batch 31 used:30
cpu 5 cold: high 62, batch 15 used:41
cpu 6 hot: high 186, batch 31 used:43
cpu 6 cold: high 62, batch 15 used:17
cpu 7 hot: high 186, batch 31 used:35
cpu 7 cold: high 62, batch 15 used:27
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages:        7924kB (0kB HighMem)
Active:227438 inactive:194210 dirty:0 writeback:0 unstable:0 free:1981
slab:15376 mapped-file:981 mapped-anon:434527 pagetables:31838
Node 0 DMA free:2268kB min:28kB low:32kB high:40kB active:0kB inactive:0kB
present:10840kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:5656kB min:5712kB low:7140kB high:8568kB active:907872kB
inactive:779316kB present:2052100kB pages_scanned:3802449 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 4*8kB 3*16kB 4*32kB 2*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
0*2048kB 0*4096kB = 2268kB
Node 0 DMA32: 12*4kB 1*8kB 2*16kB 0*32kB 3*64kB 0*128kB 1*256kB 0*512kB 1*1024kB
0*2048kB 1*4096kB = 5656kB
Node 0 Normal: empty
Node 0 HighMem: empty
Swap cache: add 553474, delete 553220, find 4842/9355, race 4+8
Free swap  = 0kB
Total swap = 2031608kB
sh invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800bed05>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f071>] __alloc_pages+0x22b/0x2b4
 [<ffffffff800170b7>] cache_grow+0x137/0x395
 [<ffffffff8005a53f>] cache_alloc_refill+0x136/0x186
 [<ffffffff8000a865>] kmem_cache_alloc+0x6c/0x76
 [<ffffffff8001e772>] copy_process+0xa6/0x155d
 [<ffffffff800991a5>] alloc_pid+0x1ee/0x28a
 [<ffffffff80030c1b>] do_fork+0x68/0x187
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0
 [<ffffffff8005b427>] ptregscall_common+0x67/0xac

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
cpu 6 hot: high 0, batch 1 used:0
cpu 6 cold: high 0, batch 1 used:0
cpu 7 hot: high 0, batch 1 used:0
cpu 7 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:0
cpu 0 cold: high 62, batch 15 used:8
cpu 1 hot: high 186, batch 31 used:27
cpu 1 cold: high 62, batch 15 used:13
cpu 2 hot: high 186, batch 31 used:158
cpu 2 cold: high 62, batch 15 used:48
cpu 3 hot: high 186, batch 31 used:11
cpu 3 cold: high 62, batch 15 used:57
cpu 4 hot: high 186, batch 31 used:43
cpu 4 cold: high 62, batch 15 used:61
cpu 5 hot: high 186, batch 31 used:24
cpu 5 cold: high 62, batch 15 used:53
cpu 6 hot: high 186, batch 31 used:48
cpu 6 cold: high 62, batch 15 used:31
cpu 7 hot: high 186, batch 31 used:24
cpu 7 cold: high 62, batch 15 used:33
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages:        8108kB (0kB HighMem)
Active:238898 inactive:183426 dirty:0 writeback:0 unstable:0 free:2027
slab:15305 mapped-file:979 mapped-anon:434547 pagetables:31849
Node 0 DMA free:2268kB min:28kB low:32kB high:40kB active:0kB inactive:0kB
present:10840kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:5840kB min:5712kB low:7140kB high:8568kB active:956620kB
inactive:732652kB present:2052100kB pages_scanned:80192 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 4*8kB 3*16kB 4*32kB 2*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
0*2048kB 0*4096kB = 2268kB
Node 0 DMA32: 40*4kB 2*8kB 2*16kB 0*32kB 3*64kB 0*128kB 1*256kB 0*512kB 1*1024kB
0*2048kB 1*4096kB = 5776kB
Node 0 Normal: empty
Node 0 HighMem: empty
Swap cache: add 553482, delete 553227, find 4842/9357, race 4+8
Free swap  = 0kB
Total swap = 2031608kB
Out of memory: Killed process 3600 (sh).
irqbalance invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800bed05>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f071>] __alloc_pages+0x22b/0x2b4
 [<ffffffff8003bba6>] __get_free_pages+0xe/0x71
 [<ffffffff8011de2b>] selinux_proc_get_sid+0x27/0xbc
 [<ffffffff8011e29b>] inode_doinit_with_dentry+0x3db/0x47c
 [<ffffffff8002ce2a>] wake_up_bit+0x11/0x22
 [<ffffffff800273f0>] proc_lookup+0x85/0xcc
 [<ffffffff80053f64>] proc_root_lookup+0x12/0x30
 [<ffffffff8000ca32>] do_lookup+0xce/0x1c3
 [<ffffffff80009ee9>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000e5b9>] link_path_walk+0x5c/0xe5
 [<ffffffff8000c7e8>] do_path_lookup+0x270/0x2e8
 [<ffffffff800232cf>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a778>] open_namei+0x83/0x6fd
 [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
 [<ffffffff800270ce>] do_filp_open+0x1c/0x38
 [<ffffffff80019523>] do_sys_open+0x44/0xbe
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
cpu 6 hot: high 0, batch 1 used:0
cpu 6 cold: high 0, batch 1 used:0
cpu 7 hot: high 0, batch 1 used:0
cpu 7 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:25
cpu 0 cold: high 62, batch 15 used:40
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:30
cpu 2 hot: high 186, batch 31 used:175
cpu 2 cold: high 62, batch 15 used:49
cpu 3 hot: high 186, batch 31 used:29
cpu 3 cold: high 62, batch 15 used:46
cpu 4 hot: high 186, batch 31 used:152
cpu 4 cold: high 62, batch 15 used:60
cpu 5 hot: high 186, batch 31 used:24
cpu 5 cold: high 62, batch 15 used:61
cpu 6 hot: high 186, batch 31 used:50
cpu 6 cold: high 62, batch 15 used:49
cpu 7 hot: high 186, batch 31 used:12
cpu 7 cold: high 62, batch 15 used:20
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages:        7820kB (0kB HighMem)
Active:143024 inactive:268831 dirty:0 writeback:0 unstable:0 free:1955
slab:15155 mapped-file:972 mapped-anon:434564 pagetables:31865
Node 0 DMA free:2268kB min:28kB low:32kB high:40kB active:0kB inactive:0kB
present:10840kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:5552kB min:5712kB low:7140kB high:8568kB active:553520kB
inactive:1094012kB present:2052100kB pages_scanned:103025176 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 4*8kB 3*16kB 4*32kB 2*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
0*2048kB 0*4096kB = 2268kB
Node 0 DMA32: 16*4kB 4*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB
0*2048kB 1*4096kB = 5552kB
Node 0 Normal: empty
Node 0 HighMem: empty
Swap cache: add 553538, delete 553284, find 4842/9360, race 4+8
Free swap  = 0kB
Total swap = 2031608kB
Out of memory: Killed process 3603 (sh).
irqbalance invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800bed05>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f071>] __alloc_pages+0x22b/0x2b4
 [<ffffffff8003bba6>] __get_free_pages+0xe/0x71
 [<ffffffff8011de2b>] selinux_proc_get_sid+0x27/0xbc
 [<ffffffff8011e29b>] inode_doinit_with_dentry+0x3db/0x47c
 [<ffffffff8002ce2a>] wake_up_bit+0x11/0x22
 [<ffffffff800273f0>] proc_lookup+0x85/0xcc
 [<ffffffff80053f64>] proc_root_lookup+0x12/0x30
 [<ffffffff8000ca32>] do_lookup+0xce/0x1c3
 [<ffffffff80009ee9>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000e5b9>] link_path_walk+0x5c/0xe5
 [<ffffffff8000c7e8>] do_path_lookup+0x270/0x2e8
 [<ffffffff800232cf>] __path_lookup_intent_open+0x56/0x97
 [<ffffffff8001a778>] open_namei+0x83/0x6fd
 [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
 [<ffffffff800270ce>] do_filp_open+0x1c/0x38
 [<ffffffff80019523>] do_sys_open+0x44/0xbe
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
cpu 5 hot: high 0, batch 1 used:0
cpu 5 cold: high 0, batch 1 used:0
cpu 6 hot: high 0, batch 1 used:0
cpu 6 cold: high 0, batch 1 used:0
cpu 7 hot: high 0, batch 1 used:0
cpu 7 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:24
cpu 0 cold: high 62, batch 15 used:57
cpu 1 hot: high 186, batch 31 used:30
cpu 1 cold: high 62, batch 15 used:30
cpu 2 hot: high 186, batch 31 used:179
cpu 2 cold: high 62, batch 15 used:49
cpu 3 hot: high 186, batch 31 used:27
cpu 3 cold: high 62, batch 15 used:51
cpu 4 hot: high 186, batch 31 used:152
cpu 4 cold: high 62, batch 15 used:60
cpu 5 hot: high 186, batch 31 used:22
cpu 5 cold: high 62, batch 15 used:54
cpu 6 hot: high 186, batch 31 used:68
cpu 6 cold: high 62, batch 15 used:59
cpu 7 hot: high 186, batch 31 used:8
cpu 7 cold: high 62, batch 15 used:31
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Free pages:        7880kB (0kB HighMem)
Active:235506 inactive:175941 dirty:0 writeback:0 unstable:0 free:1970
slab:15165 mapped-file:972 mapped-anon:434509 pagetables:31850
Node 0 DMA free:2268kB min:28kB low:32kB high:40kB active:0kB inactive:0kB
present:10840kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2004 2004 2004
Node 0 DMA32 free:5612kB min:5712kB low:7140kB high:8568kB active:958876kB
inactive:687004kB present:2052100kB pages_scanned:6581160 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 4*8kB 3*16kB 4*32kB 2*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
0*2048kB 0*4096kB = 2268kB
Node 0 DMA32: 31*4kB 4*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB
0*2048kB 1*4096kB = 5612kB
Node 0 Normal: empty
Node 0 HighMem: empty
Swap cache: add 553593, delete 553335, find 4842/9360, race 4+8
Free swap  = 0kB
Total swap = 2031608kB
Out of memory: Killed process 3609 (sh).


Expected results: Compile should finish.


Additional info:  hp-dl380g5-01.rhts is available in RHTS.  I have the system
booked ATM and will attempt to do some additional debug.
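
The per-zone free-list lines in the report above ("Node 0 DMA32: 12*4kB 1*8kB
...") list how many free blocks of each size the allocator holds, followed by
the total. As a rough sketch (the function name is made up for illustration),
the total can be re-derived from the terms like this:

```python
import re

def buddy_free_kb(line):
    """Sum the 'count*sizekB' terms of a zone free-list line from an OOM report."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)

# First DMA32 free-list line from the report, joined onto one line.
dma32 = ("Node 0 DMA32: 12*4kB 1*8kB 2*16kB 0*32kB 3*64kB 0*128kB 1*256kB "
         "0*512kB 1*1024kB 0*2048kB 1*4096kB = 5656kB")
print(buddy_free_kb(dma32))  # 5656
```

That matches the "free:5656kB" shown for DMA32 in the first report, which is
just under the zone's min watermark of 5712kB.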

Comment 1 Prarit Bhargava 2007-11-28 14:54:31 UTC
Please note that I can reproduce this 100% of the time.

Comment 3 Larry Woodman 2007-11-28 15:30:08 UTC
Prarit, please get a /proc/meminfo output when the OOM kill happens.  Evidently
the cciss driver is consuming all of the RAM in buffermem ???

Larry


Comment 4 Prarit Bhargava 2007-11-28 15:53:07 UTC
Before test starts:

[root@hp-dl380g5-01 ~]# cat /proc/meminfo 
MemTotal:      2059464 kB
MemFree:       1805332 kB
Buffers:         19328 kB
Cached:         137344 kB
SwapCached:          0 kB
Active:          73616 kB
Inactive:       129708 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      2059464 kB
LowFree:       1805332 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              88 kB
Writeback:           0 kB
AnonPages:       46668 kB
Mapped:          12136 kB
Slab:            25472 kB
PageTables:       4084 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   3061340 kB
Committed_AS:   106400 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      1812 kB
VmallocChunk: 34359735471 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB


Last output before OOM-kill:

MemTotal:      2059464 kB
MemFree:          9712 kB
Buffers:          7948 kB
Cached:        1803928 kB
SwapCached:         28 kB
Active:         139684 kB
Inactive:      1792148 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      2059464 kB
LowFree:          9712 kB
SwapTotal:     2031608 kB
SwapFree:      2031500 kB
Dirty:          820504 kB
Writeback:           0 kB
AnonPages:      120052 kB
Mapped:          14172 kB
Slab:            79672 kB
PageTables:      12120 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   3061340 kB
Committed_AS:   224708 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      1812 kB
VmallocChunk: 34359735471 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
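
To see which fields moved the most between the two snapshots above, something
like the following could be used. It is only a sketch: the helper names are
invented here, and only a handful of the fields are pasted in to keep it short.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields

def biggest_changes(before, after, top=5):
    """Return the fields with the largest absolute change, largest first."""
    deltas = {k: after[k] - before[k] for k in before if k in after}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]

before = parse_meminfo("""MemFree:       1805332 kB
Cached:         137344 kB
Dirty:              88 kB
SwapFree:      2031608 kB
AnonPages:       46668 kB""")
after = parse_meminfo("""MemFree:          9712 kB
Cached:        1803928 kB
Dirty:          820504 kB
SwapFree:      2031500 kB
AnonPages:      120052 kB""")

for name, delta in biggest_changes(before, after):
    print(f"{name:10s} {delta:+d} kB")
```

With these values the largest movers are MemFree (down about 1.8 GB) and
Cached (up about 1.67 GB), with Dirty up about 820 MB.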


Comment 7 Prarit Bhargava 2007-11-28 16:32:52 UTC
I can reproduce this on nec-em8.rhts.boston.redhat.com .

P.

Comment 9 Prarit Bhargava 2007-12-18 19:59:56 UTC
Oh Geez.  I completely forgot that I left this open.

Sorry everyone.

Here's the issue in a nutshell.  I was doing a 'make -j' on a kernel, which
launches "lots" of parallel processes, because -j with no number puts no limit
on the job count.  This system has (IIRC) 8 (or 16?) CPUs with 2G of memory,
which quickly leads to memory exhaustion.

Bottom line, as lwoodman pointed out, is that I should end up OOM-killing in
this case as I've purposely exhausted memory and eventually swap-space.

This is NOTABUG.

Sorry lwang -- I'll try to be more careful about closing bugs out in the future.
 I completely forgot about it...

P.
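
For reference, GNU make run with a bare -j does not limit the number of
parallel jobs at all. One way to avoid this is to pass an explicit job count
capped by both CPU count and RAM. The sketch below is only an illustration:
the function name is made up and the 256 MB per-job figure is a guess, not
something measured on this system.

```python
def bounded_jobs(mem_total_kb, cpus, per_job_kb=262144):
    """Pick a -j value limited by both CPU count and total RAM.

    per_job_kb is an assumed per-compiler-process footprint (256 MB here),
    not a measured figure.
    """
    by_memory = mem_total_kb // per_job_kb
    return max(1, min(cpus, by_memory))

# MemTotal from comment 4, and the lower of the two CPU counts mentioned above.
jobs = bounded_jobs(mem_total_kb=2059464, cpus=8)
print(f"make -j{jobs}")  # make -j7
```

Under those assumptions the build would run 7 jobs at a time instead of an
unbounded number.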

Comment 10 Linda Wang 2007-12-18 20:20:16 UTC
:) thanks for letting us know :)

