Bug 164370 - oom killer kills processes although there is enough free swap available
Summary: oom killer kills processes although there is enough free swap available
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Reported: 2005-07-27 12:32 UTC by Hannes Kuehnemund
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHEL4-U2
Doc Type: Bug Fix
Last Closed: 2005-10-21 18:31:16 UTC



Description Hannes Kuehnemund 2005-07-27 12:32:31 UTC
Description of problem:
While creating devspaces for MaxDB 7.5.0, the OOM killer kills MaxDB
processes/threads after 30-50 GB have been written to disk. With USE_OPEN_DIRECT
enabled in MaxDB the creation works, although after about 20 GB the write speed
drops from 70 MB/sec to 2 MB/sec.


Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL


How reproducible:
Every time

Steps to Reproduce:
1. Download MaxDB 7.5 from http://dev.mysql.com/downloads/maxdb/7.5.00.html
2. Install the software with SDBINST
3. Modify the create_demo_db.sh script: replace
"param_addvolume 1 DATA DISKD0001 F 2560" with
"param_addvolume 1 DATA DISKD0001 F 8544921" (which is around 70GB in 8k
blocks).
4. Start the script
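
The volume size in step 3 can be checked with a short calculation. This is only a sketch: it assumes "8k blocks" means 8 KiB (8192-byte) pages, and the function name is made up for illustration.

```python
# Assumption: the "8k blocks" in step 3 are 8 KiB (8192-byte) pages.
BLOCK_SIZE_BYTES = 8 * 1024


def volume_size_bytes(blocks: int, block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Return the size in bytes of a data volume given its size in blocks."""
    return blocks * block_size


# Default value in create_demo_db.sh versus the value used to reproduce the bug.
for blocks in (2560, 8544921):
    size = volume_size_bytes(blocks)
    print(f"{blocks} blocks = {size} bytes "
          f"= {size / 10**9:.2f} GB = {size / 2**30:.2f} GiB")
```

Under that assumption the modified volume is just under 70 GB (decimal), while the script's default of 2560 blocks is only 20 MiB.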

  
Actual results:
All MaxDB processes/threads were killed by the OOM killer


Expected results:
The OOM killer does not kill any processes

Additional info:
Hardware information:

Dell PowerEdge 6650
4 x 3GHz Xeon HT
RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 01)
external SCSI enclosure with 12x36GB disks (RAID 5)

Software information:

# lsmod
Module                  Size  Used by
nfs                   200869  1
lockd                  65257  2 nfs
vfat                   16961  0
fat                    44129  1 vfat
md5                     8001  1
ipv6                  238817  28
autofs4                22085  0
i2c_dev                14273  0
i2c_core               25921  1 i2c_dev
sunrpc                138789  4 nfs,lockd
dm_mod                 58949  0
button                 10449  0
battery                12869  0
ac                      8773  0
tg3                    82373  0
floppy                 58065  0
sg                     38113  0
ext3                  118729  3
jbd                    59481  1 ext3
qla2300               126529  0
qla2xxx               109665  1 qla2300
scsi_transport_fc      11712  1 qla2xxx
megaraid_mbox          38097  4
megaraid_mm            17905  1 megaraid_mbox
aic7xxx               146553  0
sd_mod                 20545  6
scsi_mod              116429  6 sg,qla2xxx,scsi_transport_fc,megaraid_mbox,aic7xxx,sd_mod

#dmesg
oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
cpu 4 hot: low 2, high 6, batch 1
cpu 4 cold: low 0, high 2, batch 1
cpu 5 hot: low 2, high 6, batch 1
cpu 5 cold: low 0, high 2, batch 1
cpu 6 hot: low 2, high 6, batch 1
cpu 6 cold: low 0, high 2, batch 1
cpu 7 hot: low 2, high 6, batch 1
cpu 7 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16

Free pages:       15136kB (1600kB HighMem)
Active:285350 inactive:1668227 dirty:219220 writeback:113476 unstable:0
free:3784 slab:49928 mapped:283285 pagetables:1789
DMA free:12640kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:20983 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:896kB min:928kB low:1856kB high:2784kB active:8252kB inactive:615108kB present:901120kB pages_scanned:817047 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1133148kB inactive:6057800kB present:7471104kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12640kB
Normal: 0*4kB 104*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 896kB
HighMem: 98*4kB 39*8kB 22*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
Swap cache: add 21638, delete 21560, find 413/618, race 0+0
Free swap:       10190220kB
2097152 pages of RAM
1802208 pages of HIGHMEM
83433 reserved pages
1695143 pages shared
78 pages swap cached
Out of Memory: Killed process 3505 (kernel).
Out of Memory: Killed process 3506 (kernel).
Out of Memory: Killed process 3507 (kernel).
Out of Memory: Killed process 3508 (kernel).
Out of Memory: Killed process 3509 (kernel).
Out of Memory: Killed process 3510 (kernel).
Out of Memory: Killed process 3511 (kernel).
Out of Memory: Killed process 3512 (kernel).
Out of Memory: Killed process 3513 (kernel).
Out of Memory: Killed process 3514 (kernel).
Out of Memory: Killed process 3515 (kernel).
Out of Memory: Killed process 3516 (kernel).
Out of Memory: Killed process 3517 (kernel).
Out of Memory: Killed process 3518 (kernel).
Out of Memory: Killed process 3519 (kernel).
Out of Memory: Killed process 3520 (kernel).
Out of Memory: Killed process 3521 (kernel).
Out of Memory: Killed process 3522 (kernel).
Out of Memory: Killed process 3523 (kernel).
Out of Memory: Killed process 3524 (kernel).
Out of Memory: Killed process 3525 (kernel).
Out of Memory: Killed process 3526 (kernel).
Out of Memory: Killed process 3527 (kernel).
Out of Memory: Killed process 3528 (kernel).
Out of Memory: Killed process 3535 (kernel).
Out of Memory: Killed process 3536 (kernel).
Out of Memory: Killed process 3539 (kernel).
Out of Memory: Killed process 3540 (kernel).
Out of Memory: Killed process 3737 (kernel).
Out of Memory: Killed process 3738 (kernel).
Fixed up OOM kill of mm-less task
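
The per-zone figures in the dump above are easier to compare when extracted programmatically. The following is a minimal sketch, not a kernel tool: it parses the three zone summary lines copied from this report (wrapped lines re-joined) and lists the zones whose free memory is below their "min" watermark. The function names and the regular expression are illustrative assumptions that only fit this particular log format.

```python
import re

# Zone summary lines copied from the dmesg output in this report,
# with the wrapped lines re-joined.
MEM_INFO = """\
DMA free:12640kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:20983 all_unreclaimable? yes
Normal free:896kB min:928kB low:1856kB high:2784kB active:8252kB inactive:615108kB present:901120kB pages_scanned:817047 all_unreclaimable? yes
HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1133148kB inactive:6057800kB present:7471104kB pages_scanned:0 all_unreclaimable? no
"""

# Hypothetical pattern matching only the zone line layout shown above.
ZONE_RE = re.compile(
    r"^(\w+) free:(\d+)kB min:(\d+)kB .* all_unreclaimable\? (yes|no)$"
)


def parse_zones(text):
    """Return {zone: {"free_kb", "min_kb", "all_unreclaimable"}} from the dump."""
    zones = {}
    for line in text.splitlines():
        match = ZONE_RE.match(line.strip())
        if match:
            name, free_kb, min_kb, unreclaimable = match.groups()
            zones[name] = {
                "free_kb": int(free_kb),
                "min_kb": int(min_kb),
                "all_unreclaimable": unreclaimable == "yes",
            }
    return zones


def zones_below_min(zones):
    """Return the names of zones whose free memory is below the min watermark."""
    return [name for name, z in zones.items() if z["free_kb"] < z["min_kb"]]


zones = parse_zones(MEM_INFO)
for name, z in zones.items():
    print(f"{name:8s} free={z['free_kb']:6d}kB min={z['min_kb']:4d}kB "
          f"all_unreclaimable={z['all_unreclaimable']}")
print("Below min watermark:", zones_below_min(zones))
```

With the numbers from this report, only the Normal zone is below its min watermark (896kB free against 928kB), while the same dump shows about 10 GB of free swap.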

Comment 1 Larry Woodman 2005-08-16 15:42:31 UTC
Hannes, please try the latest RHEL4-U2 beta kernel and see if this problem is
solved.  We have made several changes to address the OOM kill problems, and we
have added more debug output for when a kill does happen so we can better
determine exactly why it occurs.

Thanks, Larry Woodman


Comment 2 Hannes Kuehnemund 2005-08-24 11:34:23 UTC
Hi Larry,

kernel-smp-2.6.9-15.EL works fine. The OOM killer no longer kills any processes.

Cheers
Hannes

