Description of problem:
We are running RHEL4 U4 on four of our servers. Two of them were rebooted by the cluster software because the OOM killer killed one of the cluster processes. We run MySQL Cluster and a few other applications on these servers, and the systems are sometimes under heavy load; however, when the OOM killer started killing processes the servers were not under heavy load.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux ES release 4 (Nahant Update 4)

How reproducible:
Random; we cannot reproduce it on demand.

Actual results:
Random OOM kills.

Expected results:
No OOM kills.

Additional info:
xxxxxxxx:user:~> uname -a
Linux xxxxxxxx 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686 i686 i386 GNU/Linux

Output from logs:
Mar 4 11:26:51 xxxxxxxx kernel: oom-killer: gfp_mask=0xd0
Mar 4 11:26:51 xxxxxxxx kernel: Mem-info:
Mar 4 11:26:51 xxxxxxxx kernel: DMA per-cpu:
Mar 4 11:26:51 xxxxxxxx kernel: cpu 0 hot: low 2, high 6, batch 1
Mar 4 11:26:51 xxxxxxxx kernel: cpu 0 cold: low 0, high 2, batch 1
Mar 4 11:26:51 xxxxxxxx kernel: cpu 1 hot: low 2, high 6, batch 1
Mar 4 11:26:51 xxxxxxxx kernel: cpu 1 cold: low 0, high 2, batch 1
Mar 4 11:26:51 xxxxxxxx kernel: Normal per-cpu:
Mar 4 11:26:51 xxxxxxxx kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 4 11:26:51 xxxxxxxx kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 4 11:26:51 xxxxxxxx kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 4 11:26:51 xxxxxxxx kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 4 11:26:51 xxxxxxxx kernel: HighMem per-cpu:
Mar 4 11:26:52 xxxxxxxx kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 4 11:26:52 xxxxxxxx kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 4 11:26:53 xxxxxxxx kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 4 11:26:53 xxxxxxxx kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 4 11:26:53 xxxxxxxx kernel:
Mar 4 11:26:53 xxxxxxxx kernel: Free pages: 2665012kB (2651520kB HighMem)
Mar 4 11:26:54 xxxxxxxx kernel: Active:136947 inactive:742 dirty:0 writeback:0 unstable:0 free:666253 slab:212160 mapped:72062 pagetables:1053
Mar 4 11:26:54 xxxxxxxx clurgmgrd: [9312]: <info> Executing /opt/pro/commdb-ndbd1.init status
Mar 4 11:26:54 xxxxxxxx kernel: DMA free:12564kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:8411 all_unreclaimable? yes
Mar 4 11:26:55 xxxxxxxx kernel: protections[]: 0 0 0
Mar 4 11:26:55 xxxxxxxx clurgmgrd[9312]: <notice> status on script "commdb-ndbd1" returned 2 (invalid argument(s))
Mar 4 11:26:55 xxxxxxxx kernel: Normal free:928kB min:928kB low:1856kB high:2784kB active:416kB inactive:252kB present:901120kB pages_scanned:1188 all_unreclaimable? yes
Mar 4 11:26:55 xxxxxxxx clurgmgrd: [9312]: <info> Executing /opt/pro/commdb-mgmd.init status
Mar 4 11:26:55 xxxxxxxx kernel: protections[]: 0 0 0
Mar 4 11:26:56 xxxxxxxx kernel: HighMem free:2651520kB min:512kB low:1024kB high:1536kB active:547372kB inactive:2716kB present:3735548kB pages_scanned:0 all_unreclaimable? no
Mar 4 11:26:56 xxxxxxxx kernel: protections[]: 0 0 0
Mar 4 11:26:56 xxxxxxxx kernel: DMA: 5*4kB 4*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12564kB
Mar 4 11:26:57 xxxxxxxx kernel: Normal: 10*4kB 3*8kB 0*16kB 3*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 928kB
Mar 4 11:26:57 xxxxxxxx clurgmgrd: [9312]: <info> Executing /opt/pro/smtpout1.init status
Mar 4 11:26:57 xxxxxxxx kernel: HighMem: 17938*4kB 16983*8kB 15212*16kB 9058*32kB 3920*64kB 981*128kB 491*256kB 387*512kB 342*1024kB 150*2048kB 135*4096kB = 2651520kB
Mar 4 11:26:57 xxxxxxxx kernel: Swap cache: add 1, delete 1, find 0/0, race 0+0
Mar 4 11:26:57 xxxxxxxx clurgmgrd[9312]: <notice> Stopping service commdb-ndbd1
Mar 4 11:26:57 xxxxxxxx kernel: 0 bounce buffer pages
Mar 4 11:26:57 xxxxxxxx kernel: Free swap: 2003252kB
Mar 4 11:26:58 xxxxxxxx kernel: 1163263 pages of RAM
Mar 4 11:26:58 xxxxxxxx kernel: 802802 pages of HIGHMEM
Mar 4 11:26:58 xxxxxxxx kernel: 141733 reserved pages
Mar 4 11:26:58 xxxxxxxx kernel: 64402 pages shared
Mar 4 11:26:59 xxxxxxxx kernel: 0 pages swap cached
Mar 4 11:26:59 xxxxxxxx kernel: Out of Memory: Killed process 17287 (ndbd).

Please let me know if you need more info.
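A note on watching for this condition: the log above shows ZONE_NORMAL down to its minimum (Normal free:928kB min:928kB) while HighMem is nearly all free, so the number to track between incidents is lowmem, not total free memory. A minimal helper, assuming a HighMem-enabled 32-bit kernel like this i686 2.6.9 one (the LowFree field only exists on such kernels; the function name and the optional file argument are illustrative):

```shell
# lowmem_free_kb: print the LowFree value in kB from a /proc/meminfo
# snapshot.  Reads /proc/meminfo by default; pass a saved snapshot file
# to inspect data captured at the time of an OOM kill.
lowmem_free_kb() {
    awk '/^LowFree:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

Logging this value periodically (e.g. from cron) would show whether lowmem drains gradually or collapses suddenly before the OOM kill.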
The problem here is that this is a 32-bit x86 system and all of lowmem is consumed by the slab cache:

  slab:212160
  Normal free:928kB min:928kB low:1856kB high:2784kB active:416kB inactive:252kB present:901120kB

Please get a /proc/slabinfo output when the OOM kill happens so we can see who is consuming all of this memory.

Larry Woodman
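For gathering that data, a small helper like the following can rank the largest slab caches from a /proc/slabinfo snapshot. This is a sketch, not part of the bug report: it assumes the 2.6-era slabinfo layout where the first four columns are name, active_objs, num_objs, and objsize, and the function name is illustrative (reading /proc/slabinfo may require root):

```shell
# top_slabs: estimate per-cache slab memory (num_objs * objsize) from a
# /proc/slabinfo-style file and print the ten largest caches in KB.
# Usage: top_slabs [slabinfo-file]   (defaults to /proc/slabinfo)
top_slabs() {
    file="${1:-/proc/slabinfo}"
    # Skip the two header lines, then compute num_objs ($3) * objsize ($4)
    # per cache and sort descending by the resulting size.
    awk 'NR > 2 && NF >= 4 { printf "%-24s %10.0f KB\n", $1, $3 * $4 / 1024 }' "$file" \
        | sort -k2 -rn | head -10
}
```

Running this from the same cron job that snapshots /proc/meminfo would show which cache is eating ZONE_NORMAL when the OOM kill fires.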
Our problem was caused by a memory leak in cman, so this bug can be closed. Thanks for your time.
The problem and its resolution are described in bug #212634: rgmanager consumed too much memory and caused the OOM killer to start killing processes.
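For anyone hitting a similar leak, one way to confirm that a specific daemon (here rgmanager or cman) is growing is to log its resident set size over time. A minimal sketch, assuming Linux's /proc/&lt;pid&gt;/status interface (the function name and the example service name are illustrative):

```shell
# rss_kb: print a process's resident set size in kB, taken from the
# VmRSS line of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: append a timestamped RSS sample for rgmanager every 5 minutes.
#   while sleep 300; do
#       echo "$(date '+%F %T') $(rss_kb "$(pidof -s rgmanager)")" >> /var/log/rgmanager-rss.log
#   done
```

A steadily climbing VmRSS between cluster status checks, with no corresponding drop, is the signature of the leak described in bug #212634.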