191445 – multithreaded coredump can trigger kernel panic

Summary: multithreaded coredump can trigger kernel panic

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-05-12 00:28 UTC by Todd Jimenez
Modified:	2007-11-30 22:07 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-09-13 12:38:22 UTC
Target Upstream Version:
Embargoed:

Description Todd Jimenez 2006-05-12 00:28:33 UTC

Description of problem:

Attempting to dump core for a multithreaded application can trigger a kernel
panic.  So far we have seen it with both Apache 2.x and an internal app.

We're running AS3 Update 6, with kernel 2.4.21-37.ELsmp.

------------[ cut here ]------------
  kernel BUG at exec.c:1298!
  invalid operand: 0000
  nfs lockd sunrpc tg3 sg keybdev mousedev hid input ehci-hcd usb-uhci usbcore ext3 jbd raid1 ata_piix scsi_dump_register libata mptscsih mptbase diskdumplib sd
  CPU:    2
  EIP:    0060:[<c0170d49>]    Not tainted
  EFLAGS: 00010202

  EIP is at coredump_wait [kernel] 0x39 (2.4.21-37.ELsmp/i686)
  eax: 000001bf   ebx: de120080   ecx: e7216000   edx: f40dd780
  esi: e7217e68   edi: c03aca80   ebp: e7216000   esp: e7217e64
  ds: 0068   es: 0068   ss: 0068
  Process atlas (pid: 28004, stackpage=e7217000)
  Stack: ffffffff 00000000 00000001 e7217e70 e7217e70 00000475 de120080 c0170f3a
         de120080 0000000a e72168b4 00000000 c8c2dbf0 f3e95314 f3e95314 c0138e1e
         c8c2dbf0 f3e95314 00000020 0000000b 0000000b e7216000 e72168b4 c013605f
  Call Trace:   [<c0170f3a>] do_coredump [kernel] 0x16a (0xe7217e80)
  [<c0138e1e>] collect_signal [kernel] 0xae (0xe7217ea0)
  [<c013605f>] __dequeue_signal [kernel] 0x6f (0xe7217ec0)
  [<c01360c4>] dequeue_signal [kernel] 0x34 (0xe7217edc)
  [<c013767c>] get_signal_to_deliver [kernel] 0x20c (0xe7217ef8)
  [<c010bf84>] do_signal [kernel] 0x64 (0xe7217f20)
  [<c013c908>] do_futex [kernel] 0xf8 (0xe7217f58)
  [<c013c9c9>] sys_futex [kernel] 0xb9 (0xe7217f88)
  [<c011ff60>] do_page_fault [kernel] 0x0 (0xe7217fbc)

  Code: 0f 0b 12 05 85 03 2c c0 89 b3 18 01 00 00 40 89 83 14 01 00

  Kernel panic: Fatal exception
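
As a cross-check on the trace above, the "Code:" bytes can be decoded by hand. The sketch below assumes the 2.4-era i386 BUG() layout with verbose BUG reporting enabled: a ud2 instruction (0f 0b), then a 16-bit line number, then a 32-bit pointer to the file name string. The function name decode_bug_bytes is made up for this illustration.

```python
import struct


def decode_bug_bytes(code_line):
    """Decode the 'Code:' bytes of an oops that starts at a BUG() site.

    Assumption: 2.4 i386 layout of ud2 + 16-bit line number + 32-bit
    pointer to the file name string, all little-endian.
    """
    raw = bytes(int(tok, 16) for tok in code_line.split())
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("code does not start with ud2 (0f 0b)")
    # "<HI" = little-endian, unsigned 16-bit then unsigned 32-bit, no padding.
    line_no, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line_no, file_ptr


line_no, file_ptr = decode_bug_bytes(
    "0f 0b 12 05 85 03 2c c0 89 b3 18 01 00 00 40 89 83 14 01 00"
)
print(line_no, hex(file_ptr))  # prints: 1298 0xc02c0385
```

The decoded line number, 1298, agrees with the "kernel BUG at exec.c:1298!" line, so the bytes at EIP are consistent with the BUG() site in coredump_wait.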


Version-Release number of selected component (if applicable):


How reproducible:

Currently erratic.  I suspect we're triggering a race condition, and would
guess that it could be reproduced by killing any sufficiently threaded app.
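
A rough sketch of that guess, not a confirmed reproducer: fork a child, park a number of threads in it, then hit it with a core-dumping signal. Everything here is an assumption for illustration, including the function name crash_threaded_child, the thread count and the choice of SIGSEGV. Python is used only for brevity, and on an RHEL 3 era system the same idea would have to be ported to whatever is available there. Only try it on a disposable machine.

```python
import os
import resource
import signal
import threading
import time


def crash_threaded_child(n_threads=64, sig=signal.SIGSEGV, core_limit=None):
    """Fork a child, park n_threads in it, then kill it with sig.

    Returns (terminating_signal_or_None, core_dumped_flag) as seen by
    the parent.  core_limit, if given, becomes the child's soft
    RLIMIT_CORE (0 asks the kernel not to write a core file).
    """
    pid = os.fork()
    if pid == 0:
        try:
            if core_limit is not None:
                _, hard = resource.getrlimit(resource.RLIMIT_CORE)
                resource.setrlimit(resource.RLIMIT_CORE, (core_limit, hard))
            signal.signal(sig, signal.SIG_DFL)  # make sure the default action applies

            never = threading.Event()
            started = threading.Barrier(n_threads + 1)

            def park():
                started.wait()  # report that this thread is running
                never.wait()    # then block until the process dies

            for _ in range(n_threads):
                threading.Thread(target=park, daemon=True).start()
            started.wait()      # all threads exist and are about to block

            os.kill(os.getpid(), sig)
            time.sleep(5)       # normally never reached; the signal is fatal
        finally:
            os._exit(1)         # the child must never return into the caller

    _, status = os.waitpid(pid, 0)
    term_sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return term_sig, os.WCOREDUMP(status)
```

Calling crash_threaded_child() in a loop, with core dumps enabled (ulimit -c unlimited), would exercise the multithreaded core dump path repeatedly. Whether it actually hits the race seen here is untested.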

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Ernie Petrides 2006-05-12 00:48:58 UTC

Hi, Todd.  The fix that went into U7 for bug 168392 might avoid the
problem you're seeing.  Please try upgrading to U7 (released a couple
of months ago) to see if that fix resolves this problem.

Unfortunately, RHEL3 is now closed (and U8 is currently in beta).

Comment 2 Prarit Bhargava 2007-09-13 12:38:22 UTC

Closing re: comment #1.

P.
