234316 – Kernel Panics with sbatchd

Summary: Kernel Panics with sbatchd

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-03-28 13:39 UTC by Florian Krueger
Modified:	2008-04-29 13:18 UTC (History)
CC List:	0 users
Fixed In Version:	-55ish
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-04-29 13:18:31 UTC
Target Upstream Version:
Embargoed:

Description Florian Krueger 2007-03-28 13:39:54 UTC

Description of problem:

Our machines sometimes crash hard with the following message. Only a cold
boot recovers them.

Mar 28 09:12:03 amun014 kernel: Unable to handle kernel NULL pointer dereference at 000000000000007c RIP:
Mar 28 09:12:03 amun014 kernel: <ffffffff8030b199>{__lock_text_start+1}
Mar 28 09:12:03 amun014 kernel: PML4 1e57c3067 PGD 1e57c4067 PMD 0
Mar 28 09:12:03 amun014 kernel: Oops: 0000 [1] SMP
Mar 28 09:12:03 amun014 kernel: CPU 0
Mar 28 09:12:03 amun014 kernel: Modules linked in: nfsd exportfs nfs lockd nfs_acl md5 ipv6 autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac ohci_hcd hw_random tg3 floppy ext3 jbd mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Mar 28 09:12:03 amun014 kernel: Pid: 8037, comm: sbatchd Not tainted 2.6.9-42.0.8.ELsmp
Mar 28 09:12:03 amun014 kernel: RIP: 0010:[<ffffffff8030b199>] <ffffffff8030b199>{__lock_text_start+1}
Mar 28 09:12:03 amun014 kernel: RSP: 0018:00000101e5197e38  EFLAGS: 00010246
Mar 28 09:12:03 amun014 kernel: RAX: 0000000000020000 RBX: 00000101cbeef478 RCX: 0000002800000000
Mar 28 09:12:03 amun014 kernel: RDX: ffffffff803dc340 RSI: 00000000ffffe000 RDI: 0000000000000078
Mar 28 09:12:03 amun014 kernel: RBP: 00000000ffffe000 R08: 00000000ffffffff R09: 0000000000000000
Mar 28 09:12:03 amun014 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Mar 28 09:12:03 amun014 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 28 09:12:03 amun014 kernel: FS:  0000002a9557f980(0000) GS:ffffffff804e5880(0000) knlGS:00000000f7fd46c0
Mar 28 09:12:03 amun014 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 28 09:12:03 amun014 kernel: CR2: 000000000000007c CR3: 0000000000101000 CR4: 00000000000006a0
Mar 28 09:12:03 amun014 kernel: Process sbatchd (pid: 8037, threadinfo 00000101e5196000, task 00000103e0dc77f0)
Mar 28 12:24:07 amun014 syslogd 1.4.1: restart.
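
For triage, the key fields of the oops above can be pulled out of syslog with a short script. What follows is only a sketch, not part of the original report: the function name parse_oops and the regular expressions are assumptions chosen for illustration, and the sample text is a handful of lines copied from the log above with wrapped lines rejoined.

```python
import re

# A few lines copied from the oops above (wrapped lines rejoined).
OOPS = """\
Mar 28 09:12:03 amun014 kernel: Unable to handle kernel NULL pointer dereference at 000000000000007c RIP:
Mar 28 09:12:03 amun014 kernel: <ffffffff8030b199>{__lock_text_start+1}
Mar 28 09:12:03 amun014 kernel: Pid: 8037, comm: sbatchd Not tainted 2.6.9-42.0.8.ELsmp
Mar 28 09:12:03 amun014 kernel: RDX: ffffffff803dc340 RSI: 00000000ffffe000 RDI: 0000000000000078
Mar 28 09:12:03 amun014 kernel: CR2: 000000000000007c CR3: 0000000000101000 CR4: 00000000000006a0
"""


def parse_oops(text):
    """Return selected fields from an x86_64 oops in syslog format.

    parse_oops is a hypothetical helper written for this example.
    """
    # Collapse all whitespace so a field split by hard wrapping still matches.
    flat = " ".join(text.split())
    fields = {}

    m = re.search(r"Pid: (\d+), comm: (\S+)", flat)
    if m:
        fields["pid"] = int(m.group(1))
        fields["comm"] = m.group(2)

    # First "<address>{symbol+offset}" pair: the faulting instruction.
    m = re.search(r"<([0-9a-f]{16})>\{([^}]+)\}", flat)
    if m:
        fields["rip"] = int(m.group(1), 16)
        fields["symbol"] = m.group(2)

    # General and control registers are printed as "NAME: <16 hex digits>".
    pattern = r"\b(R[A-Z0-9]{2}|CR[0-9]): ([0-9a-f]{16})\b"
    for name, value in re.findall(pattern, flat):
        fields[name] = int(value, 16)

    return fields


info = parse_oops(OOPS)
print(info["comm"], info["symbol"], hex(info["CR2"]), hex(info["CR2"] - info["RDI"]))
```

In this log the faulting address in CR2 (0x7c) is 4 bytes above the value in RDI (0x78). That would be consistent with the faulting instruction accessing a field at a small offset from a near-NULL pointer, although the log alone does not confirm it.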


Version-Release number of selected component (if applicable):

Linux amun014 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 12:49:51 EST 2007 x86_64 x86_64 x86_64 GNU/Linux


How reproducible: I do not know how to reproduce it at the moment. The
application seems to be the same each time sbatchd crashes the machine. It
happens only occasionally, not with every job. We run thousands of simulations
per day, and the panic occurs in one or two simulations every 2 days.

  
Actual results:
Panic

Expected results:
No panic; the job runs as all other jobs do.

Additional info:
If you need more information, please let me know.

Comment 1 Jason Baron 2007-10-24 18:08:45 UTC

Is there any more traceback in the logs? Any further information or a
reproducer to work from? Thanks.

Comment 2 Linda Wang 2007-11-15 23:10:20 UTC

Ping: is there any kernel panic stack traceback that we can use
to analyze the issue?

Comment 3 Florian Krueger 2007-11-16 07:00:00 UTC

Hi there,

This bug seems to have been fixed by one of the recent kernel patches.
Unfortunately, I have changed jobs, so I don't know exactly which
release solved it, and I only saw your mail today.

You may close this ticket.

Florian

Comment 4 Jeff Layton 2008-04-29 13:18:31 UTC

Closing ticket based on comment #3.
