Description of problem:
A couple of times a month we get a kernel panic:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
We have 2 nodes and this only happens on one of them. The node it happens on runs ypserv, and I notice the kernel panic always mentions ypserv. The other node does not run this service.

Version-Release number of selected component (if applicable):
CentOS 4.4
Linux scylla1 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:21:39 CDT 2006 i686 i686 i386 GNU/Linux
clustat version 1.9.54
Connected via: CMAN/SM Plugin v1.1.7.1

How reproducible:
I am not able to reproduce this at will.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Apr 13 21:06:04 scylla1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 13 21:06:04 scylla1 kernel: printing eip:
Apr 13 21:06:04 scylla1 kernel: f8c743a6
Apr 13 21:06:04 scylla1 kernel: *pde = 18643001
Apr 13 21:06:04 scylla1 kernel: Oops: 0000 [#1]
Apr 13 21:06:04 scylla1 kernel: SMP
Apr 13 21:06:04 scylla1 kernel: CPU: 0
Apr 13 21:06:04 scylla1 kernel: EIP: 0060:[<f8c743a6>] Not tainted VLI
Apr 13 21:06:04 scylla1 kernel: EFLAGS: 00010203 (2.6.9-42.0.3.ELsmp)
Apr 13 21:06:04 scylla1 kernel: EIP is at gfs_glock_dq+0xaf/0x16e [gfs]
Apr 13 21:06:04 scylla1 kernel: eax: eaf39524 ebx: eaf39518 ecx: f7f464ff edx: 00000000
Apr 13 21:06:04 scylla1 kernel: esi: 00000000 edi: eaf394fc ebp: f68dd61c esp: f67dce98
Apr 13 21:06:04 scylla1 kernel: ds: 007b es: 007b ss: 0068
Apr 13 21:06:04 scylla1 kernel: Process ypserv (pid: 5328, threadinfo=f67dc000 task=f27e3630)
Apr 13 21:06:04 scylla1 kernel: Stack: 00117975 de2e939c f8ca96a0 f8945000 f68dd61c f68dd61c f68dd604 f68dd600
Apr 13 21:06:04 scylla1 kernel:        f8c747aa c2b48e80 f8c8945c f67dceec c2b48e80 00000000 00000007 c2b48e80
Apr 13 21:06:04 scylla1 kernel:        f8c894d0 c2b48e80 f8ca98e0 edb22768 c016e8ac 00000000 00000000 00000000
Apr 13 21:06:04 scylla1 kernel: Call Trace:
Apr 13 21:06:04 scylla1 kernel:  [<f8c747aa>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
Apr 13 21:06:04 scylla1 kernel:  [<f8c8945c>] do_unflock+0x4f/0x61 [gfs]
Apr 13 21:06:04 scylla1 kernel:  [<f8c894d0>] gfs_flock+0x62/0x76 [gfs]
Apr 13 21:06:04 scylla1 kernel:  [<c016e8ac>] locks_remove_flock+0x49/0xe1
Apr 13 21:06:04 scylla1 kernel:  [<c015bbc2>] __fput+0x41/0x100
Apr 13 21:06:04 scylla1 kernel:  [<c015a7f5>] filp_close+0x59/0x5f
Apr 13 21:06:04 scylla1 kernel:  [<c0123b5b>] put_files_struct+0x57/0xc0
Apr 13 21:06:04 scylla1 kernel:  [<c012476f>] do_exit+0x245/0x404
Apr 13 21:06:04 scylla1 kernel:  [<c0124a19>] sys_exit_group+0x0/0xd
Apr 13 21:06:04 scylla1 kernel:  [<c02d47cb>] syscall_call+0x7/0xb
Apr 13 21:06:04 scylla1 kernel: <0>Fatal exception: panic in 5 seconds
And the previous panic. I am only adding it because it lists the modules linked in.

Mar 30 04:35:06 scylla1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar 30 04:35:06 scylla1 kernel: printing eip:
Mar 30 04:35:06 scylla1 kernel: f8c743a6
Mar 30 04:35:06 scylla1 kernel: *pde = 00004001
Mar 30 04:35:06 scylla1 kernel: Oops: 0000 [#1]
Mar 30 04:35:06 scylla1 kernel: SMP
Mar 30 04:35:06 scylla1 kernel: Modules linked in: nfsd exportfs lockd nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) sunrpc dm_mirror dm_multipath dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd hw_random tg3 floppy ext3 jbd cciss sd_mod scsi_mod
Mar 30 04:35:06 scylla1 kernel: CPU: 1
Mar 30 04:35:06 scylla1 kernel: EIP: 0060:[<f8c743a6>] Not tainted VLI
Mar 30 04:35:06 scylla1 kernel: EFLAGS: 00010207 (2.6.9-42.0.3.ELsmp)
Mar 30 04:35:06 scylla1 kernel: EIP is at gfs_glock_dq+0xaf/0x16e [gfs]
Mar 30 04:35:06 scylla1 kernel: eax: ebb81a84 ebx: ebb81a78 ecx: f7f46400 edx: 00000000
Mar 30 04:35:06 scylla1 kernel: esi: 00000000 edi: ebb81a5c ebp: ce9a251c esp: e17f9e98
Mar 30 04:35:06 scylla1 kernel: ds: 007b es: 007b ss: 0068
Mar 30 04:35:06 scylla1 kernel: Process ypserv (pid: 9039, threadinfo=e17f9000 task=f32f7330)
Mar 30 04:35:06 scylla1 kernel: Stack: 0000630a e314889c f8ca96a0 f8945000 ce9a251c ce9a251c ce9a2504 ce9a2500
Mar 30 04:35:06 scylla1 kernel:        f8c747aa ef2ed980 f8c8945c e17f9eec ef2ed980 00000000 00000007 ef2ed980
Mar 30 04:35:06 scylla1 kernel:        f8c894d0 ef2ed980 f8ca98e0 ec21b208 c016e8ac 00000000 00000000 00000000
Mar 30 04:35:06 scylla1 kernel: Call Trace:
Mar 30 04:35:06 scylla1 kernel:  [<f8c747aa>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
Mar 30 04:35:06 scylla1 kernel:  [<f8c8945c>] do_unflock+0x4f/0x61 [gfs]
Mar 30 04:35:06 scylla1 kernel:  [<f8c894d0>] gfs_flock+0x62/0x76 [gfs]
Mar 30 04:35:06 scylla1 kernel:  [<c016e8ac>] locks_remove_flock+0x49/0xe1
Mar 30 04:35:06 scylla1 kernel:  [<c015bbc2>] __fput+0x41/0x100
Mar 30 04:35:06 scylla1 kernel:  [<c015a7f5>] filp_close+0x59/0x5f
Mar 30 04:35:06 scylla1 kernel:  [<c0123b5b>] put_files_struct+0x57/0xc0
Mar 30 04:35:06 scylla1 kernel:  [<c012476f>] do_exit+0x245/0x404
Mar 30 04:35:06 scylla1 kernel:  [<c0124a19>] sys_exit_group+0x0/0xd
Mar 30 04:35:06 scylla1 kernel:  [<c02d47cb>] syscall_call+0x7/0xb
Mar 30 04:35:06 scylla1 kernel: Code: f8 ba 57 85 c9 f8 68 2d 82 c9 f8 8b 44 24 14 e8 e0 1e 02 00 59 5b f6 45 15 08 74 06 f0 0f ba 6f 08 04 f6 45 15 04 74 38 8b 57 28 <8b> 02 0f 18 00 90 8d 47 28 39 c2 74 0b ff 04 24 89 54 24 04 8b
Mar 30 04:35:06 scylla1 kernel: <0>Fatal exception: panic in 5 seconds
Fixing product name. Cluster Suite components were integrated into Enterprise Linux for version 5.0.
The bug said version 5, but the kernel message indicates CentOS 4.4 with a 2.6.9-42 kernel. Therefore, I'm changing the version and setting the components back to cluster-suite and gfs-kernel.
I tried recreating the problem with a variety of programs that take flocks and exit with them held, but could not reproduce it. I also dug through the code and didn't find anything obvious relating to this code path.
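The simplest of those test programs looked roughly like this (a minimal sketch; the test file path on the GFS mount is just a placeholder):

/* Take an flock() on a file on the GFS mount and exit while
 * still holding it.  The kernel then releases the lock through
 * locks_remove_flock() during do_exit(), which is the code path
 * shown in the panic's call trace above. */
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int fd = open("/mnt/gfs/flock-test", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (flock(fd, LOCK_EX) < 0) {
        perror("flock");
        return 1;
    }
    /* Exit deliberately without LOCK_UN or close(), leaving the
     * flock held when the process tears down its files. */
    exit(0);
}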
OK. We haven't had this issue again since reporting it, though it did happen a few times before I finally reported it. Do you need any other info about the machine, or anything else?
I don't need anything at the moment, short of a good way to recreate the problem. It appears that the process is exiting while still holding one or more flocks. I've tried many variations on that, but couldn't reproduce the problem. Abhi Das was the last person to work on the flock code, and he offered to investigate it some more, so I'm adding him to the cc list.
At this point I have nothing else to report. It hasn't happened again since this report, and I'm in the process of updating right now. So feel free to close this ticket/bug if you want, and I'll file another one if it happens again.
I haven't seen this problem again either. Please feel free to open up another bug if you see the problem again.