134354 – Oops mounting lock_dlm filesystem

Summary: Oops mounting lock_dlm filesystem

Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	gfs
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	David Teigland
QA Contact:	GFS Bugs
Duplicates (1):	134530

Reported:	2004-10-01 16:01 UTC by Derek Anderson
Modified:	2010-01-12 02:59 UTC
CC List:	0 users
Last Closed:	2004-10-04 19:17:15 UTC

Description Derek Anderson 2004-10-01 16:01:52 UTC

Description of problem:
Had a quorate 3-node cluster, made a filesystem with -p lock_dlm
directly on a device (no LVM involved), and mounted it on the first
node.  The mount Oopsed.  I will reproduce and try to get more
information:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
e041419d
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gfs loop lock_dlm dlm cman lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e041419d>]    Not tainted
EFLAGS: 00010292   (2.6.8.1)
EIP is at queue_complete+0xd/0x100 [lock_dlm]
eax: 00000000   ebx: 00000000   ecx: e0415050   edx: 00000001
esi: daf3d3d8   edi: df48ae00   ebp: 00000005   esp: dad35f1c
ds: 007b   es: 007b   ss: 0068
Process dlm_astd (pid: 5314, threadinfo=dad34000 task=db14a850)
Stack: 00000000 e02e008f 00000018 00000246 398bcf80 000f4456 db14aa04 00000296
       daa3350c daa3350c daf3d3d8 e02cd448 00000246 398bcf80 000f4456 dad44330
       c13f7ca0 df48ae98 00000000 e0415060 e0415050 dad35fa4 e02ead8c dad34000
Call Trace:
 [<e02e008f>] _release_rsb+0x13f/0x2b0 [dlm]
 [<e02cd448>] process_asts+0x108/0x1e0 [dlm]
 [<e0415060>] lock_bast+0x0/0x5 [lock_dlm]
 [<e0415050>] lock_ast+0x0/0x10 [lock_dlm]
 [<e02cdca0>] dlm_astd+0x0/0x220 [dlm]
 [<e02cde95>] dlm_astd+0x1f5/0x220 [dlm]
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c0134fa4>] kthread+0xa4/0xb0
 [<c0134f00>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 8b 30 8b 40 34 a9 80 00 00 00 75 47 c7 04 24 34 8d 41 e0 8b
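
For triage, the faulting location can be pulled out of an Oops like the one above with a short script.  This is only a sketch: the regular expression is an assumption based on the "EIP is at" line format shown in this report, and parse_eip is a hypothetical helper, not part of any GFS or kernel tooling.

```python
import re

# Assumed layout of the "EIP is at" line, taken from the trace in this report:
#   EIP is at <symbol>+0x<offset>/0x<size> [<module>]
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_eip(oops_text):
    """Return (module, symbol, offset, size) for the faulting address, or None."""
    m = EIP_RE.search(oops_text)
    if m is None:
        return None
    return (
        m.group("module"),
        m.group("symbol"),
        int(m.group("offset"), 16),
        int(m.group("size"), 16),
    )


# Excerpt copied from the Oops in this report.
sample = """\
EIP:    0060:[<e041419d>]    Not tainted
EFLAGS: 00010292   (2.6.8.1)
EIP is at queue_complete+0xd/0x100 [lock_dlm]
eax: 00000000   ebx: 00000000   ecx: e0415050   edx: 00000001
"""

print(parse_eip(sample))  # ('lock_dlm', 'queue_complete', 13, 256)
```

Keying on module, symbol, and offset rather than the raw EIP value is deliberate: the second trace in Comment 1 has a different EIP (e043419d instead of e041419d) but resolves to the same queue_complete+0xd location.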

Filesystem make command:
[root@link-10 root]# gfs_mkfs -t MILTON:data1 -j 2 -p lock_dlm /dev/sdg15

Version-Release number of selected component (if applicable):
[root@link-11 root]# gfs_mkfs -V
gfs_mkfs DEVEL.1096560308 (built Sep 30 2004 11:06:29)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.


Comment 1 Derek Anderson 2004-10-01 16:38:19 UTC

A little more context: the Oops appears to happen during
"Trying to acquire journal lock..." on the last journal.  It is very
reproducible today; every filesystem I create hits it.  Here is
another example, from a GFS filesystem with 3 journals:

dlm: data3: total nodes 1
dlm: data3: rebuild resource directory
dlm: data3: rebuilt 0 resources
dlm: data3: recover event 2 done
dlm: data3: recover event 2 finished
GFS: fsid=MILTON:data3.0: Joined cluster. Now mounting FS...
GFS: fsid=MILTON:data3.0: jid=0: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=0: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=0: Done
GFS: fsid=MILTON:data3.0: jid=1: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=1: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=1: Done
GFS: fsid=MILTON:data3.0: jid=2: Trying to acquire journal lock...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
e043419d
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: lock_dlm gfs lock_harness dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e043419d>]    Not tainted
EFLAGS: 00010292   (2.6.8.1)
EIP is at queue_complete+0xd/0x100 [lock_dlm]
eax: 00000000   ebx: 00000000   ecx: e0435050   edx: 00000001
esi: d93233d8   edi: df476800   ebp: 00000005   esp: d9635f1c
ds: 007b   es: 007b   ss: 0068
Process dlm_astd (pid: 4991, threadinfo=d9634000 task=da38d6b0)
Stack: 00000000 e02e008f 00000018 00000246 6f00a500 000f422b da38d864 00000296
       d9e7d50c d9e7d50c d93233d8 e02cd448 00000086 6f00a500 000f422b d8d21730
       c13f7ca0 df476898 00000000 e0435060 e0435050 d9635fa4 e02ead8c d9634000
Call Trace:
 [<e02e008f>] _release_rsb+0x13f/0x2b0 [dlm]
 [<e02cd448>] process_asts+0x108/0x1e0 [dlm]
 [<e0435060>] lock_bast+0x0/0x5 [lock_dlm]
 [<e0435050>] lock_ast+0x0/0x10 [lock_dlm]
 [<e02cdca0>] dlm_astd+0x0/0x220 [dlm]
 [<e02cde95>] dlm_astd+0x1f5/0x220 [dlm]
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c0134fa4>] kthread+0xa4/0xb0
 [<c0134f00>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 8b 30 8b 40 34 a9 80 00 00 00 75 47 c7 04 24 34 8d 43 e0 8b
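
To confirm which journal the mount was working on when it died, the GFS messages above can be scanned for journals that logged "Trying to acquire journal lock..." but never reached "Done".  A minimal sketch, assuming the message layout matches the lines pasted in this comment; unfinished_journals is a hypothetical helper name:

```python
import re

# Assumed layout, copied from the log in this comment:
#   GFS: fsid=<cluster>:<fs>.<n>: jid=<id>: <message>
JID_RE = re.compile(r"GFS: fsid=(?P<fsid>\S+): jid=(?P<jid>\d+): (?P<msg>.+)")


def unfinished_journals(log_text):
    """Return sorted journal ids that started lock acquisition but never logged Done."""
    started, done = set(), set()
    for line in log_text.splitlines():
        m = JID_RE.match(line.strip())
        if m is None:
            continue  # dlm messages, Oops text, and so on
        jid = int(m.group("jid"))
        msg = m.group("msg")
        if msg.startswith("Trying to acquire journal lock"):
            started.add(jid)
        elif msg == "Done":
            done.add(jid)
    return sorted(started - done)


# Excerpt copied from the mount log in this comment.
log = """\
GFS: fsid=MILTON:data3.0: Joined cluster. Now mounting FS...
GFS: fsid=MILTON:data3.0: jid=0: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=0: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=0: Done
GFS: fsid=MILTON:data3.0: jid=1: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=1: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=1: Done
GFS: fsid=MILTON:data3.0: jid=2: Trying to acquire journal lock...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
"""

print(unfinished_journals(log))  # [2]
```

For the three-journal filesystem above this reports jid 2, which matches the observation that the Oops hits on the last journal.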

Comment 2 Derek Anderson 2004-10-04 15:08:55 UTC

*** Bug 134530 has been marked as a duplicate of this bug. ***

Comment 3 David Teigland 2004-10-04 15:28:37 UTC

A couple of days ago I checked in a change to the dlm (making NULL a
valid ast arg) but missed checking in the corresponding update to
lock_dlm; I caught that today.  I think this is what you're hitting,
which means you need to update from CVS.

Changes by:     teigland 2004-10-04 05:24:51

Modified files:
        gfs-kernel/src/dlm: lock.c

Log message:
        we must provide the correct astarg to dlm_unlock now that 
        NULL is valid
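
The mismatch described in this comment can be illustrated with a toy model.  This is not the dlm or lock_dlm code: ToyDlm, ToyLock, and the null_is_valid switch are hypothetical, and the "old" behavior shown (a missing ast arg on unlock falls back to the one saved at lock time) is an assumption inferred from the log message above.  It only shows why a caller that keeps passing no ast arg starts failing in the completion handler once NULL is delivered as-is.

```python
class ToyLock:
    """Stand-in for a per-lock structure; the fields are hypothetical."""

    def __init__(self, name):
        self.name = name
        self.flags = 0


def queue_complete(lp):
    """Toy completion handler: reads fields of lp, so lp must not be None."""
    return (lp.name, lp.flags)


class ToyDlm:
    """Toy lock manager that hands an ast arg to the completion handler."""

    def __init__(self, null_is_valid):
        self.null_is_valid = null_is_valid
        self.saved_astarg = {}  # lock id -> ast arg given at lock time

    def lock(self, lkid, astarg):
        self.saved_astarg[lkid] = astarg

    def unlock(self, lkid, astarg=None):
        if astarg is None and not self.null_is_valid:
            # Assumed old behavior: None means "reuse the saved ast arg".
            astarg = self.saved_astarg[lkid]
        # Assumed new behavior: whatever was passed, None included, is delivered.
        return queue_complete(astarg)


lp = ToyLock("journal-2")

old = ToyDlm(null_is_valid=False)
old.lock(1, lp)
print(old.unlock(1))  # caller omits the ast arg; the toy falls back to lp

new = ToyDlm(null_is_valid=True)
new.lock(1, lp)
try:
    new.unlock(1)  # same caller, unchanged: the handler now receives None
except AttributeError as exc:
    print("toy analogue of the NULL dereference:", exc)

print(new.unlock(1, lp))  # updated caller passes the correct ast arg
```

In the toy, the last call mirrors the lock.c change in the log message: the caller supplies the correct ast arg explicitly instead of relying on a fallback.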

Comment 4 Derek Anderson 2004-10-04 19:17:15 UTC

Verified against: 
cman_tool DEVEL.1096898839 (built Oct  4 2004 09:08:29) 
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

Comment 5 Kiersten (Kerri) Anderson 2004-11-16 19:09:33 UTC

Updating version to the right level in the defects.  Sorry for the storm.
