132772 – GFS OOPSes when a lock module returns an error on mount

Summary: GFS OOPSes when a lock module returns an error on mount

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	gfs
Sub Component:
Version:	3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	michael conrad tadpol tilstra
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-09-16 20:26 UTC by Corey Marthaler
Modified:	2010-01-12 02:58 UTC
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-10-27 21:23:02 UTC
Embargoed:

Description Corey Marthaler 2004-09-16 20:26:40 UTC

Description of problem: 
The name of my cluster is 'morph-cluster'. 
 
I started services on all nodes in the cluster:
'lock_gulmd -s morph-01,morph-03,morph-05 -n morph-cluster' 
 
I created a filesystem with a different cluster name, "foobarcluster".
 
Attempted to mount:
mount -t gfs /dev/clvmdtest/lvol0 /mnt/clvmdtest0  
 
lock_gulm: ERROR Core returned error 1003:Bad Cluster ID. 
lock_gulm: ERROR cm_login failed. 1003 
lock_gulm: ERROR Got a 1003 trying to start the threads. 
lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Unable to handle kernel NULL pointer dereference at virtual address 00000427
 printing eip:
c0162b21
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0162b21>]    Not tainted
EFLAGS: 00010246   (2.6.8.1)
EIP is at do_kern_mount+0xb1/0x150
eax: 00000000   ebx: 00000000   ecx: c034b900   edx: 00000000
esi: 000003eb   edi: f7f64b00   ebp: f8a65800   esp: f43ddeec
ds: 007b   es: 007b   ss: 0068
Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
       c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
       00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Call Trace:
 [<c017858f>] do_new_mount+0x6f/0xc0
 [<c0178c40>] do_mount+0x170/0x1b0
 [<c01c5015>] copy_from_user+0x45/0x80
 [<c0178a69>] copy_mount_options+0x59/0xc0
 [<c0179007>] sys_mount+0xa7/0x130
 [<c0105e4d>] sysenter_past_esp+0x52/0x71
Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b

Sep 16 15:23:09 morph-01 lock_gulmd_LT000[4335]: New Client: idx 9 fd 14 from morph-06 ::ffff:192.168.44.66
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session closed for user root
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 lock_gulmd_core[4333]: ERROR [src/core_io.c:1204] ::1 claims to be part of foobarcluster, but we are morph-cluster
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Core returned error 1003:Bad Cluster ID.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR cm_login failed. 1003
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Got a 1003 trying to start the threads.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
Sep 16 15:23:09 morph-01 kernel: GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Sep 16 15:23:09 morph-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000427
Sep 16 15:23:09 morph-01 kernel:  printing eip:
Sep 16 15:23:09 morph-01 kernel: c0162b21
Sep 16 15:23:09 morph-01 kernel: *pde = 00000000
Sep 16 15:23:09 morph-01 kernel: Oops: 0000 [#1]
Sep 16 15:23:09 morph-01 kernel: SMP
Sep 16 15:23:09 morph-01 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Sep 16 15:23:09 morph-01 kernel: CPU:    1
Sep 16 15:23:09 morph-01 kernel: EIP:    0060:[<c0162b21>]    Not tainted
Sep 16 15:23:09 morph-01 kernel: EFLAGS: 00010246   (2.6.8.1)
Sep 16 15:23:09 morph-01 kernel: EIP is at do_kern_mount+0xb1/0x150
Sep 16 15:23:09 morph-01 kernel: eax: 00000000   ebx: 00000000   ecx: c034b900   edx: 00000000
Sep 16 15:23:09 morph-01 kernel: esi: 000003eb   edi: f7f64b00   ebp: f8a65800   esp: f43ddeec
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session closed for user root
Sep 16 15:23:09 morph-01 kernel: ds: 007b   es: 007b   ss: 0068
Sep 16 15:23:09 morph-01 kernel: Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Sep 16 15:23:09 morph-01 kernel: Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
Sep 16 15:23:09 morph-01 kernel:        c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
Sep 16 15:23:09 morph-01 kernel:        00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Sep 16 15:23:09 morph-01 kernel: Call Trace:
Sep 16 15:23:09 morph-01 kernel:  [<c017858f>] do_new_mount+0x6f/0xc0
Sep 16 15:23:09 morph-01 kernel:  [<c0178c40>] do_mount+0x170/0x1b0
Sep 16 15:23:09 morph-01 kernel:  [<c01c5015>] copy_from_user+0x45/0x80
Sep 16 15:23:09 morph-01 kernel:  [<c0178a69>] copy_mount_options+0x59/0xc0
Sep 16 15:23:09 morph-01 kernel:  [<c0179007>] sys_mount+0xa7/0x130
Sep 16 15:23:09 morph-01 kernel:  [<c0105e4d>] sysenter_past_esp+0x52/0x71
Sep 16 15:23:09 morph-01 kernel: Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b
 
 
How reproducible: 
Always

Comment 1 michael conrad tadpol tilstra 2004-09-23 14:37:58 UTC

The oops isn't happening in gulm.  It seems to be either up in gfs or
perhaps up in the vfs, after gulm returns an error on mount.  Still, it
does seem to be the fault of gulm.  My guess is that when gulm
returns a mount error, it leaves many of the other params untouched,
and perhaps gfs is trying to dereference those.  Will dig more.

Comment 2 michael conrad tadpol tilstra 2004-09-23 14:57:02 UTC

Not a gulm bug.  Any lock module that returns an error to gfs on mount
triggers the panic.  If you make the first line of the mount function
in the nolock module return an error, you get the same OOPS.

Comment 3 Ken Preslan 2004-10-12 21:26:59 UTC

Yes, it is a gulm bug.

In Linux, error numbers are negative.  lock_gulm is returning a
positive error number (1003), which GFS passes up to the VFS.  The VFS
does not interpret the positive number as an error and continues on.

Nolock only causes the same oops if you have it return a positive
error number.
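
The register dump above looks consistent with this. esi holds 000003eb,
which is 1003, and the first Code bytes (8b 56 3c) decode as a load from
0x3c(%esi), so the access lands at 0x3eb + 0x3c = 0x427, the faulting
address. The following is a minimal userspace sketch of the
pointer-encoded error check, assuming the MAX_ERRNO cutoff of 4095 used
by current kernels (the cutoff has differed between kernel versions).
The names err_ptr, is_err, faulting_address and check_against_report are
illustrative, not the kernel's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: cutoff modeled on current kernels; older kernels differ. */
#define MAX_ERRNO 4095UL

/* Encode an error number as a pointer value, as the VFS convention does. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Only the top MAX_ERRNO addresses (small negative numbers) count as errors. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Address touched by a load at `offset` from a pointer built from `error`. */
static uintptr_t faulting_address(long error, uintptr_t offset)
{
    return (uintptr_t)err_ptr(error) + offset;
}

/* Check the model against the values reported in the oops above. */
static void check_against_report(void)
{
    assert(!is_err(err_ptr(1003)));                /* positive: looks like a valid pointer */
    assert(is_err(err_ptr(-1)));                   /* negative: recognized as an error */
    assert(faulting_address(1003, 0x3c) == 0x427); /* matches the oops address */
}
```

With a positive code the check passes, so the caller goes on to treat
0x3eb as a real pointer and faults on the first field it reads.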

Comment 4 michael conrad tadpol tilstra 2004-10-12 22:30:12 UTC

The VFS does weird things with positive error results, so before we
try to return a gulm error code, flip it to -1.
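
A rough sketch of that kind of translation, not the actual lock_gulm
change: the function name to_vfs_error and the macro
GULM_ERR_BAD_CLUSTER_ID are hypothetical, and the -4095 lower bound is
an assumption borrowed from current kernels. Zero and values that are
already small negative numbers pass through, and anything else
collapses to -1.

```c
#include <assert.h>

/* Hypothetical name for the 1003 "Bad Cluster ID" code seen in the log. */
#define GULM_ERR_BAD_CLUSTER_ID 1003

/* Convert a lock-module status into a value that is safe to hand to the VFS. */
static int to_vfs_error(int status)
{
    if (status == 0)
        return 0;       /* success passes through unchanged */
    if (status < 0 && status >= -4095)
        return status;  /* already a negative error number (assumed range) */
    return -1;          /* module-private code: flip it to -1 */
}

/* Example: the failing mount from this report would now return -1. */
static int example_mount_result(void)
{
    return to_vfs_error(GULM_ERR_BAD_CLUSTER_ID);
}
```

Collapsing to -1 loses the specific gulm code at the VFS boundary, so
the detailed reason still has to come from the lock_gulm log messages.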

Comment 5 Corey Marthaler 2004-10-27 21:23:02 UTC

Fix verified.