Bug 132772

Summary: GFS OOPSes when a lock module returns an error on mount
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: gfs
Assignee: michael conrad tadpol tilstra <mtilstra>
Status: CLOSED CURRENTRELEASE
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 3
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-10-27 21:23:02 UTC

Description Corey Marthaler 2004-09-16 20:26:40 UTC
Description of problem: 
The name of my cluster is 'morph-cluster'. 
 
I started services on all nodes in the cluster: 
'lock_gulmd -s morph-01,morph-03,morph-05 -n morph-cluster' 
 
I created an fs with a different cluster name "foobar". 
 
Attempted to mount: 
mount -t gfs /dev/clvmdtest/lvol0 /mnt/clvmdtest0  
 
lock_gulm: ERROR Core returned error 1003:Bad Cluster ID.
lock_gulm: ERROR cm_login failed. 1003
lock_gulm: ERROR Got a 1003 trying to start the threads.
lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Unable to handle kernel NULL pointer dereference at virtual address 00000427
 printing eip:
c0162b21
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0162b21>]    Not tainted
EFLAGS: 00010246   (2.6.8.1)
EIP is at do_kern_mount+0xb1/0x150
eax: 00000000   ebx: 00000000   ecx: c034b900   edx: 00000000
esi: 000003eb   edi: f7f64b00   ebp: f8a65800   esp: f43ddeec
ds: 007b   es: 007b   ss: 0068
Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
       c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
       00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Call Trace:
 [<c017858f>] do_new_mount+0x6f/0xc0
 [<c0178c40>] do_mount+0x170/0x1b0
 [<c01c5015>] copy_from_user+0x45/0x80
 [<c0178a69>] copy_mount_options+0x59/0xc0
 [<c0179007>] sys_mount+0xa7/0x130
 [<c0105e4d>] sysenter_past_esp+0x52/0x71
Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b

Sep 16 15:23:09 morph-01 lock_gulmd_LT000[4335]: New Client: idx 9 fd 14 from morph-06 ::ffff:192.168.44.66
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session closed for user root
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 lock_gulmd_core[4333]: ERROR [src/core_io.c:1204] ::1 claims to be part of foobarcluster, but we are morph-cluster
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Core returned error 1003:Bad Cluster ID.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR cm_login failed. 1003
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Got a 1003 trying to start the threads.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
Sep 16 15:23:09 morph-01 kernel: GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Sep 16 15:23:09 morph-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000427
Sep 16 15:23:09 morph-01 kernel:  printing eip:
Sep 16 15:23:09 morph-01 kernel: c0162b21
Sep 16 15:23:09 morph-01 kernel: *pde = 00000000
Sep 16 15:23:09 morph-01 kernel: Oops: 0000 [#1]
Sep 16 15:23:09 morph-01 kernel: SMP
Sep 16 15:23:09 morph-01 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Sep 16 15:23:09 morph-01 kernel: CPU:    1
Sep 16 15:23:09 morph-01 kernel: EIP:    0060:[<c0162b21>]    Not tainted
Sep 16 15:23:09 morph-01 kernel: EFLAGS: 00010246   (2.6.8.1)
Sep 16 15:23:09 morph-01 kernel: EIP is at do_kern_mount+0xb1/0x150
Sep 16 15:23:09 morph-01 kernel: eax: 00000000   ebx: 00000000   ecx: c034b900   edx: 00000000
Sep 16 15:23:09 morph-01 kernel: esi: 000003eb   edi: f7f64b00   ebp: f8a65800   esp: f43ddeec
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session closed for user root
Sep 16 15:23:09 morph-01 kernel: ds: 007b   es: 007b   ss: 0068
Sep 16 15:23:09 morph-01 kernel: Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Sep 16 15:23:09 morph-01 kernel: Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
Sep 16 15:23:09 morph-01 kernel:        c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
Sep 16 15:23:09 morph-01 kernel:        00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Sep 16 15:23:09 morph-01 kernel: Call Trace:
Sep 16 15:23:09 morph-01 kernel:  [<c017858f>] do_new_mount+0x6f/0xc0
Sep 16 15:23:09 morph-01 kernel:  [<c0178c40>] do_mount+0x170/0x1b0
Sep 16 15:23:09 morph-01 kernel:  [<c01c5015>] copy_from_user+0x45/0x80
Sep 16 15:23:09 morph-01 kernel:  [<c0178a69>] copy_mount_options+0x59/0xc0
Sep 16 15:23:09 morph-01 kernel:  [<c0179007>] sys_mount+0xa7/0x130
Sep 16 15:23:09 morph-01 kernel:  [<c0105e4d>] sysenter_past_esp+0x52/0x71
Sep 16 15:23:09 morph-01 kernel: Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b
 
 
How reproducible: 
Always

Comment 1 michael conrad tadpol tilstra 2004-09-23 14:37:58 UTC
The oops isn't happening in gulm. It seems to be either up in GFS or perhaps up in the VFS after gulm returns an error on mount. Still, it does seem to be gulm's fault. My guess is that when gulm returns a mount error, it leaves many of the other parameters untouched, and perhaps GFS is trying to dereference those. Maybe. Will dig more.

Comment 2 michael conrad tadpol tilstra 2004-09-23 14:57:02 UTC
Not a gulm bug. GFS panics whenever a lock module returns an error on mount. If you make the first line of the nolock module's mount function return an error, you get the same oops.


Comment 3 Ken Preslan 2004-10-12 21:26:59 UTC
Yes, it is a gulm bug.

In Linux, error numbers are negative. lock_gulm is returning a positive error number (1003), which GFS passes up to the VFS. The VFS interprets the positive number as not being an error and continues on.

Nolock only causes the same oops if you have it return a positive
error number.



Comment 4 michael conrad tadpol tilstra 2004-10-12 22:30:12 UTC
The VFS does weird things with error results, so before we return a gulm error code, we flip it to -1.


Comment 5 Corey Marthaler 2004-10-27 21:23:02 UTC
Fix verified.