Description of problem:
The name of my cluster is 'morph-cluster'. I started services on all nodes in the cluster:

  lock_gulmd -s morph-01,morph-03,morph-05 -n morph-cluster

I created an fs with a different cluster name, "foobar". Attempted to mount:

  mount -t gfs /dev/clvmdtest/lvol0 /mnt/clvmdtest0

lock_gulm: ERROR Core returned error 1003:Bad Cluster ID.
lock_gulm: ERROR cm_login failed. 1003
lock_gulm: ERROR Got a 1003 trying to start the threads.
lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Unable to handle kernel NULL pointer dereference at virtual address 00000427
printing eip:
c0162b21
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 1
EIP: 0060:[<c0162b21>] Not tainted
EFLAGS: 00010246 (2.6.8.1)
EIP is at do_kern_mount+0xb1/0x150
eax: 00000000 ebx: 00000000 ecx: c034b900 edx: 00000000
esi: 000003eb edi: f7f64b00 ebp: f8a65800 esp: f43ddeec
ds: 007b es: 007b ss: 0068
Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
       c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
       00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Call Trace:
 [<c017858f>] do_new_mount+0x6f/0xc0
 [<c0178c40>] do_mount+0x170/0x1b0
 [<c01c5015>] copy_from_user+0x45/0x80
 [<c0178a69>] copy_mount_options+0x59/0xc0
 [<c0179007>] sys_mount+0xa7/0x130
 [<c0105e4d>] sysenter_past_esp+0x52/0x71
Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b

Sep 16 15:23:09 morph-01 lock_gulmd_LT000[4335]: New Client: idx 9 fd 14 from morph-06 ::ffff:192.168.44.66
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4338]: session closed for user root
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session opened for user root by (uid=0)
Sep 16 15:23:09 morph-01 lock_gulmd_core[4333]: ERROR [src/core_io.c:1204] ::1 claims to be part of foobarcluster, but we are morph-cluster
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Core returned error 1003:Bad Cluster ID.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR cm_login failed. 1003
Sep 16 15:23:09 morph-01 kernel: lock_gulm: ERROR Got a 1003 trying to start the threads.
Sep 16 15:23:09 morph-01 kernel: lock_gulm: fsid=foobarcluster:clvmdtest0: Exiting gulm_mount with errors 1003
Sep 16 15:23:09 morph-01 kernel: GFS: can't mount proto = lock_gulm, table = foobarcluster:clvmdtest0, hostdata =
Sep 16 15:23:09 morph-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000427
Sep 16 15:23:09 morph-01 kernel: printing eip:
Sep 16 15:23:09 morph-01 kernel: c0162b21
Sep 16 15:23:09 morph-01 kernel: *pde = 00000000
Sep 16 15:23:09 morph-01 kernel: Oops: 0000 [#1]
Sep 16 15:23:09 morph-01 kernel: SMP
Sep 16 15:23:09 morph-01 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Sep 16 15:23:09 morph-01 kernel: CPU: 1
Sep 16 15:23:09 morph-01 kernel: EIP: 0060:[<c0162b21>] Not tainted
Sep 16 15:23:09 morph-01 kernel: EFLAGS: 00010246 (2.6.8.1)
Sep 16 15:23:09 morph-01 kernel: EIP is at do_kern_mount+0xb1/0x150
Sep 16 15:23:09 morph-01 kernel: eax: 00000000 ebx: 00000000 ecx: c034b900 edx: 00000000
Sep 16 15:23:09 morph-01 kernel: esi: 000003eb edi: f7f64b00 ebp: f8a65800 esp: f43ddeec
Sep 16 15:23:09 morph-01 sshd(pam_unix)[4348]: session closed for user root
Sep 16 15:23:09 morph-01 kernel: ds: 007b es: 007b ss: 0068
Sep 16 15:23:09 morph-01 kernel: Process mount (pid: 4350, threadinfo=f43dc000 task=f6cf46d0)
Sep 16 15:23:09 morph-01 kernel: Stack: 00000000 00000000 f509c000 00000000 f43ba000 00000000 f43ba003 f509c000
Sep 16 15:23:09 morph-01 kernel: c017858f 00000000 00000000 00000000 f43ddf4c c034b900 00000000 f43ddf4c
Sep 16 15:23:09 morph-01 kernel: 00000000 c0178c40 00000000 f509c000 00000000 00000000 f43ba000 f509c000
Sep 16 15:23:09 morph-01 kernel: Call Trace:
Sep 16 15:23:09 morph-01 kernel: [<c017858f>] do_new_mount+0x6f/0xc0
Sep 16 15:23:09 morph-01 kernel: [<c0178c40>] do_mount+0x170/0x1b0
Sep 16 15:23:09 morph-01 kernel: [<c01c5015>] copy_from_user+0x45/0x80
Sep 16 15:23:09 morph-01 kernel: [<c0178a69>] copy_mount_options+0x59/0xc0
Sep 16 15:23:09 morph-01 kernel: [<c0179007>] sys_mount+0xa7/0x130
Sep 16 15:23:09 morph-01 kernel: [<c0105e4d>] sysenter_past_esp+0x52/0x71
Sep 16 15:23:09 morph-01 kernel: Code: 8b 56 3c 85 d2 74 09 8b 02 85 c0 74 4a f0 ff 02 89 57 10 8b

How reproducible: Always
The oops isn't happening in gulm. It seems to be either up in gfs or perhaps up in the vfs, after gulm returns an error on mount. Still, it does seem to be gulm's fault. My guess is that when gulm returns a mount error, it leaves many of the other params untouched, and perhaps gfs is trying to dereference those. Maybe. Will dig more.
Not a gulm bug. Any lock module that returns an error to gfs on mount causes the panic. If you make the first line of the mount function in the nolock module return an error, you get the same oops.
Yes, it is a gulm bug. In Linux, error numbers are negative. lock_gulm is returning a positive error number (1003), which GFS passes up to the VFS. The VFS interprets the positive number as not being an error and continues on. Nolock only causes the same oops if you have it return a positive error number.
The VFS does weird things with positive error results, so before we try to return a gulm error code from mount, flip it to -1.
Fix verified.