Bug 129162 - Oops starting clvmd
Summary: Oops starting clvmd
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-08-04 16:19 UTC by Derek Anderson
Modified: 2010-01-12 02:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-03-02 15:14:03 UTC
Embargoed:



Description Derek Anderson 2004-08-04 16:19:32 UTC
Description of problem:
I was repeatedly starting and stopping my 2-node cluster and got a
kernel Oops on both nodes when clvmd was started. The sequence was:

1. ccsd
2. cman_tool join (attain quorum)
3. fence_tool join
4. clvmd
5. reverse steps 4-1 to shut everything down
6. repeat from step 1

Got the Oops on step 4 after 7 iterations.
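
The cycle above could be scripted roughly as follows. This is only a sketch, not the script that was actually used: the start commands come from the steps listed, but the teardown commands (killall for the daemons, fence_tool leave, cman_tool leave) are assumptions about what "reverse steps" means, and the sketch does not wait for quorum after cman_tool join. The helper names cycle_commands and run_cycles are made up for illustration.

```python
import subprocess

# Start-up order, taken from steps 1-4 of the report.
START_STEPS = [
    ["ccsd"],
    ["cman_tool", "join"],   # the report waits for quorum here; this sketch does not
    ["fence_tool", "join"],
    ["clvmd"],
]

# ASSUMPTION: the report only says "reverse steps". These teardown
# commands are a guess at what that means, in reverse start-up order.
STOP_STEPS = [
    ["killall", "clvmd"],
    ["fence_tool", "leave"],
    ["cman_tool", "leave"],
    ["killall", "ccsd"],
]


def cycle_commands(iterations):
    """Return the flat list of commands for the given number of start/stop cycles."""
    commands = []
    for _ in range(iterations):
        commands.extend(START_STEPS)
        commands.extend(STOP_STEPS)
    return commands


def run_cycles(iterations, dry_run=True):
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for command in cycle_commands(iterations):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling run_cycles(7) prints the command sequence for seven iterations without touching the cluster; pass dry_run=False on a test node to actually execute it.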

FIRST NODE:
===========
Unable to handle kernel paging request at virtual address e0365798
 printing eip:
c01aa741
*pde = 014ed067
Oops: 0002 [#1]
Modules linked in: loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c01aa741>]    Not tainted
EFLAGS: 00010282   (2.6.7)
EIP is at kobject_add+0xb1/0xf0
eax: c0328008   ebx: c0328000   ecx: e0365798   edx: de1eca88
esi: c0328048   edi: de1eca6c   ebp: df76c660   esp: d6b19ea4
ds: 007b   es: 007b   ss: 0068
Process clvmd (pid: 16940, threadinfo=d6b18000 task=d6f0c830)
Stack: de1eca6c de1eca64 df76c64c df76c638 c01ff9ce de1eca6c de1ecaac c01aa5e4
       de1eca64 de1eca58 fffffff4 de1eca64 df76c638 c01fffbf d6b19f08 00000000
       00000005 dd917364 c0320ba8 ffffffff c01e1794 df76c638 00a0003d 00000000
Call Trace:
 [<c01ff9ce>] class_device_add+0x5e/0x120
 [<c01aa5e4>] kobject_init+0x24/0x40
 [<c01fffbf>] class_simple_device_add+0xaf/0xe0
 [<c01e1794>] misc_register+0xc4/0x180
 [<e04bf3ba>] register_lockspace+0x11a/0x190 [dlm]
 [<e04bfad3>] dlm_ctl_ioctl+0x63/0xc0 [dlm]
 [<c014c62f>] filp_open+0x4f/0x60
 [<c015cf7f>] sys_ioctl+0xcf/0x210
 [<c0105cad>] sysenter_past_esp+0x52/0x71

Code: 89 11 89 4a 04 8b 47 28 8b 18 8d 4b 48 89 c8 ba ff ff 00 00
Aug  4 16:58:13 link-10 kernel: dlm: clvmd: recover event 3 done
Aug  4 16:58:13 link-10 kernel: dlm: clvmd: recover event 3 finished
Aug  4 16:58:14 link-10 kernel: Unable to handle kernel paging request at virtual address e0365798
Aug  4 16:58:14 link-10 kernel:  printing eip:
Aug  4 16:58:14 link-10 kernel: c01aa741
Aug  4 16:58:14 link-10 kernel: *pde = 014ed067
Aug  4 16:58:14 link-10 kernel: Oops: 0002 [#1]
Aug  4 16:58:14 link-10 kernel: Modules linked in: loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Aug  4 16:58:14 link-10 kernel: CPU:    0
Aug  4 16:58:14 link-10 kernel: EIP:    0060:[<c01aa741>]    Not tainted
Aug  4 16:58:14 link-10 kernel: EFLAGS: 00010282   (2.6.7)
Aug  4 16:58:14 link-10 kernel: EIP is at kobject_add+0xb1/0xf0
Aug  4 16:58:14 link-10 kernel: eax: c0328008   ebx: c0328000   ecx: e0365798   edx: de1eca88
Aug  4 16:58:14 link-10 kernel: esi: c0328048   edi: de1eca6c   ebp: df76c660   esp: d6b19ea4
Aug  4 16:58:14 link-10 kernel: ds: 007b   es: 007b   ss: 0068
Aug  4 16:58:14 link-10 kernel: Process clvmd (pid: 16940, threadinfo=d6b18000 task=d6f0c830)
Aug  4 16:58:14 link-10 kernel: Stack: de1eca6c de1eca64 df76c64c df76c638 c01ff9ce de1eca6c de1ecaac c01aa5e4
Aug  4 16:58:14 link-10 kernel:        de1eca64 de1eca58 fffffff4 de1eca64 df76c638 c01fffbf d6b19f08 00000000
Aug  4 16:58:14 link-10 kernel:        00000005 dd917364 c0320ba8 ffffffff c01e1794 df76c638 00a0003d 00000000
Aug  4 16:58:14 link-10 kernel: Call Trace:
Aug  4 16:58:14 link-10 kernel:  [<c01ff9ce>] class_device_add+0x5e/0x120
Aug  4 16:58:14 link-10 kernel:  [<c01aa5e4>] kobject_init+0x24/0x40
Aug  4 16:58:14 link-10 kernel:  [<c01fffbf>] class_simple_device_add+0xaf/0xe0
Aug  4 16:58:14 link-10 kernel:  [<c01e1794>] misc_register+0xc4/0x180
Aug  4 16:58:14 link-10 kernel:  [<e04bf3ba>] register_lockspace+0x11a/0x190 [dlm]
Aug  4 16:58:14 link-10 kernel:  [<e04bfad3>] dlm_ctl_ioctl+0x63/0xc0 [dlm]
Aug  4 16:58:14 link-10 kernel:  [<c014c62f>] filp_open+0x4f/0x60
Aug  4 16:58:14 link-10 kernel:  [<c015cf7f>] sys_ioctl+0xcf/0x210
Aug  4 16:58:14 link-10 kernel:  [<c0105cad>] sysenter_past_esp+0x52/0x71
Aug  4 16:58:14 link-10 kernel:
Aug  4 16:58:14 link-10 kernel: Code: 89 11 89 4a 04 8b 47 28 8b 18 8d 4b 48 89 c8 ba ff ff 00 00

SECOND NODE:
============
Unable to handle kernel paging request at virtual address e0365798
 printing eip:
c01aa741
*pde = 014ed067
Oops: 0002 [#1]
Modules linked in: loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c01aa741>]    Not tainted
EFLAGS: 00010282   (2.6.7)
EIP is at kobject_add+0xb1/0xf0
eax: c0328008   ebx: c0328000   ecx: e0365798   edx: dccf8208
esi: c0328048   edi: dccf81ec   ebp: df76c660   esp: d7229ea4
ds: 007b   es: 007b   ss: 0068
Process clvmd (pid: 15649, threadinfo=d7228000 task=da5f8b30)
Stack: dccf81ec dccf81e4 df76c64c df76c638 c01ff9ce dccf81ec dccf822c c01aa5e4
       dccf81e4 dccf81d8 fffffff4 dccf81e4 df76c638 c01fffbf d7229f08 00000000
       00000005 dccf8764 c0320ba8 ffffffff c01e1794 df76c638 00a0003d 00000000
Call Trace:
 [<c01ff9ce>] class_device_add+0x5e/0x120
 [<c01aa5e4>] kobject_init+0x24/0x40
 [<c01fffbf>] class_simple_device_add+0xaf/0xe0
 [<c01e1794>] misc_register+0xc4/0x180
 [<e04bf3ba>] register_lockspace+0x11a/0x190 [dlm]
 [<e04bfad3>] dlm_ctl_ioctl+0x63/0xc0 [dlm]
 [<e01f0064>] e1000_xmit_frame+0x4f4/0x7a0 [e1000]
 [<c0273e7c>] net_rx_action+0x6c/0xf0
 [<c011e809>] __do_softirq+0x79/0x80
 [<c015cf7f>] sys_ioctl+0xcf/0x210
 [<c0105cad>] sysenter_past_esp+0x52/0x71

Code: 89 11 89 4a 04 8b 47 28 8b 18 8d 4b 48 89 c8 ba ff ff 00 00
Aug  4 16:54:08 link-11 kernel: dlm: clvmd: total nodes 1
Aug  4 16:54:08 link-11 kernel: dlm: clvmd: rebuild resource directory
Aug  4 16:54:08 link-11 kernel: dlm: clvmd: rebuilt 0 resources
Aug  4 16:54:08 link-11 kernel: dlm: clvmd: recover event 2 done
Aug  4 16:54:08 link-11 kernel: dlm: clvmd: recover event 2 finished
Aug  4 16:54:08 link-11 kernel: Unable to handle kernel paging request at virtual address e0365798
Aug  4 16:54:08 link-11 kernel:  printing eip:
Aug  4 16:54:08 link-11 kernel: c01aa741
Aug  4 16:54:08 link-11 kernel: *pde = 014ed067
Aug  4 16:54:08 link-11 kernel: Oops: 0002 [#1]
Aug  4 16:54:08 link-11 kernel: Modules linked in: loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Aug  4 16:54:08 link-11 kernel: CPU:    0
Aug  4 16:54:08 link-11 kernel: EIP:    0060:[<c01aa741>]    Not tainted
Aug  4 16:54:08 link-11 kernel: EFLAGS: 00010282   (2.6.7)
Aug  4 16:54:08 link-11 kernel: EIP is at kobject_add+0xb1/0xf0
Aug  4 16:54:08 link-11 kernel: eax: c0328008   ebx: c0328000   ecx: e0365798   edx: dccf8208
Aug  4 16:54:08 link-11 kernel: esi: c0328048   edi: dccf81ec   ebp: df76c660   esp: d7229ea4
Aug  4 16:54:08 link-11 kernel: ds: 007b   es: 007b   ss: 0068
Aug  4 16:54:08 link-11 kernel: Process clvmd (pid: 15649, threadinfo=d7228000 task=da5f8b30)
Aug  4 16:54:08 link-11 kernel: Stack: dccf81ec dccf81e4 df76c64c df76c638 c01ff9ce dccf81ec dccf822c c01aa5e4
Aug  4 16:54:08 link-11 kernel:        dccf81e4 dccf81d8 fffffff4 dccf81e4 df76c638 c01fffbf d7229f08 00000000
Aug  4 16:54:08 link-11 kernel:        00000005 dccf8764 c0320ba8 ffffffff c01e1794 df76c638 00a0003d 00000000
Aug  4 16:54:08 link-11 kernel: Call Trace:
Aug  4 16:54:08 link-11 kernel:  [<c01ff9ce>] class_device_add+0x5e/0x120
Aug  4 16:54:08 link-11 kernel:  [<c01aa5e4>] kobject_init+0x24/0x40
Aug  4 16:54:08 link-11 kernel:  [<c01fffbf>] class_simple_device_add+0xaf/0xe0
Aug  4 16:54:08 link-11 kernel:  [<c01e1794>] misc_register+0xc4/0x180
Aug  4 16:54:08 link-11 kernel:  [<e04bf3ba>] register_lockspace+0x11a/0x190 [dlm]
Aug  4 16:54:08 link-11 kernel:  [<e04bfad3>] dlm_ctl_ioctl+0x63/0xc0 [dlm]
Aug  4 16:54:08 link-11 kernel:  [<e01f0064>] e1000_xmit_frame+0x4f4/0x7a0 [e1000]
Aug  4 16:54:08 link-11 kernel:  [<c0273e7c>] net_rx_action+0x6c/0xf0
Aug  4 16:54:08 link-11 kernel:  [<c011e809>] __do_softirq+0x79/0x80
Aug  4 16:54:08 link-11 kernel:  [<c015cf7f>] sys_ioctl+0xcf/0x210
Aug  4 16:54:08 link-11 kernel:  [<c0105cad>] sysenter_past_esp+0x52/0x71
Aug  4 16:54:08 link-11 kernel:
Aug  4 16:54:08 link-11 kernel: Code: 89 11 89 4a 04 8b 47 28 8b 18 8d 4b 48 89 c8 ba ff ff 00 00
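
For anyone reading the dumps above: the "Oops: 0002" value is the i386 page-fault error code, and its low three bits can be decoded as in the sketch below. The function name decode_oops_error_code is made up for illustration. The result (a kernel-mode write to a not-present page) appears consistent with the first code bytes, 89 11, which encode a store of %edx to the address in %ecx, and ecx holds the faulting address e0365798 on both nodes. That is an interpretation of the dump, not a confirmed root cause.

```python
def decode_oops_error_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault taken in kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# The value printed after "Oops:" in both dumps is hexadecimal.
info = decode_oops_error_code(int("0002", 16))
print(info)
```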

Version-Release number of selected component (if applicable):
[root@link-11 root]# clvmd -V

Cluster LVM Daemon version 0.2.1

How reproducible:
Haven't tried yet.

Steps to Reproduce:
1. Listed above.
Actual results:


Expected results:


Additional info:

Comment 1 Christine Caulfield 2004-08-12 13:07:12 UTC
Those symptoms are very strange. Do you have a script I can use to try
and reproduce this?

Comment 2 Christine Caulfield 2004-09-20 15:11:38 UTC
OK, I'm fairly convinced that this is a symptom of some other bug.

I've seen a pool corruption reported by a debug kernel when doing a
cman_tool leave so this would be a likely culprit.

Comment 3 Kiersten (Kerri) Anderson 2004-11-16 19:13:02 UTC
Updating version to the right level in the defects.  Sorry for the storm.

Comment 4 Christine Caulfield 2005-01-11 15:54:38 UTC
I've not seen this and I suspect it has been fixed by other checkins.
Punt it back if you see it again or have something I can reproduce it
with.

Comment 5 Derek Anderson 2005-03-02 15:14:03 UTC
Well, it doesn't Oops anymore.  It does _hang_ in the state below, but
that's another bug, I guess.

DLM Lock Space:  "clvmd"                             4   4 update    U-4,1,12
[8 11 10 12]

