Description of problem:
I had a three-node GFS/rgmanager cluster (taft-02, taft-03, taft-04) exporting 10 GFS/NFS/IP services and 5 EXT/NFS/IP services to an NFS client. I started some simple I/O from that client to all 15 filesystems and then ran the test "derringer", which randomly recovers machines and relocates services. As the killed machine (taft-04) came back into the cluster, taft-03 panicked. This issue may be related to bz 175629.

Version-Release number of selected component (if applicable):
Linux taft-03 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:02 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
GFS 2.6.9-45.0 (built Nov 28 2005 11:39:41) installed
CMAN 2.6.9-41.0 (built Nov 28 2005 11:26:37) installed
rgmanager-1.9.43-0

derringer output:
Iteration 6 started at Wed Dec 14 07:42:21 CST 2005
For this iteration we're gonna RECOVER MACHINES
Those machines facing the derringer=taft-04
Feeling lucky taft-04? Well do ya? Go'head make my day...
No heartbeat from remote host
Remote command exited with unknown state
taft-04... DEAD
verify alive
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
All killed nodes are back up, making sure they're qarshable...
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
start cluster
verify recovery
checking Fence recovery...
checking DLM recovery...
checking GFS recovery...
checking the rgmanager status and state of duel_nodes...
Checking status of GFS0 service on taft-04
IN get_rg_service_state, SERVICE=GFS0, NODE=taft-04

[panic on taft-03]
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft8.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft7.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft8.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft7.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft6.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft5.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft4.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft3.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft2.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft1.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft0.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft6.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft5.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft4.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft3.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft1.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft2.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft0.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Looking at journal...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Acquiring the transaction lock...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Replaying journal...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Replayed 0 of 5 blocks
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: replays = 0, skips = 2, sames = 3
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Journal replayed in 1s
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Done
eip: ffffffffa01947aa
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5861, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:0000010211c7be48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 00000000000094c9 RSI: 0000000000000246 RDI: ffffffff803d78e0
RBP: 000001000e079400 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 0000010211c7be88 R15: 000000000036ee80
FS: 0000002a9589fb00(0000) GS:ffffffff804d3080(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000314a82a008 CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 5861, threadinfo 0000010211c7a000, task 00000102115d3030)
Stack: 000001000e079660 ffffffffa01947aa 0000000000000000 00000102115d3030
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 00000102115d3030
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819} <ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381}
       <ffffffff80110ca3>{child_rip+8} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <0000010211c7be48>
<0>Kernel panic - not syncing: Oops
eip: ffffffffa01947aa
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [2] SMP
CPU 1
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5854, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:000001021183be48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000000246
RDX: 0000000000009525 RSI: 0000000000000246 RDI: ffffffff803d78e0
RBP: 0000010037d10800 R08: 000000000000000d R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 000001021183be88 R15: 000000000036ee80
FS: 0000002a9589fb00(0000) GS:ffffffff804d3100(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000314a68ed20 CR3: 00000000dff88000 CR4: 00000000000006e0
Process nfsd (pid: 5854, threadinfo 000001021183a000, task 0000010213bc6030)
Stack: 0000010037d10a60 ffffffffa01947aa 0000000000000000 0000010213bc6030
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 0000010213bc6030
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819} <ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <000001021183be48>
Badness in do_unblank_screen at drivers/char/vt.c:2876

Call Trace:<ffffffff802319d2>{do_unblank_screen+61} <ffffffff80122dc8>{bust_spinlocks+28}
       <ffffffff80111834>{oops_end+18} <ffffffff80111961>{die+54}
       <ffffffff80111d24>{do_invalid_op+145} <ffffffff80303ed3>{_spin_lock_bh+51}
       <ffffffff801371ba>{release_console_sem+369} <ffffffff801373e8>{vprintk+498}
       <ffffffff80137492>{printk+141} <ffffffff80110aed>{error_exit+0}
       <ffffffff80303ed3>{_spin_lock_bh+51} <ffffffff80303ed3>{_spin_lock_bh+51}
       <ffffffffa01947aa>{:sunrpc:svc_recv+819} <ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [3] SMP
CPU 2
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5860, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:0000010211bcde48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 0000000000009d67 RSI: 0000000000000203 RDI: ffffffff803d78e0
RBP: 000001000e07ac00 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 0000010211bcde88 R15: 000000000036ee80
FS: 0000002a9589fb00(0000) GS:ffffffff804d3180(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95771958 CR3: 00000000dffbe000 CR4: 00000000000006e0
Process nfsd (pid: 5860, threadinfo 0000010211bcc000, task 00000102115d37f0)
Stack: 000001000e07ae60 ffffffffa01947aa 0000000000000000 00000102115d37f0
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 00000102115d37f0
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819} <ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <0000010211bcde48>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [4] SMP
CPU 3
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5859, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:00000102119dfe48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 0000000000009525 RSI: 0000000000000203 RDI: ffffffff803d78e0
RBP: 000001000e07a800 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 00000102119dfe88 R15: 000000000036ee80
FS: 0000002a9589fb00(0000) GS:ffffffff804d3200(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbfffefa0 CR3: 0000000037e34000 CR4: 00000000000006e0
Process nfsd (pid: 5859, threadinfo 00000102119de000, task 000001021651e7f0)
Stack: 000001000e07aa60 ffffffffa01947aa 0000000000000000 000001021651e7f0
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 000001021651e7f0
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819} <ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <00000102119dfe48>
[root@taft-02 ~]# clustat
Member Status: Quorate

  Member Name        Status
  ------ ----        ------
  taft-02            Online, Local, rgmanager
  taft-03            Offline
  taft-04            Online, rgmanager

  Service Name       Owner (Last)       State
  ------- ----       ----- ------       -----
  GFS0               taft-02            started
  GFS1               taft-02            started
  GFS2               taft-02            started
  GFS3               taft-02            started
  GFS4               taft-02            started
  GFS5               taft-02            started
  GFS6               taft-02            started
  GFS7               taft-02            started
  GFS8               taft-02            started
  GFS9               taft-02            started
  EXT10              taft-02            started
  EXT11              taft-02            started
  EXT12              taft-02            started
  EXT13              taft-02            started
  EXT14              taft-02            started
  logman             taft-02            started
Hit this again today without doing any recovery testing. I just had 4 GFS filesystems being exported over NFS to 5 clients with rgmanager. Once I/O started, taft-01 panicked.

eip: ffffffffa016b90b
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7657, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:0000010218eb9e48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000020000
RDX: 0000000000008125 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 0000010037c71400 R08: 00000000fffffffe R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 0000010218eb9e88 R15: 000000000036ee80
FS: 0000002a958a0b00(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9946fad8 CR3: 00000001fffb8000 CR4: 00000000000006e0
Process nfsd (pid: 7657, threadinfo 0000010218eb8000, task 0000010217c8c7f0)
Stack: 0000010037c71660 ffffffffa016b90b 0000000000000000 0000010217c8c7f0
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 0000010217c8c7f0
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819} <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}

Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <0000010218eb9e48>
eip: ffffffffa016b90b
<0>Kernel panic - not syncing: Oops
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [2] SMP
CPU 2
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7660, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:00000102197c1e48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000020000
RDX: 0000000000008181 RSI: 0000000000000203 RDI: ffffffff803d9e60
RBP: 00000100dffb8c00 R08: 00000000fffffffe R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 00000102197c1e88 R15: 000000000036ee80
FS: 0000002a958a0b00(0000) GS:ffffffff804d7c00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 00000000dffae000 CR4: 00000000000006e0
Process nfsd (pid: 7660, threadinfo 00000102197c0000, task 00000102155607f0)
Stack: 00000100dffb8e60 ffffffffa016b90b 0000000000000000 00000102155607f0
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 00000102155607f0
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819} <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}

Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <00000102197c1e48>
Badness in do_unblank_screen at drivers/char/vt.c:2876

Call Trace:<ffffffff80232d46>{do_unblank_screen+61} <ffffffff801231cc>{bust_spinlocks+28}
       <ffffffff801119a8>{oops_end+18} <ffffffff80111ad5>{die+54}
       <ffffffff80111e98>{do_invalid_op+145} <ffffffff80305ab4>{_spin_lock_bh+51}
       <ffffffff80137b55>{vprintk+515} <ffffffff80137bee>{printk+141}
       <ffffffff80110c61>{error_exit+0} <ffffffff80305ab4>{_spin_lock_bh+51}
       <ffffffff80305ab4>{_spin_lock_bh+51} <ffffffffa016b90b>{:sunrpc:svc_recv+819}
       <ffffffff801333c8>{default_wake_function+0} <ffffffff801333c8>{default_wake_function+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffffa02de479>{:nfsd:nfsd+381}
       <ffffffff8013212e>{schedule_tail+55} <ffffffff80110e17>{child_rip+8}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffff80110e0f>{child_rip+0}
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [3] SMP
CPU 3
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7654, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:0000010216a17e48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000000246
RDX: 0000000000008181 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 00000101fff27c00 R08: 000000000000000d R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 0000010216a17e88 R15: 000000000036ee80
FS: 0000002a958a0b00(0000) GS:ffffffff804d7c80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95bbcb40 CR3: 0000000037e24000 CR4: 00000000000006e0
Process nfsd (pid: 7654, threadinfo 0000010216a16000, task 00000102188b1030)
Stack: 00000101fff27e60 ffffffffa016b90b 0000000000000000 00000102188b1030
       ffffffff801333c8 0000000000000000 0000000000000000 000001021f8d4c00
       0000000000000000 00000102188b1030
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819} <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}

Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <0000010216a17e48>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [4] SMP
CPU 0
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7658, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:000001021428de48 EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000000246
RDX: 00000000000088b1 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 000001021f829c00 R08: 0000000000000000 R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 000001021428de88 R15: 000000000036ee80
FS: 0000002a958a0b00(0000) GS:ffffffff804d7b00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbffffee8 CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 7658, threadinfo 000001021428c000, task 000001021eed5030)
Stack: 000001021f829e60 ffffffffa016b90b 0000000000000000 000001021eed5030
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 000001021eed5030
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819} <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}

Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <000001021428de48>
Wendy, any ideas? Also, what is an "rgmanager node"?
rgmanager is an RHCS user-mode daemon in charge of service relocation. It runs on every node in the cluster. With tests like this, rgmanager brings nfsd up and down constantly, so we could run into some unknown race conditions. I don't have a clear idea how this could happen at the moment. I can take a closer look when I'm back from the New Year break.
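Every panic above is `Kernel BUG at spinlock:118` in `_spin_lock_bh`, called from `:sunrpc:svc_recv`, and at least one is preceded by an `eip:` line. As far as I can tell, that pattern matches the 2.6-era `CONFIG_DEBUG_SPINLOCK` check, which prints the caller's address and then calls `BUG()` when a lock's magic word is not the expected value, i.e. the lock was never initialized or its memory was freed or overwritten. That would fit the kind of start/stop race mentioned above, but it is a hypothesis, not a diagnosis. The Python sketch below models one such interleaving deterministically; `DebugSpinlock`, `Service`, `worker_step`, `LockMagicError` and both constants are hypothetical stand-ins, not kernel, sunrpc or nfsd code.

```python
# Hypothetical model of a debug-spinlock magic check tripping after teardown.
# None of these names come from the kernel; they only mimic the pattern.

SPINLOCK_MAGIC = 0xDEAD4EAD  # illustrative "lock is valid" marker (assumption)
POISON = 0x6B6B6B6B          # illustrative pattern written over freed memory (assumption)


class LockMagicError(Exception):
    """Stands in for the kernel's BUG() on a bad lock magic."""


class DebugSpinlock:
    def __init__(self):
        self.magic = SPINLOCK_MAGIC
        self.held = False

    def lock(self):
        # Validate the lock before using it, as the debug check does.
        if self.magic != SPINLOCK_MAGIC:
            raise LockMagicError(f"bad lock magic {self.magic:#x}")
        if self.held:
            raise RuntimeError("lock already held (a real spinlock would spin)")
        self.held = True

    def unlock(self):
        self.held = False


class Service:
    """Hypothetical per-service state shared with worker threads."""

    def __init__(self):
        self.lock = DebugSpinlock()

    def teardown(self):
        # Model the object being freed and its memory poisoned.
        self.lock.magic = POISON


def worker_step(svc):
    """One pass of a hypothetical worker loop: lock, handle, unlock."""
    svc.lock.lock()
    try:
        return "handled request"
    finally:
        svc.lock.unlock()


if __name__ == "__main__":
    svc = Service()
    print(worker_step(svc))   # prints "handled request"

    stale = svc               # a worker still holds a reference...
    svc.teardown()            # ...when the controller stops the service
    try:
        worker_step(stale)
    except LockMagicError as err:
        print("caught:", err)  # prints "caught: bad lock magic 0x6b6b6b6b"
```

Because the check runs before the lock is touched, the failure surfaces in whichever thread next takes the lock (here the worker), not in the code that tore the object down. A real fix for such a race would need reference counting or serialization between shutdown and the workers; the sketch only shows how the symptom is detected.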
Thanks for the explanation. Any help with this would definitely be appreciated.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.