175768 – nfsd kernel panic on quorate node after killed rgmanager node finishes recovery

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	176344

Reported:	2005-12-14 20:13 UTC by Corey Marthaler
Modified:	2010-03-16 19:41 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-03-16 19:41:56 UTC
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2005-12-14 20:13:23 UTC

Description of problem:
I had a three-node GFS/rgmanager cluster (taft-02, taft-03, taft-04) exporting
10 GFS/NFS/IP services and 5 EXT/NFS/IP services to an NFS client. I started
some simple I/O from that client to all 15 filesystems and then ran a test,
"derringer", which randomly recovers machines and relocates services. As the
killed machine (taft-04) came back into the cluster, taft-03 panicked.

This issue may be related to bz 175629.

Version-Release number of selected component (if applicable):
Linux taft-03 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:02 EDT 2005 x86_64
x86_64 x86_64 GNU/Linux
GFS 2.6.9-45.0 (built Nov 28 2005 11:39:41) installed
CMAN 2.6.9-41.0 (built Nov 28 2005 11:26:37) installed
rgmanager-1.9.43-0

derringer output:
Iteration 6 started at Wed Dec 14 07:42:21 CST 2005
For this iteration we're gonna RECOVER MACHINES
Those machines facing the derringer=taft-04
Feeling lucky taft-04? Well do ya? Go'head make my day...
No heartbeat from remote hostRemote command exited with unknown state
        taft-04... DEAD
verify alive
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
Still not all alive, sleeping another 10 seconds
All killed nodes are back up, making sure they're qarshable...
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
Still not all qarshable, sleeping another 10 seconds
start cluster
verify recovery
checking Fence recovery...
checking DLM recovery...
checking GFS recovery...
checking the rgmanager status and state of duel_nodes...
Checking status of GFS0 service on taft-04
IN get_rg_service_state, SERVICE=GFS0, NODE=taft-04

[panic on taft-03]

CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft8.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft7.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft8.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft7.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft6.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft5.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft4.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft3.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft2.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft1.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft0.2: jid=1: Trying to acquire journal lock...
GFS: fsid=TAFT234_CLUSTER:taft6.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft5.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft4.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft3.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft1.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft2.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft0.2: jid=1: Busy
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Looking at journal...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Acquiring the transaction lock...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Replaying journal...
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Replayed 0 of 5 blocks
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: replays = 0, skips = 2, sames = 3
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Journal replayed in 1s
GFS: fsid=TAFT234_CLUSTER:taft9.2: jid=1: Done
eip: ffffffffa01947<a4a
eipe:i p:f ffffffffffffffa0f1a904179a47aa
a
---------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4
i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6
sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random
e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zerodm_mirror ext3 jbd dm_mod
lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5861, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:0000010211c7be48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 00000000000094c9 RSI: 0000000000000246 RDI: ffffffff803d78e0
RBP: 000001000e079400 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 0000010211c7be88 R15: 000000000036ee80
FS:  0000002a9589fb00(0000) GS:ffffffff804d3080(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000314a82a008 CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 5861, threadinfo 0000010211c7a000, task 00000102115d3030)
Stack: 000001000e079660 ffffffffa01947aa 0000000000000000 00000102115d3030
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 00000102115d3030
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819}
<ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <0000010211c7be48>
 --<0-->-Ke-r-n--el- -p a[ncuict  -h ernoet  ]s y--nc-i--ng-:- --Oo [pspl
eaeseip b:i fteff hffefrfe f]a 0-1-94--7a--a-
-                                            -
Kernel BUG at spinlock:118
invalid operand: 0000 [2] SMP
CPU 1
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4
i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6
sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random
e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zerodm_mirror ext3 jbd dm_mod
lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5854, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:000001021183be48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000000246
RDX: 0000000000009525 RSI: 0000000000000246 RDI: ffffffff803d78e0
RBP: 0000010037d10800 R08: 000000000000000d R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 000001021183be88 R15: 000000000036ee80
FS:  0000002a9589fb00(0000) GS:ffffffff804d3100(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000314a68ed20 CR3: 00000000dff88000 CR4: 00000000000006e0
Process nfsd (pid: 5854, threadinfo 000001021183a000, task 0000010213bc6030)
Stack: 0000010037d10a60 ffffffffa01947aa 0000000000000000 0000010213bc6030
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 0000010213bc6030
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819}
<ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <000001021183be48>
Badness in do_unblank_screen at drivers/char/vt.c:2876

Call Trace:<ffffffff802319d2>{do_unblank_screen+61}
<ffffffff80122dc8>{bust_spinlocks+28}
       <ffffffff80111834>{oops_end+18} <ffffffff80111961>{die+54}
       <ffffffff80111d24>{do_invalid_op+145} <ffffffff80303ed3>{_spin_lock_bh+51}
       <ffffffff801371ba>{release_console_sem+369} <ffffffff801373e8>{vprintk+498}
       <ffffffff80137492>{printk+141} <ffffffff80110aed>{error_exit+0}
       <ffffffff80303ed3>{_spin_lock_bh+51} <ffffffff80303ed3>{_spin_lock_bh+51}
       <ffffffffa01947aa>{:sunrpc:svc_recv+819}
<ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [3] SMP
CPU 2
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4
i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6
sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random
e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zerodm_mirror ext3 jbd dm_mod
lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5860, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:0000010211bcde48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 0000000000009d67 RSI: 0000000000000203 RDI: ffffffff803d78e0
RBP: 000001000e07ac00 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 0000010211bcde88 R15: 000000000036ee80
FS:  0000002a9589fb00(0000) GS:ffffffff804d3180(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95771958 CR3: 00000000dffbe000 CR4: 00000000000006e0
Process nfsd (pid: 5860, threadinfo 0000010211bcc000, task 00000102115d37f0)
Stack: 000001000e07ae60 ffffffffa01947aa 0000000000000000 00000102115d37f0
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 00000102115d37f0
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819}
<ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <0000010211bcde48>
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [4] SMP
CPU 3
Modules linked in: radeon nfsd exportfs lockd parport_pc lp parport autofs4
i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6
sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd ehci_hcd hw_random
e1000 floppy qla2300 qla2xxx sg dm_snapshot dm_zerodm_mirror ext3 jbd dm_mod
lpfc scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5859, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:00000102119dfe48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RDX: 0000000000009525 RSI: 0000000000000203 RDI: ffffffff803d78e0
RBP: 000001000e07a800 R08: 00000000fffffffe R09: 000001021dd8e9b0
R10: 0000000000000000 R11: 0000000000000000 R12: 000001021dd8e9b0
R13: 000001021dd8e980 R14: 00000102119dfe88 R15: 000000000036ee80
FS:  0000002a9589fb00(0000) GS:ffffffff804d3200(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbfffefa0 CR3: 0000000037e34000 CR4: 00000000000006e0
Process nfsd (pid: 5859, threadinfo 00000102119de000, task 000001021651e7f0)
Stack: 000001000e07aa60 ffffffffa01947aa 0000000000000000 000001021651e7f0
       ffffffff80132e59 0000000000000000 0000000000000000 0000000000000246
       0000000000000000 000001021651e7f0
Call Trace:<ffffffffa01947aa>{:sunrpc:svc_recv+819}
<ffffffff80132e59>{default_wake_function+0}
       <ffffffff80132e59>{default_wake_function+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffffa02ee479>{:nfsd:nfsd+381} <ffffffff80110ca3>{child_rip+8}
       <ffffffffa02ee2fc>{:nfsd:nfsd+0} <ffffffffa02ee2fc>{:nfsd:nfsd+0}
       <ffffffff80110c9b>{child_rip+0}

Code: 0f 0b db d5 31 80 ff ff ff ff 76 00 f0 fe 0b 0f 88 18 02 00
RIP <ffffffff80303ed3>{_spin_lock_bh+51} RSP <00000102119dfe48>
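
The four oops blocks above are long but nearly identical, so it can help to summarize them mechanically. The following is a minimal sketch, not part of any Red Hat tooling: the function names `parse_oopses` and `shared_registers` are hypothetical, and the regular expressions assume the 2.6.9-style x86_64 oops layout shown in this report. `SAMPLE` is an abbreviated copy of the first two blocks from the log above.

```python
import re

# Hypothetical helper for reading this report; assumes the oops layout above.
OOPS_START = re.compile(r"invalid operand: \d+ \[(\d+)\] SMP")
CPU_LINE = re.compile(r"CPU (\d+)\s*$")
PID_LINE = re.compile(r"Pid: (\d+), comm: (\S+)")
RIP_LINE = re.compile(r"RIP: \d+:\[<[0-9a-f]+>\] <[0-9a-f]+>\{([^}]+)\}")
# General-purpose registers printed as "NAME: <16 hex digits>".
REG = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})")

# Abbreviated copy of the first two oops blocks from the console log.
SAMPLE = """\
invalid operand: 0000 [1] SMP
CPU 0
Pid: 5861, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:0000010211c7be48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000020000
RBP: 000001000e079400 R08: 00000000fffffffe R09: 000001021dd8e9b0
invalid operand: 0000 [2] SMP
CPU 1
Pid: 5854, comm: nfsd Not tainted 2.6.9-22.0.1.ELsmp
RIP: 0010:[<ffffffff80303ed3>] <ffffffff80303ed3>{_spin_lock_bh+51}
RSP: 0018:000001021183be48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 000001021dd8e9b0 RCX: 0000000000000246
RBP: 0000010037d10800 R08: 000000000000000d R09: 000001021dd8e9b0
"""


def parse_oopses(text):
    """Split a console log into one dict per oops block."""
    oopses, current = [], None
    for line in text.splitlines():
        start = OOPS_START.search(line)
        if start:
            current = {"seq": int(start.group(1)), "regs": {}}
            oopses.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first oops
        cpu = CPU_LINE.match(line)
        if cpu:
            current["cpu"] = int(cpu.group(1))
        pid = PID_LINE.search(line)
        if pid:
            current["pid"] = int(pid.group(1))
            current["comm"] = pid.group(2)
        rip = RIP_LINE.search(line)
        if rip:
            current["rip"] = rip.group(1)
        current["regs"].update(REG.findall(line))
    return oopses


def shared_registers(oopses):
    """Return the registers whose value is identical in every oops."""
    if not oopses:
        return {}
    first = oopses[0]["regs"]
    return {name: value for name, value in first.items()
            if all(o["regs"].get(name) == value for o in oopses)}


if __name__ == "__main__":
    blocks = parse_oopses(SAMPLE)
    for b in blocks:
        print(b["seq"], b["cpu"], b["pid"], b["comm"], b["rip"])
    print(shared_registers(blocks))
```

Run against the full log, a summary like this makes it easy to see that every nfsd thread stopped at the same instruction, and which register values (for example RBX) were the same on every CPU. That would be consistent with the threads all touching the same object, but the script itself does not establish the cause.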

Comment 1 Corey Marthaler 2005-12-14 20:18:39 UTC

[root@taft-02 ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  taft-02                                  Online, Local, rgmanager
  taft-03                                  Offline
  taft-04                                  Online, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  GFS0                 taft-02                        started
  GFS1                 taft-02                        started
  GFS2                 taft-02                        started
  GFS3                 taft-02                        started
  GFS4                 taft-02                        started
  GFS5                 taft-02                        started
  GFS6                 taft-02                        started
  GFS7                 taft-02                        started
  GFS8                 taft-02                        started
  GFS9                 taft-02                        started
  EXT10                taft-02                        started
  EXT11                taft-02                        started
  EXT12                taft-02                        started
  EXT13                taft-02                        started
  EXT14                taft-02                        started
  logman               taft-02                        started
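
For anyone scripting checks around output like the above, here is a minimal sketch of a parser. It assumes only the two-table clustat layout shown in this comment; `parse_clustat` is a hypothetical name, not an rgmanager interface, and `SAMPLE` is an abbreviated copy of the output above.

```python
# Hypothetical helper; assumes the clustat text layout shown in this comment.
SAMPLE = """\
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  taft-02                                  Online, Local, rgmanager
  taft-03                                  Offline
  taft-04                                  Online, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  GFS0                 taft-02                        started
  EXT10                taft-02                        started
  logman               taft-02                        started
"""


def parse_clustat(text):
    """Return (members, services) parsed from clustat text output.

    members maps node name -> list of status flags;
    services maps service name -> (owner, state).
    """
    members, services = {}, {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:2] == ["Member", "Name"]:
            section = "members"
        elif fields[:2] == ["Service", "Name"]:
            section = "services"
        elif set(line.strip()) <= {"-", " "}:
            continue  # dashed underline row beneath a table header
        elif section == "members":
            flags = " ".join(fields[1:]).split(",")
            members[fields[0]] = [f.strip() for f in flags]
        elif section == "services" and len(fields) >= 3:
            services[fields[0]] = (fields[1], fields[2])
    return members, services


if __name__ == "__main__":
    members, services = parse_clustat(SAMPLE)
    offline = [n for n, flags in members.items() if "Offline" in flags]
    print("offline:", offline)
    print("owners:", sorted({owner for owner, _ in services.values()}))
```

On the output above, this reports taft-03 as the only offline member and taft-02 as the owner of every service.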

Comment 2 Corey Marthaler 2006-05-05 19:00:06 UTC

Hit this again today without doing any recovery testing.
I just had 4 GFS filesystems being NFS-served to 5 clients with rgmanager. Once
I/O started, taft-01 panicked.

eip: fffff<f4f>efaip0:16 bff90ffb
fefaip0:16 bff90ffb              ff
fff--a-01--6b--9-0b--
 [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U)
lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7657, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:0000010218eb9e48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000020000
RDX: 0000000000008125 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 0000010037c71400 R08: 00000000fffffffe R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 0000010218eb9e88 R15: 000000000036ee80
FS:  0000002a958a0b00(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9946fad8 CR3: 00000001fffb8000 CR4: 00000000000006e0
Process nfsd (pid: 7657, threadinfo 0000010218eb8000, task 0000010217c8c7f0)
Stack: 0000010037c71660 ffffffffa016b90b 0000000000000000 0000010217c8c7f0
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 0000010217c8c7f0
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{ei:np:fs
df:ffnffsffd+ff0}a01<46b> 90
                                                                               
               b
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <0000010218eb9e48>
 --<0-->-Ke-rn--el-- p- an[cicu t- h nerote  s] yn--ci--ng--: --Oo- p[sp
ea se bite here ] ---------                                             l
Kernel BUG at spinlock:118
invalid operand: 0000 [2] SMP
CPU 2
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U)
lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7660, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:00000102197c1e48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000020000
RDX: 0000000000008181 RSI: 0000000000000203 RDI: ffffffff803d9e60
RBP: 00000100dffb8c00 R08: 00000000fffffffe R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 00000102197c1e88 R15: 000000000036ee80
FS:  0000002a958a0b00(0000) GS:ffffffff804d7c00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 00000000dffae000 CR4: 00000000000006e0
Process nfsd (pid: 7660, threadinfo 00000102197c0000, task 00000102155607f0)
Stack: 00000100dffb8e60 ffffffffa016b90b 0000000000000000 00000102155607f0
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 00000102155607f0
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <00000102197c1e48>
Badness in do_unblank_screen at drivers/char/vt.c:2876

Call Trace:<ffffffff80232d46>{do_unblank_screen+61}
<ffffffff801231cc>{bust_spinlocks+28}
       <ffffffff801119a8>{oops_end+18} <ffffffff80111ad5>{die+54}
       <ffffffff80111e98>{do_invalid_op+145} <ffffffff80305ab4>{_spin_lock_bh+51}
       <ffffffff80137b55>{vprintk+515} <ffffffff80137bee>{printk+141}
       <ffffffff80110c61>{error_exit+0} <ffffffff80305ab4>{_spin_lock_bh+51}
       <ffffffff80305ab4>{_spin_lock_bh+51} <ffffffffa016b90b>{:sunrpc:svc_recv+819}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffffa02de479>{:nfsd:nfsd+381}
       <ffffffff8013212e>{schedule_tail+55} <ffffffff80110e17>{child_rip+8}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffff80110e0f>{child_rip+0}
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [3] SMP
CPU 3
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U)
lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7654, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:0000010216a17e48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000000246
RDX: 0000000000008181 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 00000101fff27c00 R08: 000000000000000d R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 0000010216a17e88 R15: 000000000036ee80
FS:  0000002a958a0b00(0000) GS:ffffffff804d7c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95bbcb40 CR3: 0000000037e24000 CR4: 00000000000006e0
Process nfsd (pid: 7654, threadinfo 0000010216a16000, task 00000102188b1030)
Stack: 00000101fff27e60 ffffffffa016b90b 0000000000000000 00000102188b1030
       ffffffff801333c8 0000000000000000 0000000000000000 000001021f8d4c00
       0000000000000000 00000102188b1030
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <0000010216a17e48>
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at spinlock:118
invalid operand: 0000 [4] SMP
CPU 0
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gnbd(U)
lock_nolock(U) gfs(U) lock_harness(U)d
Pid: 7658, comm: nfsd Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff80305ab4>] <ffffffff80305ab4>{_spin_lock_bh+51}
RSP: 0018:000001021428de48  EFLAGS: 00010212
RAX: 0000000000000016 RBX: 0000010217eec2b0 RCX: 0000000000000246
RDX: 00000000000088b1 RSI: 0000000000000246 RDI: ffffffff803d9e60
RBP: 000001021f829c00 R08: 0000000000000000 R09: 0000010217eec2b0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000010217eec2b0
R13: 0000010217eec280 R14: 000001021428de88 R15: 000000000036ee80
FS:  0000002a958a0b00(0000) GS:ffffffff804d7b00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbffffee8 CR3: 0000000000101000 CR4: 00000000000006e0
Process nfsd (pid: 7658, threadinfo 000001021428c000, task 000001021eed5030)
Stack: 000001021f829e60 ffffffffa016b90b 0000000000000000 000001021eed5030
       ffffffff801333c8 0000000000000000 0000000000000000 0000000000000001
       0000000000000000 000001021eed5030
Call Trace:<ffffffffa016b90b>{:sunrpc:svc_recv+819}
<ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de479>{:nfsd:nfsd+381} <ffffffff8013212e>{schedule_tail+55}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02de2fc>{:nfsd:nfsd+0}
       <ffffffffa02de2fc>{:nfsd:nfsd+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b f6 f2 31 80 ff ff ff ff 76 00 f0 ff 0b 0f 88 1c 02 00
RIP <ffffffff80305ab4>{_spin_lock_bh+51} RSP <000001021428de48>

Comment 3 Steve Dickson 2006-12-22 19:03:42 UTC

Wendy,

Any ideas?

Also, what is an "rgmanager node"?

Comment 4 Wendy Cheng 2006-12-22 19:20:15 UTC

rgmanager is an RHCS user-mode daemon that is in charge of service
relocation. It runs on every node in the cluster. With a test like
this, nfsd is constantly brought up and down by rgmanager, so we
could be running into some unknown race condition.

I don't have a clear idea of how this could happen at the moment. I can
take a closer look when I'm back from the New Year break.

Comment 5 Steve Dickson 2006-12-23 01:21:48 UTC

Thanks for the explanation. Any help with this would definitely
be appreciated.

Comment 9 RHEL Program Management 2007-09-07 19:46:27 UTC

This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
