Bug 129686

Summary: DLM: Assertion failed in lockqueue.c "rsb->res_nodeid == -1"
Product: [Retired] Red Hat Cluster Suite
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Corey Marthaler <cmarthal>
Assignee: David Teigland <teigland>
QA Contact: GFS Bugs <gfs-bugs>
CC: ccaulfie
Doc Type: Bug Fix
Last Closed: 2004-08-24 16:09:21 UTC

Description Corey Marthaler 2004-08-11 20:16:29 UTC
Description of problem:
I was running I/O and once again shot two of the nodes (morph-04 and
morph-05) in the cluster. The I/O and cluster seemed to continue just
fine for a while until morph-06 tripped this assertion. After that, the
filesystems on the remaining 3 nodes were hung.

Aug 11 13:15:06 morph-06 kernel: gfs2 un 370144 ref 1 flg 4 nodeid 0/0 "      11         8c19ef8
Aug 11 13:15:06 morph-06 kernel: gfs2 lu rep 440016 fr 6 0
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 42006e "       2         8c4e5b6"
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 3d0347 "       2         8c19ef8"
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 49009c "       2         8c2911d"
Aug 11 13:15:06 morph-06 kernel: gfs1 cv 5 3802ae "       2         8c14af9"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 350232 "      11         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 440010 "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 send lu 440010 to 3
Aug 11 13:15:06 morph-06 kernel: gfs2 un 350232 ref 1 flg 4 nodeid 0/0 "      11         8c0e5a8
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 530320 "      11         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 4d01c4 "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 lu rep 440010 fr 3 0
Aug 11 13:15:06 morph-06 kernel:
Aug 11 13:15:06 morph-06 kernel: DLM:  Assertion failed on line 714 of file /usr/src/cluster/dlm-kernel/src/lockqueue.c
Aug 11 13:15:06 morph-06 kernel: DLM:  assertion:  "rsb->res_nodeid == -1"
Aug 11 13:15:06 morph-06 kernel: DLM:  time = 11285975
Aug 11 13:15:06 morph-06 kernel: dlm: lkb
Aug 11 13:15:06 morph-06 kernel: id 4d01c4
Aug 11 13:15:06 morph-06 kernel: remid 0
Aug 11 13:15:06 morph-06 kernel: flags 0
Aug 11 13:15:06 morph-06 kernel: status 0
Aug 11 13:15:06 morph-06 kernel: rqmode 5
Aug 11 13:15:06 morph-06 kernel: grmode -1
Aug 11 13:15:06 morph-06 kernel: nodeid -1
Aug 11 13:15:06 morph-06 kernel: lqstate 1
Aug 11 13:15:06 morph-06 kernel: lqflags 0
Aug 11 13:15:06 morph-06 kernel: dlm: rsb
Aug 11 13:15:06 morph-06 kernel: name "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: nodeid 0
Aug 11 13:15:06 morph-06 kernel: flags 4
Aug 11 13:15:06 morph-06 kernel: ref 2
Aug 11 13:15:07 morph-06 kernel:
Aug 11 13:15:07 morph-06 kernel: ------------[ cut here ]------------
Aug 11 13:15:07 morph-06 kernel: kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:714!
Aug 11 13:15:07 morph-06 kernel: invalid operand: 0000 [#1]
Aug 11 13:15:07 morph-06 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Aug 11 13:15:07 morph-06 kernel: CPU:    0
Aug 11 13:15:07 morph-06 kernel: EIP:    0060:[<e02fd558>]    Not tainted
Aug 11 13:15:07 morph-06 kernel: EFLAGS: 00010286   (2.6.7)
Aug 11 13:15:07 morph-06 kernel: EIP is at send_cluster_request+0x278/0x590 [dlm]
Aug 11 13:15:07 morph-06 kernel: eax: 00000001   ebx: d550a894   ecx: 00000000   edx: d7c53cac
Aug 11 13:15:07 morph-06 kernel: esi: 00000003   edi: cae0e067   ebp: 00000000   esp: d7c53ca8
Aug 11 13:15:07 morph-06 kernel: ds: 007b   es: 007b   ss: 0068
Aug 11 13:15:07 morph-06 kernel: Process accordion (pid: 3839, threadinfo=d7c52000 task=d7cfa8b0)
Aug 11 13:15:07 morph-06 kernel: Stack: e030ac2b 000002ca e030bdbc e030ad0b 00ac35d7 d804f2e8 db647338 cad03740
Aug 11 13:15:07 morph-06 kernel:        cae0e000 d550a894 00000001 db647338 d550a894 e02fc330 cad03740 cad037a4
Aug 11 13:15:07 morph-06 kernel:        00000003 cad037a4 e02fadb3 db647338 e030aa35 00000005 004d01c4 cad037b5
Aug 11 13:15:07 morph-06 kernel: Call Trace:
Aug 11 13:15:07 morph-06 kernel:  [<e02fc330>] remote_stage+0x20/0x50 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02fadb3>] dlm_lock_stage1+0x233/0x2b0 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02faaf1>] dlm_lock+0x291/0x320 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042bcc0>] lock_ast+0x0/0x10 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042b865>] do_dlm_lock+0xf5/0x1d0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042bcc0>] lock_ast+0x0/0x10 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042ce2d>] do_range_lock+0x4d/0x60 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042d084>] update_lock+0x14/0xb0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042d1a8>] add_lock+0x88/0xf0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042dbcc>] plock_internal+0x15c/0x360 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042e108>] lm_dlm_plock+0x138/0x1a0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02ba552>] gfs_lock+0x262/0x340 [gfs]
Aug 11 13:15:07 morph-06 kernel:  [<e02ba2f0>] gfs_lock+0x0/0x340 [gfs]
Aug 11 13:15:07 morph-06 kernel:  [<c016048c>] fcntl_setlk+0x20c/0x270
Aug 11 13:15:07 morph-06 kernel:  [<c014c705>] dentry_open+0xc5/0x1a0
Aug 11 13:15:07 morph-06 kernel:  [<c014c62f>] filp_open+0x4f/0x60
Aug 11 13:15:07 morph-06 kernel:  [<c015c719>] generic_file_fcntl+0xb9/0x150
Aug 11 13:15:07 morph-06 kernel:  [<c015c910>] sys_fcntl64+0x90/0xa0
Aug 11 13:15:07 morph-06 kernel:  [<c0105cad>] sysenter_past_esp+0x52/0x71
Aug 11 13:15:07 morph-06 kernel:
Aug 11 13:15:07 morph-06 kernel: Code: 0f 0b ca 02 bc bd 30 e0 e9 08 ff ff ff e8 56 eb ff ff e8 41
Aug 11 13:15:07 morph-06 kernel:  <4>CMAN: no HELLO from morph-01, removing from the cluster
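
For anyone triaging a similar dump, here is a minimal sketch of how the lkb/rsb fields in the log above could be pulled out and checked against the failed condition. It is not part of any DLM tooling: the names `parse_dump` and `PREFIX` are hypothetical, and it assumes one "key value" pair per line after the syslog prefix, grouped under the preceding "dlm: lkb" or "dlm: rsb" header. The sample lines are copied from this report.

```python
import re

# A few lines copied from the dump in this report (syslog prefix included).
LOG = """\
Aug 11 13:15:06 morph-06 kernel: DLM:  assertion:  "rsb->res_nodeid == -1"
Aug 11 13:15:06 morph-06 kernel: dlm: lkb
Aug 11 13:15:06 morph-06 kernel: id 4d01c4
Aug 11 13:15:06 morph-06 kernel: nodeid -1
Aug 11 13:15:06 morph-06 kernel: dlm: rsb
Aug 11 13:15:06 morph-06 kernel: nodeid 0
Aug 11 13:15:06 morph-06 kernel: flags 4
"""

# Assumed syslog prefix shape: "Mon DD HH:MM:SS host kernel: "
PREFIX = re.compile(r"^\w{3}\s+\d+ [\d:]+ \S+ kernel: ")


def parse_dump(text):
    """Group 'key value' lines under the most recent 'dlm: <struct>' header."""
    structs, current = {}, None
    for line in text.splitlines():
        body = PREFIX.sub("", line).strip()
        if body.startswith("dlm: "):
            # New struct header, e.g. "dlm: lkb" or "dlm: rsb".
            current = structs.setdefault(body[5:], {})
        elif current is not None and " " in body:
            key, value = body.split(None, 1)
            current[key] = value
    return structs


dump = parse_dump(LOG)
rsb_nodeid = int(dump["rsb"]["nodeid"])
# The assertion text in the log is "rsb->res_nodeid == -1".
print("rsb nodeid:", rsb_nodeid, "-> assertion holds:", rsb_nodeid == -1)
```

On the lines from this report, the rsb's nodeid is 0 while the lkb's nodeid is -1, which matches the assertion text that tripped.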


How reproducible:
Didn't try

Comment 1 Corey Marthaler 2004-08-12 15:08:57 UTC
This was reproduced last night on morph-06 while again running I/O,
apparently after morph-01 and morph-03 had panicked due to bz129468.

Comment 2 David Teigland 2004-08-12 16:01:45 UTC
This assert should almost certainly be removed.  Its removal should
really have been part of the recent group of related changes; I'm not
sure how it got through (it could have been overlooked in the lengthy
diffs I was merging, I guess).


Comment 3 David Teigland 2004-08-13 07:27:31 UTC
The assert causing the problem has been removed.

Comment 4 Dean Jansa 2004-08-24 16:09:21 UTC
This has not shown its face while attempting to reproduce... 

Comment 5 Kiersten (Kerri) Anderson 2004-11-16 19:06:16 UTC
Updating version to the right level in the defects.  Sorry for the storm.