Bug 129686 - DLM: Assertion failed in lockqueue.c "rsb->res_nodeid == -1"
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: i686 Linux
Priority: medium Severity: medium
Assigned To: David Teigland
QA Contact: GFS Bugs
Reported: 2004-08-11 16:16 EDT by Corey Marthaler
Modified: 2010-01-11 21:56 EST
Doc Type: Bug Fix
Last Closed: 2004-08-24 12:09:21 EDT
Description Corey Marthaler 2004-08-11 16:16:29 EDT
Description of problem:
I was running I/O and once again shot two of the nodes (morph-04 and
morph-05) in the cluster. The I/O and cluster seemed to continue just
fine for a while until morph-06 tripped this assertion. After that, the
filesystems on the remaining 3 nodes were hung.

Aug 11 13:15:06 morph-06 kernel: gfs2 un 370144 ref 1 flg 4 nodeid 0/0 "      11         8c19ef8
Aug 11 13:15:06 morph-06 kernel: gfs2 lu rep 440016 fr 6 0
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 42006e "       2         8c4e5b6"
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 3d0347 "       2         8c19ef8"
Aug 11 13:15:06 morph-06 kernel: gfs2 cv 5 49009c "       2         8c2911d"
Aug 11 13:15:06 morph-06 kernel: gfs1 cv 5 3802ae "       2         8c14af9"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 350232 "      11         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 440010 "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 send lu 440010 to 3
Aug 11 13:15:06 morph-06 kernel: gfs2 un 350232 ref 1 flg 4 nodeid 0/0 "      11         8c0e5a8
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 530320 "      11         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 rq 5 4d01c4 "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: gfs2 lu rep 440010 fr 3 0
Aug 11 13:15:06 morph-06 kernel:
Aug 11 13:15:06 morph-06 kernel: DLM:  Assertion failed on line 714 of file /usr/src/cluster/dlm-kernel/src/lockqueue.c
Aug 11 13:15:06 morph-06 kernel: DLM:  assertion:  "rsb->res_nodeid == -1"
Aug 11 13:15:06 morph-06 kernel: DLM:  time = 11285975
Aug 11 13:15:06 morph-06 kernel: dlm: lkb
Aug 11 13:15:06 morph-06 kernel: id 4d01c4
Aug 11 13:15:06 morph-06 kernel: remid 0
Aug 11 13:15:06 morph-06 kernel: flags 0
Aug 11 13:15:06 morph-06 kernel: status 0
Aug 11 13:15:06 morph-06 kernel: rqmode 5
Aug 11 13:15:06 morph-06 kernel: grmode -1
Aug 11 13:15:06 morph-06 kernel: nodeid -1
Aug 11 13:15:06 morph-06 kernel: lqstate 1
Aug 11 13:15:06 morph-06 kernel: lqflags 0
Aug 11 13:15:06 morph-06 kernel: dlm: rsb
Aug 11 13:15:06 morph-06 kernel: name "       7         8c0e5a8"
Aug 11 13:15:06 morph-06 kernel: nodeid 0
Aug 11 13:15:06 morph-06 kernel: flags 4
Aug 11 13:15:06 morph-06 kernel: ref 2
Aug 11 13:15:07 morph-06 kernel:
Aug 11 13:15:07 morph-06 kernel: ------------[ cut here ]------------
Aug 11 13:15:07 morph-06 kernel: kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:714!
Aug 11 13:15:07 morph-06 kernel: invalid operand: 0000 [#1]
Aug 11 13:15:07 morph-06 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Aug 11 13:15:07 morph-06 kernel: CPU:    0
Aug 11 13:15:07 morph-06 kernel: EIP:    0060:[<e02fd558>]    Not tainted
Aug 11 13:15:07 morph-06 kernel: EFLAGS: 00010286   (2.6.7)
Aug 11 13:15:07 morph-06 kernel: EIP is at send_cluster_request+0x278/0x590 [dlm]
Aug 11 13:15:07 morph-06 kernel: eax: 00000001   ebx: d550a894   ecx: 00000000   edx: d7c53cac
Aug 11 13:15:07 morph-06 kernel: esi: 00000003   edi: cae0e067   ebp: 00000000   esp: d7c53ca8
Aug 11 13:15:07 morph-06 kernel: ds: 007b   es: 007b   ss: 0068
Aug 11 13:15:07 morph-06 kernel: Process accordion (pid: 3839, threadinfo=d7c52000 task=d7cfa8b0)
Aug 11 13:15:07 morph-06 kernel: Stack: e030ac2b 000002ca e030bdbc e030ad0b 00ac35d7 d804f2e8 db647338 cad03740
Aug 11 13:15:07 morph-06 kernel:        cae0e000 d550a894 00000001 db647338 d550a894 e02fc330 cad03740 cad037a4
Aug 11 13:15:07 morph-06 kernel:        00000003 cad037a4 e02fadb3 db647338 e030aa35 00000005 004d01c4 cad037b5
Aug 11 13:15:07 morph-06 kernel: Call Trace:
Aug 11 13:15:07 morph-06 kernel:  [<e02fc330>] remote_stage+0x20/0x50 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02fadb3>] dlm_lock_stage1+0x233/0x2b0 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02faaf1>] dlm_lock+0x291/0x320 [dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042bcc0>] lock_ast+0x0/0x10 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042b865>] do_dlm_lock+0xf5/0x1d0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042bcc0>] lock_ast+0x0/0x10 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042ce2d>] do_range_lock+0x4d/0x60 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042d084>] update_lock+0x14/0xb0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042d1a8>] add_lock+0x88/0xf0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042dbcc>] plock_internal+0x15c/0x360 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e042e108>] lm_dlm_plock+0x138/0x1a0 [lock_dlm]
Aug 11 13:15:07 morph-06 kernel:  [<e02ba552>] gfs_lock+0x262/0x340 [gfs]
Aug 11 13:15:07 morph-06 kernel:  [<e02ba2f0>] gfs_lock+0x0/0x340 [gfs]
Aug 11 13:15:07 morph-06 kernel:  [<c016048c>] fcntl_setlk+0x20c/0x270
Aug 11 13:15:07 morph-06 kernel:  [<c014c705>] dentry_open+0xc5/0x1a0
Aug 11 13:15:07 morph-06 kernel:  [<c014c62f>] filp_open+0x4f/0x60
Aug 11 13:15:07 morph-06 kernel:  [<c015c719>] generic_file_fcntl+0xb9/0x150
Aug 11 13:15:07 morph-06 kernel:  [<c015c910>] sys_fcntl64+0x90/0xa0
Aug 11 13:15:07 morph-06 kernel:  [<c0105cad>] sysenter_past_esp+0x52/0x71
Aug 11 13:15:07 morph-06 kernel:
Aug 11 13:15:07 morph-06 kernel: Code: 0f 0b ca 02 bc bd 30 e0 e9 08 ff ff ff e8 56 eb ff ff e8 41
Aug 11 13:15:07 morph-06 kernel:  <4>CMAN: no HELLO from morph-01, removing from the cluster
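
As a cross-check on the oops, the "Code:" bytes can be decoded to confirm that the trap really is the BUG at line 714. This is a minimal sketch in Python, not part of the original report. It assumes the 2.6-era i386 BUG() layout with verbose bug information enabled: a two-byte ud2 opcode, then a 16-bit line number, then a 32-bit pointer to the file name string. The variable names are made up for illustration.

```python
import struct

# "Code:" bytes copied from the oops in the description.
code_hex = "0f 0b ca 02 bc bd 30 e0 e9 08 ff ff ff e8 56 eb ff ff e8 41"
code = bytes.fromhex(code_hex)  # spaces between byte pairs are ignored

# Assumed layout (i386 BUG() with verbose bug info):
#   bytes 0-1: ud2 opcode (0f 0b)
#   bytes 2-3: little-endian 16-bit source line number
#   bytes 4-7: little-endian 32-bit pointer to the file name string
opcode = code[:2]
line, file_ptr = struct.unpack_from("<HI", code, 2)

assert opcode == b"\x0f\x0b", "not a ud2 trap"
print(f"line={line} file_ptr={file_ptr:#010x}")
# prints: line=714 file_ptr=0xe030bdbc
```

Both decoded values also appear as the second and third words of the Stack dump (000002ca and e030bdbc), which is consistent with them being passed to the assertion's print call just before the trap.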


How reproducible:
Didn't try
Comment 1 Corey Marthaler 2004-08-12 11:08:57 EDT
This was reproduced last night on morph-06 while again running I/O,
apparently after morph-01 and morph-03 had panicked due to bz129468.
Comment 2 David Teigland 2004-08-12 12:01:45 EDT
This assert should almost certainly be removed.  It should really
have been part of the recent group of related changes -- I'm not sure
how it got through (could have been overlooked in the lengthy diffs
I was merging I guess).
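
For reference, the lkb/rsb dump in the description already shows the state that tripped the assert. The following is a rough Python sketch, not code from dlm-kernel, that parses those dumped fields and evaluates the asserted condition. The parse_dump helper and the variable names are hypothetical; the field values are copied from the log with the syslog prefix stripped.

```python
# Field dump copied from the "dlm: lkb" and "dlm: rsb" sections of the log.
dump = """\
dlm: lkb
id 4d01c4
remid 0
flags 0
status 0
rqmode 5
grmode -1
nodeid -1
lqstate 1
lqflags 0
dlm: rsb
name "       7         8c0e5a8"
nodeid 0
flags 4
ref 2
"""

def parse_dump(text):
    """Return {"lkb": {...}, "rsb": {...}} with all values kept as strings."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("dlm: "):
            # A "dlm: <struct>" line starts a new section.
            current = sections.setdefault(line[len("dlm: "):], {})
        elif current is not None:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return sections

sections = parse_dump(dump)
lkb_nodeid = int(sections["lkb"]["nodeid"])
rsb_nodeid = int(sections["rsb"]["nodeid"])

# The failed assertion was: rsb->res_nodeid == -1
assertion_holds = rsb_nodeid == -1
print(f"lkb nodeid={lkb_nodeid} rsb nodeid={rsb_nodeid} holds={assertion_holds}")
# prints: lkb nodeid=-1 rsb nodeid=0 holds=False
```

The rsb reports nodeid 0 while the lkb still reports nodeid -1, so the condition is false for this resource at the time of the lock request.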
Comment 3 David Teigland 2004-08-13 03:27:31 EDT
The assert causing the problem has been removed.
Comment 4 Dean Jansa 2004-08-24 12:09:21 EDT
This has not shown its face while attempting to reproduce... 
Comment 5 Kiersten (Kerri) Anderson 2004-11-16 14:06:16 EST
Updating version to the right level in the defects.  Sorry for the storm.
