Bug 176872

Summary: DLM issue causes too many transition restarts and the cluster to die
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED WORKSFORME
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: ccaulfie, cluster-maint, teigland
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-11-06 19:50:41 UTC

Description Corey Marthaler 2006-01-03 21:50:13 UTC
Description of problem:
I had I/O running from an NFS client (joynter) to many GFS and EXT filesystems
exported by my four-node taft cluster (taft-01, taft-02, taft-03, taft-04),
while the services exporting them were continuously being relocated. The I/O
was simple data (echoing into files) and metadata (moving those files around).
This load ran for 8 days straight during the holiday break until this issue
occurred. All the machines except taft-03 ended up asserting or panicking.

Version-Release number of selected component (if applicable):
[root@taft-02 ~]# uname -ar
Linux taft-02 2.6.9-25.ELsmp #1 SMP Mon Dec 12 17:29:54 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
[root@taft-02 ~]# rpm -q dlm
dlm-1.0.0-5



TAFT-03:
dlm: taft0: process_lockqueue_reply id 529801fd state 0
dlm: taft0: process_lockqueue_reply id 54940091 state 0
dlm: taft0: process_lockqueue_reply id 571603c3 state 0
dlm: taft8: process_lockqueue_reply id 50bc0094 state 0
dlm: taft0: process_lockqueue_reply id 5a0d0025 state 0
dlm: taft2: process_lockqueue_reply id 57020370 state 0
dlm: taft6: process_lockqueue_reply id 5cce016d state 0
dlm: taft4: process_lockqueue_reply id 5cd602e8 state 0
dlm: taft0: process_lockqueue_reply id 60360318 state 0
dlm: taft1: process_lockqueue_reply id 5b270341 state 0
dlm: taft0: process_lockqueue_reply id 64c503ab state 0
dlm: taft5: process_lockqueue_reply id 5d370096 state 0
dlm: taft6: process_lockqueue_reply id 65270337 state 0
dlm: taft7: process_lockqueue_reply id 5dbc016e state 0
dlm: taft0: process_lockqueue_reply id 6de60029 state 0
dlm: taft9: process_lockqueue_reply id 69c6002d state 0
dlm: taft1: process_lockqueue_reply id 696f00bc state 0
dlm: taft0: process_lockqueue_reply id 781703f0 state 0
dlm: taft1: process_lockqueue_reply id 6ff10395 state 0
dlm: taft7: process_lockqueue_reply id 7aa203f1 state 0
dlm: taft3: process_lockqueue_reply id 808802c6 state 0
dlm: taft0: process_lockqueue_reply id 8f8e0241 state 0
dlm: taft4: process_lockqueue_reply id 88670014 state 0
dlm: taft0: process_lockqueue_reply id 9284004f state 0
dlm: taft4: process_lockqueue_reply id 8ded0168 state 0
dlm: taft0: process_lockqueue_reply id a10b03c5 state 0
dlm: taft7: process_lockqueue_reply id 9af70275 state 0
dlm: taft3: process_lockqueue_reply id a15300bc state 0
dlm: taft3: process_lockqueue_reply id a6c400ab state 0
dlm: taft7: process_lockqueue_reply id a5b10090 state 0
dlm: taft1: process_lockqueue_reply id b3060376 state 0
dlm: taft8: process_lockqueue_reply id b85103b8 state 0
dlm: taft7: process_lockqueue_reply id c3530137 state 0
dlm: taft1: process_lockqueue_reply id c511028e state 0
dlm: taft3: process_lockqueue_reply id caa80378 state 0
dlm: taft6: process_lockqueue_reply id dcf8020b state 0
dlm: taft0: process_lockqueue_reply id ef63037e state 0
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: removing node taft-01 from the cluster : Inconsistent cluster view
CMAN: removing node taft-02 from the cluster : Inconsistent cluster view
CMAN: quorum lost, blocking activity




TAFT-04:
dlm: taft2: process_lockqueue_reply id 548c0319 state 0
dlm: taft0: process_lockqueue_reply id 7ae8034b state 0
dlm: taft6: process_lockqueue_reply id 60b903dc state 0
dlm: taft6: process_lockqueue_reply id 648c0283 state 0
dlm: taft8: process_lockqueue_reply id 626b029a state 0
dlm: taft8: process_lockqueue_reply id 7cfb009d state 0
dlm: taft2: process_lockqueue_reply id 7b7f01b4 state 0
dlm: taft1: process_lockqueue_reply id 93ff011a state 0
dlm: taft1: process_lockqueue_reply id 9a42009c state 0
dlm: taft4: process_lockqueue_reply id 9e0a016a state 0
dlm: taft2: process_lockqueue_reply id 9df50102 state 0
dlm: taft0: process_lockqueue_reply id e8e90397 state 0
dlm: taft3: process_lockqueue_reply id bd71007d state 0
dlm: taft2: process_lockqueue_reply id bbf10154 state 0
dlm: taft5: process_lockqueue_reply id c32102c4 state 0
dlm: taft2: process_lockqueue_reply id c35f0070 state 0
dlm: taft7: process_lockqueue_reply id b9ce0146 state 0
dlm: taft3: process_lockqueue_reply id cda90196 state 0
dlm: taft3: process_lockqueue_reply id dba40141 state 0
dlm: taft3: process_lockqueue_reply id de70028c state 0
dlm: taft2: process_lockqueue_reply id e1cd02f2 state 0
dlm: taft1: process_lockqueue_reply id f3a501d0 state 0
295) req reply einval 33e039f fr 4 r 4 usrm::rg="EXT"
Magma (10295) req reply einval 33e039f fr 4 r 4 usrm::rg="EXT"
taft3 (15892) req reply einval fe1b0265 fr 3 r 3        2
taft1 (4087) req reply einval 937016b fr 3 r 3        2
taft5 (4167) req reply einval 25c0230 fr 4 r 4        2
taft8 send einval to 3
taft1 (15917) req reply einval a300235 fr 2 r 2        2
taft5 send einval to 4
Magma (10295) req reply einval 30e0331 fr 4 r 4 usrm::rg="GFS"
Magma (10295) req reply einval 30b032e fr 4 r 4 usrm::rg="EXT"
Magma (10295) req reply einval 30b032e fr 4 r 4 usrm::rg="EXT"
taft2 send einval to 3
taft5 (4166) req reply einval 8303cd fr 3 r 3        2
taft2 send einval to 2
taft5 (15911) req reply einval 2d60267 fr 2 r 2        2
taft5 (15911) req reply einval 2d60267 fr 2 r 2        2
taft5 (15911) req reply einval 2d60267 fr 2 r 2        2
Magma (10295) req reply einval 30201b9 fr 4 r 4 usrm::vf
Magma (10295) req reply einval 32f023a fr 4 r 4 usrm::vf
k 0
15908 en punlock 7,138bf2
15908 remove 7,138bf2
15893 en plock 7,3e633
15908 ex punlock 0
15893 req 7,3e633 ex 0-7fffffffffffffff lkf 2000 wait 1
15892 en plock 7,2ef41
15890 ex plock 0
15890 en punlock 7,177237
15890 remove 7,177237
15890 ex punlock 0
15893 ex plock 0
15893 en punlock 7,3e633
15893 remove 7,3e633
15893 ex punlock 0
15892 req 7,2ef41 ex 0-7fffffffffffffff lkf 2000 wait 1
15892 ex plock 0
15892 en punlock 7,2ef41
15892 remove 7,2ef41
15892 ex punlock 0
15890 en plock 7,1771c2
15890 req 7,1771c2 ex 0-7fffffffffffffff lkf 2000 wait 1
15890 ex plock 0
15890 en punlock 7,1771c2
15890 remove 7,1771c2
15890 ex punlock 0
15892 en plock 7,2ecb7
15897 en plock 7,fa75
15897 req 7,fa75 ex 0-7fffffffffffffff lkf 2000 wait 1
15897 ex plock 0
15892 req 7,2ecb7 ex 0-7fffffffffffffff lkf 2000 wait 1
15915 en plock 7,177143
15897 en punlock 7,fa75
15897 remove 7,fa75
15892 ex plock 0
15897 ex punlock 0
15915 req 7,177143 ex 0-7fffffffffffffff lkf 2000 wait 1
15914 en plock 7,177256
15914 req 7,177256 ex 0-7fffffffffffffff lkf 2000 wait 1
15909 en plock 7,635
15909 req 7,635 ex 0-7fffffffffffffff lkf 2000 wait 1
15915 ex plock 0
15909 ex plock 0
15915 en punlock 7,177143
15915 remove 7,177143
15915 ex punlock 0
15914 ex plock 0
15914 en punlock 7,177256
15914 remove 7,177256
15914 ex punlock 0
15897 en plock 7,fbf6
15897 req 7,fbf6 ex 0-7fffffffffffffff lkf 2000 wait 1
15914 en plock 7,130
15914 req 7,130 ex 0-7fffffffffffffff lkf 2000 wait 1
15914 ex plock 0
15882 en plock 7,a63be
15882 req 7,a63be ex 0-7fffffffffffffff lkf 2000 wait 1
15915 en plock 7,16792b
15915 req 7,16792b ex 0-7fffffffffffffff lkf 2000 wait 1
15915 ex plock 0
15882 ex plock 0
15882 en punlock 7,a63be
15882 remove 7,a63be
15914 en punlock 7,130
15914 remove 7,130
15882 ex punlock 0
15914 ex punlock 0
15897 ex plock 0
15909 en punlock 7,635
15897 en punlock 7,fbf6
15909 remove 7,635
15897 remove 7,fbf6
15909 ex punlock 0
15897 ex punlock 0
15882 en plock 7,1594d1
15914 en plock 7,1679b1
15914 req 7,1679b1 ex 0-7fffffffffffffff lkf 2000 wait 1
15882 req 7,1594d1 ex 0-7fffffffffffffff lkf 2000 wait 1
15882 ex plock 0
15882 en punlock 7,1594d1
15914 ex plock 0
15914 en punlock 7,1679b1
15882 remove 7,1594d1
15914 remove 7,1679b1
15882 ex punlock 0
15914 ex punlock 0
15897 en plock 7,6d4fd
15897 req 7,6d4fd ex 0-7fffffffffffffff lkf 2000 wait 1
15897 ex plock 0
15882 en plock 7,8ea6
15915 en punlock 7,16792b
15915 remove 7,16792b
15882 req 7,8ea6 ex 0-7fffffffffffffff lkf 2000 wait 1
15915 ex punlock 0
15909 en plock 7,138b37
15909 req 7,138b37 ex 0-7fffffffffffffff lkf 2000 wait 1
15882 ex plock 0
15909 ex plock 0
15882 en punlock 7,8ea6
15882 remove 7,8ea6
15882 ex punlock 0
15897 en punlock 7,6d4fd
15897 remove 7,6d4fd
15897 ex punlock 0
15916 en plock 7,6d392
15916 req 7,6d392 ex 0-7fffffffffffffff lkf 2000 wait 1
15916 ex plock 0
15916 en punlock 7,6d392
15916 remove 7,6d392
15916 ex punlock 0
15897 en plock 7,ffba
15897 req 7,ffba ex 0-7fffffffffffffff lkf 2000 wait 1
15912 en plock 7,109ea7
15897 ex plock 0
15910 en plock 7,109f56
15912 req 7,109ea7 ex 0-7fffffffffffffff lkf 2000 wait 1
15910 req 7,109f56 ex 0-7fffffffffffffff lkf 2000 wait 1
15910 ex plock 0
15910 en punlock 7,109f56
15910 remove 7,109f56
15910 ex punlock 0
15912 ex plock 0
15912 en punlock 7,109ea7
15912 remove 7,109ea7
15912 ex punlock 0
15897 en punlock 7,ffba
15897 remove 7,ffba
15897 ex punlock 0
15897 en plock 7,fb02
15897 req 7,fb02 ex 0-7fffffffffffffff lkf 2000 wait 1
15897 ex plock 0
15897 en punlock 7,fb02
15897 remove 7,fb02
15897 ex punlock 0
15912 en plock 7,1a8
15912 req 7,1a8 ex 0-7fffffffffffffff lkf 2000 wait 1
15915 en plock 7,177135
15912 ex plock 0
15912 en punlock 7,1a8
15912 remove 7,1a8
15915 req 7,177135 ex 0-7fffffffffffffff lkf 2000 wait 1
15912 ex punlock 0
15914 en plock 7,1679f1
15914 req 7,1679f1 ex 0-7fffffffffffffff lkf 2000 wait 1
15914 ex plock 0
15914 en punlock 7,1679f1
15914 remove 7,1679f1
15914 ex punlock 0
15915 ex plock 0
15910 en plock 7,109fb4
15910 req 7,109fb4 ex 0-7fffffffffffffff lkf 2000 wait 1
15910 ex plock 0
15771 ex plock 0

lock_dlm:  Assertion failed on line 428 of file /usr/src/build/664408-x86_64/BUILD/gfs-kernel-2.6.9-46/smp/src/dlm/lock.c
lock_dlm:  assertion:  "!error"
lock_dlm:  time = 5134977188
taft3: num=2,7cc38 err=-22 cur=3 req=5 lkf=44

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lock:428
invalid operand: 0000 [1] SMP
CPU 3
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gfs(U)
lock_harness(U) dlm(U) cman(U) radeon md5 ipv6 parport_pc lp parport autofs4
i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd
ehci_hcd e752x_edac edac_mc hw_random shpchp e1000 floppy qla2300 qla2xxx sg
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc
megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 15892, comm: genesis Not tainted 2.6.9-25.ELsmp
RIP: 0010:[<ffffffffa03169e7>] <ffffffffa03169e7>{:lock_dlm:do_dlm_lock+365}
RSP: 0018:00000101fe1edc08  EFLAGS: 00010216
RAX: 0000000000000001 RBX: 00000000ffffffea RCX: ffffffff803da608
RDX: ffffffff803da608 RSI: 0000000000000246 RDI: ffffffff803da600
RBP: 0000010200c99380 R08: ffffffff803da608 R09: 00000000ffffffea
R10: 0000000100000000 R11: 0000000000000000 R12: 00000101ffc06800
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ff
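The lock_dlm assertion above packs several numeric fields into a single line: `num=2,7cc38 err=-22 cur=3 req=5 lkf=44`. The minimal Python sketch below decodes them as a reading aid. Only one part is certain: errno 22 is EINVAL on Linux. Everything else rests on assumptions this report does not confirm. The sketch assumes that `cur` and `req` use the lock-mode numbering of the mainline Linux DLM headers (NL=0 through EX=5), that `lkf` is printed in hex, and that the `DLM_LKF_*` bit values match those headers. The RHEL4 dlm-1.0.0 definitions may differ. `decode_assertion` and `decode_flags` are hypothetical helpers and are not part of any cluster tool.

```python
import errno
import re

# ASSUMPTION: lock-mode numbering from the mainline Linux DLM headers
# (DLM_LOCK_NL=0 ... DLM_LOCK_EX=5); not verified for dlm-1.0.0.
DLM_MODES = {0: "NL", 1: "CR", 2: "CW", 3: "PR", 4: "PW", 5: "EX"}

# ASSUMPTION: DLM_LKF_* request-flag bits from the mainline headers.
DLM_LKF = {
    0x0001: "NOQUEUE",
    0x0002: "CANCEL",
    0x0004: "CONVERT",
    0x0008: "VALBLK",
    0x0010: "QUECVT",
    0x0020: "IVVALBLK",
    0x0040: "CONVDEADLK",
    0x0080: "PERSISTENT",
    0x0100: "NODLCKWT",
    0x0200: "NODLCKBLK",
    0x0400: "EXPEDITE",
    0x0800: "NOQUEUEBAST",
    0x1000: "HEADQUE",
    0x2000: "NOORDER",
}

# Union of all flag bits listed above, used to report unrecognized bits.
KNOWN_BITS = 0
for _bit in DLM_LKF:
    KNOWN_BITS |= _bit

# Matches the "num=T,N err=E cur=C req=R lkf=F" part of the lock_dlm message.
ASSERT_RE = re.compile(
    r"num=(?P<num>[0-9a-f]+,[0-9a-f]+)\s+err=(?P<err>-?\d+)\s+"
    r"cur=(?P<cur>-?\d+)\s+req=(?P<req>-?\d+)\s+lkf=(?P<lkf>[0-9a-f]+)"
)


def decode_flags(lkf):
    """Return the names of the set DLM_LKF_* bits, lowest bit first."""
    names = [name for bit, name in sorted(DLM_LKF.items()) if lkf & bit]
    unknown = lkf & ~KNOWN_BITS
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


def decode_assertion(line):
    """Decode one lock_dlm 'num=... err=... cur=... req=... lkf=...' line."""
    m = ASSERT_RE.search(line)
    if m is None:
        raise ValueError("no num=/err=/cur=/req=/lkf= fields in line")
    err = int(m.group("err"))
    cur = int(m.group("cur"))
    req = int(m.group("req"))
    return {
        "resource": m.group("num"),
        # Kernel code returns negative errno values; -22 is -EINVAL on Linux.
        "error": errno.errorcode.get(-err, str(err)),
        "cur": DLM_MODES.get(cur, str(cur)),
        "req": DLM_MODES.get(req, str(req)),
        "lkf": decode_flags(int(m.group("lkf"), 16)),  # ASSUMPTION: hex
    }


if __name__ == "__main__":
    print(decode_assertion("taft3: num=2,7cc38 err=-22 cur=3 req=5 lkf=44"))
```

If those assumptions hold, the taft-04 line describes a PR-to-EX conversion request on resource 2,7cc38 (flags CONVERT and CONVDEADLK) that the DLM rejected with -EINVAL.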




TAFT-01:
dlm: taft5: process_lockqueue_reply id 52940338 state 0
dlm: taft5: process_lockqueue_reply id 5e5c01c3 state 0
dlm: taft8: process_lockqueue_reply id 5a4b01a7 state 0
dlm: taft7: process_lockqueue_reply id 5c880244 state 0
dlm: taft1: process_lockqueue_reply id 5fad01f5 state 0
dlm: taft0: process_lockqueue_reply id 6afe03b8 state 0
dlm: taft1: process_lockqueue_reply id 639901c5 state 0
dlm: taft3: process_lockqueue_reply id 692f033a state 0
dlm: taft6: process_lockqueue_reply id 67870045 state 0
dlm: taft6: process_lockqueue_reply id 69be01e6 state 0
dlm: taft0: process_lockqueue_reply id 76560157 state 0
dlm: taft4: process_lockqueue_reply id 703301e5 state 0
dlm: taft0: process_lockqueue_reply id 7d5a00ff state 0
dlm: taft4: process_lockqueue_reply id 7be70298 state 0
dlm: taft1: process_lockqueue_reply id 797b0277 state 0
dlm: taft3: process_lockqueue_reply id 7c220042 state 0
dlm: taft6: process_lockqueue_reply id 7e6a0252 state 0
dlm: taft1: process_lockqueue_reply id 7f7a0005 state 0
dlm: taft1: process_lockqueue_reply id 81bb0193 state 0
dlm: taft9: process_lockqueue_reply id 7b610155 state 0
dlm: taft5: process_lockqueue_reply id 855f02dc state 0
dlm: taft0: process_lockqueue_reply id 97560163 state 0
dlm: taft4: process_lockqueue_reply id 903e003d state 0
dlm: taft0: process_lockqueue_reply id 994400d0 state 0
dlm: taft6: process_lockqueue_reply id 8d250169 state 0
dlm: taft0: process_lockqueue_reply id a37b02c2 state 0
dlm: taft6: process_lockqueue_reply id 97c9011a state 0
dlm: taft8: process_lockqueue_reply id 98fb038f state 0
dlm: taft8: process_lockqueue_reply id a2a402ca state 0
dlm: taft5: process_lockqueue_reply id a81e00db state 0
dlm: taft5: process_lockqueue_reply id a9d7026e state 0
dlm: taft0: process_lockqueue_reply id b92001d7 state 0
dlm: taft3: process_lockqueue_reply id b0e60153 state 0
dlm: taft4: process_lockqueue_reply id b56e0121 state 0
dlm: taft7: process_lockqueue_reply id aaea0016 state 0
dlm: taft3: process_lockqueue_reply id b71403cd state 0
dlm: taft3: process_lockqueue_reply id b989032b state 0
dlm: taft6: process_lockqueue_reply id b50403b2 state 0
dlm: taft0: process_lockqueue_reply id cb600337 state 0
dlm: taft0: process_lockqueue_reply id cae403c3 state 0
dlm: taft6: process_lockqueue_reply id bd100340 state 0
dlm: taft0: process_lockqueue_reply id d6380074 state 0
dlm: taft9: process_lockqueue_reply id c184016e state 0
dlm: taft0: process_lockqueue_reply id e79b01bc state 0
dlm: taft8: process_lockqueue_reply id d32c02b4 state 0
dlm: taft1: process_lockqueue_reply id d5520007 state 0
dlm: taft5: process_lockqueue_reply id dd9b0196 state 0
dlm: taft8: process_lockqueue_reply id de62034f state 0
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: too many transition restarts - will die
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
SM: 00000001 sm_stop: SG still joined
SM: 01000002 sm_stop: SG still joined
SM: 02000004 sm_stop: SG still joined
SM: 03000019 sm_stop: SG still joined
dlm: dlm_unlock: lkid dd8a02c9 lockspace not found
fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 send einval to 2
taft6 send einval to 2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft8 grant lock on lockqueue 2
taft8 process_lockqueue_reply id de62034f state 0
taft3 send einval to 1
taft1 send einval to 1
taft7 send einval to 4
Magma send einval to 4
taft8 (4445) req reply einval deb7034f fr 1 r 1        2
taft7 send einval to 2
taft3 send einval to 4
Magma send einval to 4
Magma send einval to 4
Magma send einval to 4
taft2 (4335) req reply einval e9600338 fr 1 r 1        2
taft5 send einval to 1
en punlock 7,db64e
9159 remove 7,db64e
9159 ex punlock 0
9147 en plock 7,7cd5a
9144 en plock 7,1678aa
9149 en plock 7,eabb2
9149 req 7,eabb2 ex 0-7fffffffffffffff lkf 2000 wait 1
9149 ex plock 0
9149 en punlock 7,eabb2
9149 remove 7,eabb2
9149 ex punlock 0
9147 req 7,7cd5a ex 0-7fffffffffffffff lkf 2000 wait 1
9144 req 7,1678aa ex 0-7fffffffffffffff lkf 2000 wait 1
9147 ex plock 0
9147 en punlock 7,7cd5a
9147 remove 7,7cd5a
9147 ex punlock 0
9144 ex plock 0
9144 en punlock 7,1678aa
9144 remove 7,1678aa
9144 ex punlock 0
9166 en plock 7,fa30
9166 req 7,fa30 ex 0-7fffffffffffffff lkf 2000 wait 1
9166 ex plock 0
9166 en punlock 7,fa30
9166 remove 7,fa30
9166 ex punlock 0
9147 en plock 7,7cd5b
9147 req 7,7cd5b ex 0-7fffffffffffffff lkf 2000 wait 1
9147 ex plock 0
9147 en punlock 7,7cd5b
9147 remove 7,7cd5b
9147 ex punlock 0
9149 en plock 7,7cca5
9149 req 7,7cca5 ex 0-7fffffffffffffff lkf 2000 wait 1
9147 en plock 7
dlm: dlm_unlock: lkid d4330302 lockspace not found
fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 send einval to 2
taft6 send einval to 2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft8 grant lock on lockqueue 2
taft8 process_lockqueue_reply id de62034f state 0
taft3 send einval to 1
taft1 send einval to 1
taft7 send einval to 4
Magma send einval to 4
taft8 (4445) req reply einval deb7034f fr 1 r 1        2
taft7 send einval to 2
taft3 send einval to 4
Magma send einval to 4
Magma send einval to 4
Magma send einval to 4
taft2 (4335) req reply einval e9600338 fr 1 r 1        2
taft5 send einval to 1
,eaba6
9149 ex plock 0
9147 req 7,eaba6 ex 0-7fffffffffffffff lkf 2000 wait 1
9147 ex plock 0
9147 en punlock 7,eaba6
9147 remove 7,eaba6
9147 ex punlock 0
9147 en plock 7,db557
9147 req 7,db557 ex 0-7fffffffffffffff lkf 2000 wait 1
9144 en plock 7,dc25a
9144 req 7,dc25a ex 0-7fffffffffffffff lkf 2000 wait 1
9144 ex plock 0
9144 en punlock 7,dc25a
9144 remove 7,dc25a
9144 ex punlock 0
9147 ex plock 0
9142 en plock 7,7d0a5
9144 en plock 7,1678c0
9144 req 7,1678c0 ex 0-7fffffffffffffff lkf 2000 wait 1
9142 req 7,7d0a5 ex 0-7fffffffffffffff lkf 2000 wait 1
9142 ex ploc
dlm: dlm_unlock: lkid d5ef03fc lockspace not found
fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 (9152) req reply einval df7600ab fr 2 r 2        2
taft6 send einval to 2
taft6 send einval to 2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft5 (4394) req reply einval e47802cf fr 4 r 4        2
taft8 grant lock on lockqueue 2
taft8 process_lockqueue_reply id de62034f state 0
taft3 send einval to 1
taft1 send einval to 1
taft7 send einval to 4
Magma send einval to 4
taft8 (4445) req reply einval deb7034f fr 1 r 1        2
taft7 send einval to 2
taft3 send einval to 4
Magma send einval to 4
Magma send einval to 4
Magma send einval to 4
taft2 (4335) req reply einval e9600338 fr 1 r 1        2
taft5 send einval to 1
k 0
9144 ex plock 0
9144 en punlock 7,1678c0
9144 remove 7,1678c0
9142 en punlock 7,7d0a5
9144 ex punlock 0
9142 remove 7,7d0a5
9142 ex punlock 0
9147 en punlock 7,db557
9147 remove 7,db557
9147 ex punlock 0
9149 en punlock 7,7cca5
9149 remove 7,7cca5
9149 ex punlock 0
9147 en plock 7,7cc7a
9147 req 7,7cc7a ex 0-7fffffffffffffff lkf 2000 wait 1
9147 ex plock 0
9144 en plock 7,dcaeb
9144 req 7,dcaeb ex 0-7fffffffffffffff lkf 2000 wait 1
9144 ex plock 0
9144 en punlock 7,dcaeb
9144 remove 7,dcaeb
9144 ex punlock 0
9144 en plock 7,7d056
9144 req 7,7d056 ex 0-7fffffffffffffff lkf 2000 wait 1
9142 en plock 7,dc375
9142 req 7,dc375 ex 0-7fffffffffffffff lkf 2000 wait 1
9142 ex plock 0
9147 en punlock 7,7cc7a
9147 remove 7,7cc7a
9147 ex punlock 0
9144 ex plock 0
9144 en punlock 7,7d056
9144 remove 7,7d056
9144 ex punlock 0
9144 en plock 7,167d4d
9144 req 7,167d4d ex 0-7fffffffffffffff lkf 2000 wait 1
9144 ex plock 0
9144 en punlock 7,167d4d
9144 remove 7,167d4d
9144 ex punlock 0
9144 en plock 7,db219
9144 req 7,db219 ex 0-7fffffffffffffff lkf 2000 wait 1
9142 en punlock 7,dc375
9147 en plock 7,7cd5c
9142 remove 7,dc375
9147 req 7,7cd5c ex 0-7fffffffffffffff lkf 2000 wait 1
9142 ex punlock 0
9144 ex plock 0
9142 en plock 7,dc39d
9147 ex plock 0
9147 en punlock 7,7cd5c
9147 remove 7,7cd5c
9147 ex punlock 0
9142 req 7,dc39d ex 0-7fffffffffffffff lkf 2000 wait 1
9142 ex plock 0
9142 en punlock 7,dc39d
9142 remove 7,dc39d
9142 ex punlock 0
9149 en plock 7,7cc3e
9142 en plock 7,dc251
9142 req 7,dc251 ex 0-7fffffffffffffff lkf 2000 wait 1
9149 req 7,7cc3e ex 0-7fffffffffffffff lkf 2000 wait 1
9142 ex plock 0
9142 en punlock 7,dc251
9142 remove 7,dc251
9142 ex punlock 0
9149 ex plock 0
9144 en punlock 7,db219
9144 remove 7,db219
9144 ex punlock 0
9144 en plock 7,dc244
9144 req 7,dc244 ex 0-7fffffffffffffff lkf 2000 wait 1
9149 en punlock 7,7cc3e
9149 remove 7,7cc3e
9149 ex punlock 0
9144 ex plock 0
9149 en plock 7,7cd5d
9149 req 7,7cd5d ex 0-7fffffffffffffff lkf 2000 wait 1
9149 ex plock 0
9149 en punlock 7,7cd5d
9149 remove 7,7cd5d
9149 ex punlock 0
9144 en punlock 7,dc244
9144 remove 7,dc244
9144 ex punlock 0
9149 en plock 7,db41a
9149 req 7,db41a ex 0-7fffffffffffffff lkf 2000 wait 1
9149 ex plock 0
9149 en punlock 7,db41a
9149 remove 7,db41a
9149 ex punlock 0
9144 en plock 7,dc39e
9144 req 7,dc39e ex 0-7fffffffffffffff lkf 2000 wait 1
9144 ex plock 0
9144 en punlock 7,dc39e
9144 remove 7,dc39e
9144 ex punlock 0
9144 en plock 7,dc392
9144 req 7,dc392 ex 0-7fffffffffffffff lkf 2000 wait 1
9144 ex plock 0
9144 en punlock 7,dc392
9144 remove 7,dc392
9144 ex punlock 0

lock_dlm:  Assert<io4>n  efani lpuednl oocnk



TAFT-02:
dlm: taft9: process_lockqueue_reply id 4fa60274 state 0
dlm: taft4: process_lockqueue_reply id 5c5803dc state 0
dlm: taft6: process_lockqueue_reply id 59280149 state 0
dlm: taft8: process_lockqueue_reply id 60be00e6 state 0
dlm: taft5: process_lockqueue_reply id 68220151 state 0
dlm: taft1: process_lockqueue_reply id 70360023 state 0
dlm: taft4: process_lockqueue_reply id 87e402ba state 0
dlm: taft4: process_lockqueue_reply id 943d008c state 0
dlm: taft5: process_lockqueue_reply id 980802da state 0
dlm: taft6: process_lockqueue_reply id 9f7c03e7 state 0
dlm: taft4: process_lockqueue_reply id aeba01f3 state 0
dlm: taft4: process_lockqueue_reply id b5be0040 state 0
dlm: taft0: process_lockqueue_reply id c81501ad state 0
dlm: taft5: process_lockqueue_reply id c017004d state 0
dlm: taft1: process_lockqueue_reply id d25a0045 state 0
dlm: taft6: process_lockqueue_reply id dcf0014d state 0
dlm: taft9: process_lockqueue_reply id d986015a state 0
dlm: taft8: process_lockqueue_reply id ecb700c1 state 0
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: too many transition restarts - will die
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
SM: 00000001 sm_stop: SG still joined
SM: 01000002 sm_stop: SG still joined
SM: 02000004 sm_stop: SG still joined
SM: 03000019 sm_stop: SG still joined
dlm: dlm_unlock: lkid f77803dc lockspace not found
ly einval 1c80214 fr 3 r 3 usrm::vf
Magma (16374) req reply einval 1ba00ef fr 4 r 4 usrm::vf
taft9 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 (4268) req reply einval f7c400f0 fr 3 r 3        2
taft6 (4268) req reply einval f7c400f0 fr 3 r 3        2
taft8 send einval to 1
taft9 (29665) req reply einval ee5e01b1 fr 4 r 4        2
taft7 (4280) req reply einval f47502cc fr 3 r 3        2
taft1 send einval to 1
Magma (16374) req reply einval 1a8005c fr 4 r 4 usrm::rg="logma
taft7 (4279) req reply einval f46e0255 fr 4 r 4        2
taft7 send einval to 4
taft7 send einval to 4
taft7 send einval to 4
taft7 send einval to 4
taft2 (4188) req reply einval 5aa00d2 fr 1 r 1        2
taft5 send einval to 1
taft5 send einval to 1
taft5 send einval to 1
ait 1
29667 ex plock 0
29667 en punlock 7,5d9be
29667 remove 7,5d9be
29667 ex punlock 0
29667 en plock 7,5dab4
29667 req 7,5dab4 ex 0-7fffffffffffffff lkf 2000 wait 1
29667 ex plock 0
29664 en plock 7,ab93b
29664 req 7,ab93b ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29667 en punlock 7,5dab4
29667 remove 7,5dab4
29667 ex punlock 0
29664 en punlock 7,ab93b
29664 remove 7,ab93b
29664 ex punlock 0
29660 en plock 7,5e098
29660 req 7,5e098 ex 0-7fffffffffffffff lkf 2000 wait 1
29660 ex plock 0
29660 en punlock 7,5e098
29660 remove 7,5e098
29660 ex punlock 0
29660 en plock 7,5e14e
29660 req 7,5e14e ex 0-7fffffffffffffff lkf 2000 wait 1
29667 en plock 7,5da81
29667 req 7,5da81 ex 0-7fffffffffffffff lkf 2000 wait 1
29667 ex plock 0
29667 en punlock 7,5da81
29667 remove 7,5da81
29667 ex punlock 0
29660 ex plock 0
29660 en punlock 7,5e14e
29660 remove 7,5e14e
29660 ex punlock 0
29650 en plock 7,9bf55
29662 en punlock 7,5da9a
29662 remove 7,5da9a
29662 ex punlock 0
29650 req 7,9bf55 ex 0-7fffffffffffffff lkf 2000 wait 1
29655 en plock 7,9c1b0
29655 req 7,9c1b0 ex 0-7fffffffffffffff lkf 2000 wait 1
29655 ex plock 0
29655 en punlock 7,9c1b0
29655 remove 7,9c1b0
29655 ex punlock 0
29650 ex plock 0
29650 en punlock 7,9bf55
29650 remove 7,9bf55
29650 ex punlock 0
29664 en plock 7,ab965
29664 req 7,ab965 ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29664 en punlock 7,ab965
29664 remove 7,ab965
29664 ex punlock 0
29672 en plock 7,5d9e7
29660 en plock 7,7ce2f
29672 req 7,5d9e7 ex 0-7fffffffffffffff lkf 2000 wait 1
29650 en plock 7,ab9a1
29660 req 7,7ce2f ex 0-7fffffffffffffff lkf 2000 wait 1
29672 ex plock 0
29650 req 7,ab9a1 ex 0-7fffffffffffffff
dlm: dlm_unlock: lkid f8f202b2 lockspace not found
ly einval 1c80214 fr 3 r 3 usrm::vf
Magma (16374) req reply einval 1ba00ef fr 4 r 4 usrm::vf
taft9 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 send einval to 3
taft6 (4268) req reply einval f7c400f0 fr 3 r 3        2
taft6 (4268) req reply einval f7c400f0 fr 3 r 3        2
taft8 send einval to 1
taft9 (29665) req reply einval ee5e01b1 fr 4 r 4        2
taft7 (4280) req reply einval f47502cc fr 3 r 3        2
taft1 send einval to 1
Magma (16374) req reply einval 1a8005c fr 4 r 4 usrm::rg="logma
taft7 (4279) req reply einval f46e0255 fr 4 r 4        2
taft7 send einval to 4
taft7 send einval to 4
taft7 send einval to 4
taft7 send einval to 4
taft2 (4188) req reply einval 5aa00d2 fr 1 r 1        2
taft5 send einval to 1
taft5 send einval to 1
taft5 send einval to 1
 lkf 2000 wait 1
29660 ex plock 0
29660 en punlock 7,7ce2f
29660 remove 7,7ce2f
29660 ex punlock 0
29650 ex plock 0
29672 en punlock 7,5d9e7
29672 remove 7,5d9e7
29672 ex punlock 0
29660 en plock 7,5e0ff
29660 req 7,5e0ff ex 0-7fffffffffffffff lkf 2000 wait 1
29660 ex plock 0
29660 en punlock 7,5e0ff
29660 remove 7,5e0ff
29660 ex punlock 0
29660 en plock 7,5e140
29660 req 7,5e140 ex 0-7fffffffffffffff lkf 2000 wait 1
29650 en punlock 7,ab9a1
29650 remove 7,ab9a1
29660 ex plock 0
29650 ex punlock 0
29660 en punlock 7,5e140
29660 remove 7,5e140
29660 ex punlock 0
29650 en plock 7,ab8af
29650 req 7,ab8af ex 0-7fffffffffffffff lkf 2000 wait 1
29650 ex plock 0
29650 en punlock 7,ab8af
29650 remove 7,ab8af
29650 ex punlock 0
29662 en plock 7,ab966
29662 req 7,ab966 ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29662 en punlock 7,ab966
29662 remove 7,ab966
29662 ex punlock 0
29664 en plock 7,ab967
29662 en plock 7,ab930
29662 req 7,ab930 ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29664 req 7,ab967 ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29664 en punlock 7,ab967
29664 remove 7,ab967
29664 ex punlock 0
29662 en punlock 7,ab930
29662 remove 7,ab930
29662 ex punlock 0
29664 en plock 7,ab904
29664 req 7,ab904 ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29664 en punlock 7,ab904
29664 remove 7,ab904
29664 ex punlock 0
29662 en plock 7,aba93
29662 req 7,aba93 ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29664 en plock 7,9c179
29664 req 7,9c179 ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29664 en punlock 7,9c179
29664 remove 7,9c179
29664 ex punlock 0
29662 en punlock 7,aba93
29662 remove 7,aba93
29662 ex punlock 0
29662 en plock 7,ab968
29662 req 7,ab968 ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29662 en punlock 7,ab968
29662 remove 7,ab968
29662 ex punlock 0
29664 en plock 7,9bfe4
29664 req 7,9bfe4 ex 0-7fffffffffffffff lkf 2000 wait 1
29664 ex plock 0
29664 en punlock 7,9bfe4
29664 remove 7,9bfe4
29664 ex punlock 0
29662 en plock 7,9bf16
29662 req 7,9bf16 ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29662 en punlock 7,9bf16
29662 remove 7,9bf16
29662 ex punlock 0
29662 en plock 7,9c0bf
29662 req 7,9c0bf ex 0-7fffffffffffffff lkf 2000 wait 1
29662 ex plock 0
29662 en punlock 7,9c0bf
29662 remove 7,9c0bf
29662 ex punlock 0
29663 en plock 7,ab93a
29663 req 7,ab93a ex 0-7fffffffffffffff lkf 2000 wait 1
a
4>loickt_ dl1m:
s2ser9ti6o6n f7ai leedx on  lpinle 3o57c okf  fi0le
/u2s9r/s6r6c/b7u ilde/n66 44p08u-x8n6_l64o/BcUIkL D/g7fs,-5kerdne9lb-2.e6
9-246/9s6mp6/s7rc /dlrme/lomcok.vc                                       .
l oc7k_,d5lm:d  9absseert         e
ion2:9  "6!e6r7ro r"
xl ockp_dulmn:l  toimcek =  5103
18424975                        5
6ta7ft 1e: nerr opr=-l2o2 ncukm=2 ,a7b,9205 ldkaf=0b 4fla
s2=894                                                   g
      6
7- ---r-e---q-- -7 [,c5ut dhaerb4e  ]e -x- --0----7-f-f f[fpflfefafsef
fbfiftffef  hlerkef  ]20 -0-0- -w-ai-t-- -1


9662796 e6x4  epln opcklo c0
72,9ab669645                k
n p2l9o6c6k4  7re,qa b793,bab
6259 e66x 4 0-r7eqf f7ff,affbf9f3bff fexff 0ff- 7flkfff ff2f00f0f ffwaffitf f1f
lkf2 962064 0e0x w apliot c1k
[Console output garbled: kernel messages from two sources were interleaved
character by character and cannot be reconstructed. Recoverable fragments
include lock_dlm plock debug lines from several processes, such as
"<pid> en punlock 7,<id>", "<pid> req 7,<id> ex 0-7fffffffffffffff lkf 2000
wait 1", "<pid> remove 7,<id>" and "<pid> ex punlock 0"; a "dlm: dlm_unlock:
lkid ... lockspace not found" message; "req reply einval" messages from Magma
and taft; "taft send einval to 3"; and finally "lock_dlm: Assertion failed on
line 357 of file .../lock.c".]
[...]
Kernel panic - not syncing: Oops
 <1>Kernel BUG at lock:357
invalid operand: 0000 [2] SMP
CPU 0
Modules linked in: nfsd exportfs lockd nfs_acl lock_dlm(U) gfs(U)
lock_harness(U) dlm(U) cman(U) radeon md5 ipv6 parport_pc lp parport autofs4
i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac uhci_hcd
ehci_hcd e752x_edac edac_mc hw_random e1000 floppy qla2300 qla2xxx sg
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod lpfc scsi_transport_fc
megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 4251, comm: gfs_glockd Not tainted 2.6.9-25.ELsmp
RIP: 0010:[<ffffffffa02fd819>] <ffffffffa02fd819>{:lock_dlm:do_dlm_unlock+167}
RSP: 0018:000001020be8fdd8  EFLAGS: 00010212
RAX: 0000000000000001 RBX: 00000101feb9aa80 RCX: 0000000000020000
RDX: 000000000791f894 RSI: 0000000000000203 RDI: ffffffff803da620
RBP: 00000000ffffffea R08: 00000000fffffffa R09: 00000101feb9aa80
R10: 0000000000000000 R11: 0000000000000000 R12: 00000101f1a3306c
R13: ffffff00102c2000 R14: ffffffffa02f9a40 R15: 00000101f1a33040
FS:  0000002a95574b00(0000) GS:ffffffff804d8380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95579b54 CR3: 0000000000101000 CR4: 00000000000006e0
Process gfs_glockd (pid: 4251, threadinfo 000001020be8e000, task 000001020d0f17f0)
Stack: 0000000000000084 0000000000000000 0000000000000000 ffffff00102c2000
       00000101f1a33040 ffffffffa02fdb96 0000000000000002 ffffffffa02c4970
       0000000000000001 ffffffffa02bb3d1
Call Trace:<ffffffffa02fdb96>{:lock_dlm:lm_dlm_unlock+21}
<ffffffffa02c4970>{:gfs:gfs_lm_unlock+41}
       <ffffffffa02bb3d1>{:gfs:gfs_glock_drop_th+290}
<ffffffffa02b9b74>{:gfs:run_queue+314}
       <ffffffffa02b9dc8>{:gfs:unlock_on_glock+37}
<ffffffffa02b9ebe>{:gfs:gfs_reclaim_glock+234}
       <ffffffffa02ae75a>{:gfs:gfs_glockd+61}
<ffffffff80133401>{default_wake_function+0}
       <ffffffff80133401>{default_wake_function+0} <ffffffff80110e17>{child_rip+8}
       <ffffffffa02ae71d>{:gfs:gfs_glockd+0} <ffffffff80110e0f>{child_rip+0}


Code: 0f 0b 52 21 30 a0 ff ff ff ff 65 01 48 c7 c7 57 21 30 a0 31
RIP <ffffffffa02fd819>{:lock_dlm:do_dlm_unlock+167} RSP <000001020be8fdd8>
Badness in do_unblank_screen at drivers/char/vt.c:2876

Call Trace:<ffffffff80232eae>{do_unblank_screen+61}
<ffffffff801231dc>{bust_spinlocks+28}
       <ffffffff801119a8>{oops_end+18} <ffffffff80111ad5>{die+54}
       <ffffffff80111e98>{do_invalid_op+145}
<ffffffffa02fd819>{:lock_dlm:do_dlm_unlock+167}
       <ffffffff80137b79>{vprintk+515} <ffffffff80137c12>{printk+141}
       <ffffffff80110c61>{error_exit+0}
<ffffffffa02fd819>{:lock_dlm:do_dlm_unlock+167}
       <ffffffffa02fd819>{:lock_dlm:do_dlm_unlock+167}
<ffffffffa02fdb96>{:lock_dlm:lm_dlm_unlock+21}
       <ffffffffa02c4970>{:gfs:gfs_lm_unlock+41}
<ffffffffa02bb3d1>{:gfs:gfs_glock_drop_th+290}
       <ffffffffa02b9b74>{:gfs:run_queue+314}
<ffffffffa02b9dc8>{:gfs:unlock_on_glock+37}
       <ffffffffa02b9ebe>{:gfs:gfs_reclaim_glock+234}
<ffffffffa02ae75a>{:gfs:gfs_glockd+61}
       <ffffffff80133401>{default_wake_function+0}
<ffffffff80133401>{default_wake_function+0}
       <ffffffff80110e17>{child_rip+8} <ffffffffa02ae71d>{:gfs:gfs_glockd+0}
       <ffffffff80110e0f>{child_rip+0}
 ----------- [cut here ] --------- [please bite here ] ---------

Comment 1 Christine Caulfield 2006-01-04 14:29:42 UTC
Some timestamps on those messages would be nice. 
And it's not clear why taft-04 left the cluster because those messages are
missing, which is a shame, as that node seems to have kicked off the original
transition!

In summary we have:
taft-01:
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: too many transition restarts - will die

taft-02:
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: too many transition restarts - will die

taft-03:
CMAN: removing node taft-04 from the cluster : Missed too many heartbeats
CMAN: removing node taft-01 from the cluster : Inconsistent cluster view
CMAN: removing node taft-02 from the cluster : Inconsistent cluster view
CMAN: quorum lost, blocking activity

taft-04: not stated. assert in lock_dlm (so left cluster for some reason)
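In the summary above, taft-03 only reports "quorum lost" after it has removed taft-01 and taft-02. A minimal sketch of the vote arithmetic, assuming one vote per node, expected_votes = 4, no quorum disk, and quorum computed as expected_votes / 2 + 1 (integer division). The function names are illustrative, not cman's API:

```python
# Vote arithmetic for the four-node taft cluster (assumed configuration:
# one vote per node, expected_votes = 4, no quorum disk).


def quorum(expected_votes: int) -> int:
    """Votes needed to be quorate: expected_votes // 2 + 1."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the members a node can see hold enough votes."""
    return votes_present >= quorum(expected_votes)


EXPECTED_VOTES = 4  # taft-01 .. taft-04

# Every surviving node's view after taft-04 is removed: 3 members.
print(is_quorate(3, EXPECTED_VOTES))  # True
# taft-03's view after it also removes taft-01 and taft-02: itself only.
print(is_quorate(1, EXPECTED_VOTES))  # False
```

With three of the four votes the remaining nodes are still quorate; taft-03 falls below quorum only after it has also removed taft-01 and taft-02.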

I'd love to know why taft-04 stopped responding. Perhaps it got flooded with
plocks or NFS requests that stopped cman from responding. Or the network
channels could have been jammed or flooded, which might explain the cluster
consistency confusion.

Comment 2 Corey Marthaler 2006-01-11 21:07:07 UTC
I hit this again with the same cluster (taft-01 through taft-04), but this time
I had taken down taft-04 instead of trying to relocate a service.

Same setup as before, 10 GFS and 5 EXT (both grouped as one NFS service). The
NFS client was joynter and the only I/O that it was doing was a touch of each of
the 14 filesystems every 2 seconds. 

I rebooted taft-04 (which wasn't serving any of the services), and that almost
immediately caused taft-01 to shut itself down and end up asserting and then
panicking. A little later taft-03 panicked as well.


taft-01:
Jan 11 12:39:27 taft-01 kernel: CMAN: removing node taft-04 from the cluster :
Missed too many heartbeats
Jan 11 12:42:12 taft-01 kernel: CMAN: too many transition restarts - will die
Jan 11 12:42:12 taft-01 kernel: CMAN: we are leaving the cluster. Inconsistent
cluster view
Jan 11 12:42:12 taft-01 kernel: WARNING: dlm_emergency_shutdown
Jan 11 12:42:12 taft-01 clurgmgrd[14100]: <warning> #67: Shutting down uncleanly
Jan 11 12:42:12 taft-01 kernel: WARNING: dlm_emergency_shutdown
Jan 11 12:42:12 taft-01 kernel: SM: 00000001 sm_stop: SG still joined
Jan 11 12:42:12 taft-01 ccsd[2670]: Cluster manager shutdown.  Attemping to
reconnect...
Jan 11 12:42:12 taft-01 kernel: SM: 01000002 sm_stop: SG still joined
Jan 11 12:42:12 taft-01 kernel: SM: 02000004 sm_stop: SG still joined
Jan 11 12:42:12 taft-01 kernel: SM: 03000019 sm_stop: SG still joined
Jan 11 12:42:12 taft-01 ccsd[2670]: Cluster is not quorate.  Refusing connection.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing connect: Connection
refused
Jan 11 12:42:12 taft-01 ccsd[2670]: Invalid descriptor specified (-111).
Jan 11 12:42:12 taft-01 ccsd[2670]: Someone may be attempting something evil.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing get: Invalid request
descriptor
Jan 11 12:42:12 taft-01 ccsd[2670]: Invalid descriptor specified (-111).
Jan 11 12:42:12 taft-01 ccsd[2670]: Someone may be attempting something evil.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing get: Invalid request
descriptor
Jan 11 12:42:12 taft-01 ccsd[2670]: Invalid descriptor specified (-111).
Jan 11 12:42:12 taft-01 ccsd[2670]: Someone may be attempting something evil.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing disconnect: Invalid
request descriptor
Jan 11 12:42:12 taft-01 kernel: rm::rg="logma
Jan 11 12:42:12 taft-01 kernel: Magma (14100) req reply einval 17601e2 fr 2 r 2
usrm::rg="logma
Jan 11 12:42:12 taft-01 kernel: Magma (14100) req reply einval 17b01ef fr 2 r 2
usrm::rg="GFS"
Jan 11 12:42:12 taft-01 kernel: Magma (14100) req reply einval 155035e fr 4 r 4
usrm::vf
Jan 11 12:42:12 taft-01 kernel: Magma (14100) req reply einval 1860100 fr 3 r 3
usrm::vf
Jan 11 12:42:12 taft-01 kernel: Magma send einval to 3
Jan 11 12:42:12 taft-01 kernel: Magma send einval to 3
Jan 11 12:42:12 taft-01 ccsd[2670]: Cluster is not quorate.  Refusing connection.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing connect: Connection
refused
Jan 11 12:42:12 taft-01 ccsd[2670]: Invalid descriptor specified (-111).
Jan 11 12:42:12 taft-01 ccsd[2670]: Someone may be attempting something evil.
Jan 11 12:42:12 taft-01 ccsd[2670]: Error while processing get: Invalid request
descriptor
Jan 11 12:42:12 taft-01 ccsd[2670]: Invalid descriptor specified (-111).
Jan 11 12:42:12 taft-01 ccsd[2670]: Someone may be attempting something evil.
[eventually asserts/panics]


taft-03:
Jan 11 12:40:17 taft-03 kernel: CMAN: removing node taft-04 from the cluster :
Missed too many heartbeats
Jan 11 12:43:02 taft-03 kernel: CMAN: too many transition restarts - will die
Jan 11 12:43:02 taft-03 kernel: CMAN: we are leaving the cluster. Inconsistent
cluster view
Jan 11 12:43:02 taft-03 kernel: WARNING: dlm_emergency_shutdown
Jan 11 12:43:02 taft-03 clurgmgrd[12801]: <warning> #67: Shutting down uncleanly
Jan 11 12:43:02 taft-03 ccsd[2722]: Cluster manager shutdown.  Attemping to
reconnect...
Jan 11 12:43:02 taft-03 kernel: WARNING: dlm_emergency_shutdown
Jan 11 12:43:02 taft-03 kernel: SM: 00000001 sm_stop: SG still joined
Jan 11 12:43:02 taft-03 kernel: SM: 01000002 sm_stop: SG still joined
Jan 11 12:43:02 taft-03 kernel: SM: 02000004 sm_stop: SG still joined
Jan 11 12:43:02 taft-03 kernel: SM: 03000019 sm_stop: SG still joined
Jan 11 12:43:02 taft-03 ccsd[2722]: Cluster is not quorate.  Refusing connection.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing connect: Connection
refused
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing get: Invalid request
descriptor
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing get: Invalid request
descriptor
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing disconnect: Invalid
request descriptor
Jan 11 12:43:02 taft-03 ccsd[2722]: Cluster is not quorate.  Refusing connection.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing connect: Connection
refused
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing get: Invalid request
descriptor
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing get: Invalid request
descriptor
Jan 11 12:43:02 taft-03 ccsd[2722]: Invalid descriptor specified (-111).
Jan 11 12:43:02 taft-03 ccsd[2722]: Someone may be attempting something evil.
Jan 11 12:43:02 taft-03 ccsd[2722]: Error while processing disconnect: Invalid
request descriptor
Jan 11 12:43:02 taft-03 clurgmgrd: [12801]: <info> Removing export:
joynter.lab.msp.redhat.com:/mnt/taft0
[eventually asserts/panics]


taft-02:
Jan 11 12:40:32 taft-02 kernel: CMAN: removing node taft-04 from the cluster :
Missed too many heartbeats
Jan 11 12:43:17 taft-02 kernel: CMAN: removing node taft-03 from the cluster :
Inconsistent cluster view
Jan 11 12:43:38 taft-02 kernel: CMAN: removing node taft-01 from the cluster :
No response to messages
Jan 11 12:43:38 taft-02 kernel: CMAN: quorum lost, blocking activity
Jan 11 12:43:42 taft-02 kernel: CMAN: node taft-04 rejoining
Jan 11 12:46:27 taft-02 kernel: CMAN: too many transition restarts - will die
Jan 11 12:46:27 taft-02 kernel: CMAN: we are leaving the cluster. Inconsistent
cluster view
Jan 11 12:46:27 taft-02 kernel: WARNING: dlm_emergency_shutdown
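Measured within each node, so that clock skew between the machines does not matter, the gap from a membership change to the next CMAN action is the same in every excerpt above. A small sketch that computes it from the syslog timestamps; the pairing of events is an editorial reading of the logs:

```python
from datetime import datetime

# (node, earlier event, time, later event, time), copied from the Jan 11 logs.
EVENTS = [
    ("taft-01", "taft-04 removed", "12:39:27", "too many transition restarts", "12:42:12"),
    ("taft-03", "taft-04 removed", "12:40:17", "too many transition restarts", "12:43:02"),
    ("taft-02", "taft-04 removed", "12:40:32", "taft-03 removed", "12:43:17"),
    ("taft-02", "taft-04 rejoining", "12:43:42", "too many transition restarts", "12:46:27"),
]


def seconds_between(t0: str, t1: str) -> int:
    """Whole seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
    return int(delta.total_seconds())


for node, first, t0, second, t1 in EVENTS:
    print(f"{node}: {first} ({t0}) -> {second} ({t1}): {seconds_between(t0, t1)} s")
```

Each pair comes out at 165 seconds (2 min 45 s).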





Comment 3 Christine Caulfield 2006-01-13 14:08:32 UTC
This is a cman problem - very probably the same as 177613 (see comments 6 & 7)

Comment 4 Christine Caulfield 2006-01-13 14:54:54 UTC
That's bz#177163, sorry.

Comment 5 Christine Caulfield 2006-01-23 17:01:03 UTC
Marking this as MODIFIED because I am of the opinion that it's the same bug as
#177163, which is now also MODIFIED.

Comment 6 Henry Harris 2006-04-27 16:44:59 UTC
We are running RHEL4U3 with cman-kernel-2.6.9.43.8 on an x86_64 which is the 
version in the errata mentioned in bug #177163.  I have hit the same assert in 
lock_dlm:428.  Don't have a lot of info on this but it happened after one of 
our apps segfaulted and prevented the cluster daemons from shutting down 
cleanly. Here is some KDB output:

Stack traceback for pid 15463
0x00000102fec6a030    15463    15453  1    1   R  0x00000102fec6a430 *script.sh
RSP           RIP                Function (args)
0x102f1be3a90 0xffffffffa0267e63 [lock_dlm]do_dlm_lock+0x16d 
(0xdead4ead00000001, 0xffffff0000d32000, 0x102fb6f3658, 0x102fb6f3684, 
0xffffff0000d32000)
0x102f1be3b08 0xffffffffa0267fa6 [lock_dlm]lm_dlm_lock+0xd6 (0x100e3d91cc0, 
0x103f4b90b50, 0x1, 0x102f1be3e58, 0x102f1e1c01e)
0x102f1be3c98 0xffffffff80180d82 do_lookup+0x184 (0xfffffff5, 0x102f1be3e58, 
0x103f4b90b50)
0x102f1be3cc8 0xffffffff8018092a permission+0x33 (0x12, 0x0, 0xe362800e1, 
0x102f1e1c00f, 0x100e3d91cc0)
0x102f1be3ce8 0xffffffff801810a2 __link_path_walk+0x174 (0x103ffd4f6d8, 
0x1010001de00, 0x206, 0xffffffff80120dd7, 0x100000001)
0x102f1be3d58 0xffffffff80181dba link_path_walk+0x52 (0xfd0, 0x102f1e1c000, 
0x102f1be3e58, 0x1, 0xf1e1c000)
0x102f1be3df8 0xffffffff80182007 path_lookup+0x1c3 (0x6d1460, 0x7fbffff520, 
0x102f1be3ef8, 0x6c95d0, 0x0)
0x102f1be3e28 0xffffffff801822b3 __user_walk+0x2f (0x102faafc318, 
0x100e3d91cc0, 0x206, 0xffffffff80120dd7, 0x100000005)
0x102f1be3e58 0xffffffff8017cb22 vfs_stat+0x18
0x102f1be3ef8 0xffffffff8017ce6c sys_newstat+0x11



lock_dlm:  Assertion failed on line 428 of file fs/gfs_locking/lock_dlm/lock.c
<4>lock_dlm:  assertion:  "!error"
<4>lock_dlm:  time = 4476495870
<4>crosswalk: num=2,1f err=-22 cur=-1 req=3 lkf=10000
<4>
<4>crosswalk: num=2,1b err=-22 cur=-1 req=3 lkf=10000
<4>
[1]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
<6>tg3: eth8: Link is down.
<6>tg3: eth9: Link is down.
<4>----------- [cut here ] --------- [please bite here ] ---------
<1>Kernel BUG at lock:428
<0>invalid operand: 0000 [1] SMP
[1]kdb> ps
2 idle processes (state I) and 119 sleeping system daemon (state M) processes
suppressed
Task Addr               Pid   Parent [*] cpu State Thread             Command
0x00000100dd3bb7f0    15454    15429  1    0   R  0x00000100dd3bbbf0
script.sh
0x00000102fec6a030    15463    15453  1    1   R  0x00000102fec6a430
*script.sh
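The crosswalk lines above report err=-22. Kernel functions conventionally return negated errno values, and on Linux 22 is EINVAL, the same error name that appears in the "req reply einval" messages in the Jan 11 logs. A small sketch that extracts the err field from such a line and names it; the regular expression assumes only the field layout shown above, and `decode_err` is a hypothetical helper:

```python
import errno
import os
import re

ERR_FIELD = re.compile(r"\berr=(-?\d+)")


def decode_err(line: str) -> str:
    """Extract err=<n> from a lock_dlm crosswalk line and name the errno.

    Assumes n is a negated errno value, per the kernel convention."""
    match = ERR_FIELD.search(line)
    if match is None:
        raise ValueError(f"no err= field in {line!r}")
    code = -int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


print(decode_err("crosswalk: num=2,1f err=-22 cur=-1 req=3 lkf=10000"))
# prints "EINVAL (Invalid argument)" on Linux
```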

Comment 7 Corey Marthaler 2006-11-06 19:50:41 UTC
This has not been seen in almost 7 months; I'll reopen it if it is reproduced.