Description of problem: This may be related to bz127008. Similar scenario: healthy cluster running I/O. Two nodes are shot (morph-01 and morph-03), which causes morph-04 to assert and then panic:

un 2,29a4bdf id 20036 cur 5 0
un 2,29a4be0 id 10045 cur 5 0
qc 2,1a 3,3 id 10224 sts -65538
un 2,29cff84 id 40053 cur 5 0
un 2,29a52d4 id 400ea cur 5 0
un 2,29b40ef id 302ad cur 5 0
lock_dlm: Assertion failed on line 333 of file /usr/src/cluster/gfs-kernel/src/dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 698515
foobar5: error=-22 num=2,29b40ef
Kernel panic: lock_dlm: Record message above and reboot.

Jul 21 13:07:20 morph-04 kernel: dlm: foobar2: total nodes 3
Jul 21 13:07:20 morph-04 kernel: dlm: foobar2: rebuild resource directory
Jul 21 13:07:20 morph-04 kernel: dlm: foobar2: rebuilt 2080 resources
Jul 21 13:07:20 morph-04 kernel: dlm: foobar2: purge requests
Jul 21 13:07:20 morph-04 kernel: dlm: foobar2: purged 0 requests
Jul 21 13:07:20 morph-04 ccsd[3769]: Error while processing get: No data available
Jul 21 13:07:20 morph-04 ccsd[3769]: Error while processing get: No data available
Jul 21 13:07:21 morph-04 kernel: dlm: foobar2: mark waiting requests
Jul 21 13:07:21 morph-04 kernel: dlm: foobar2: marked 0 requests

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 00000000
Oops: 0000 [#2]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<00000000>]    Not tainted
EFLAGS: 00010017   (2.6.7)
EIP is at 0x0
eax: 00beacd4   ebx: 00beacd4   ecx: 00000000   edx: 00000003
esi: 00000000   edi: f7fc36e4   ebp: f479fe54   esp: f479fe34
ds: 007b   es: 007b   ss: 0068
Process gfs_glockd (pid: 4041, threadinfo=f479e000 task=f49ae1b0)
Stack: c0118897 00000000 bffffe24 00000001 00000003 00000000 00000286 f479fe7c
       f479fe6c c01188f2 00000000 00000000 f8a7c3a0 c23b1e48 f479fe7c f8a7c3be
       00000000 c012209d f479fe7c f479fe7c c0122217 00000001 c03b4ea8 0000000a
Call Trace:
 [<c0118897>] __wake_up_common+0x37/0x70
 [<c01188f2>] __wake_up+0x22/0x30
 [<f8a7c3a0>] dlm_wait_timer_fn+0x0/0x20 [dlm]
 [<f8a7c3be>] dlm_wait_timer_fn+0x1e/0x20 [dlm]
 [<c012209d>] run_timer_softirq+0xad/0x150
 [<c0122217>] do_timer+0xc7/0xd0
 [<c011e809>] __do_softirq+0x79/0x80
 [<c011e837>] do_softirq+0x27/0x30
 [<c01077c5>] do_IRQ+0xd5/0x110
 [<c0105e6c>] common_interrupt+0x18/0x20
 [<c011b1d0>] panic+0xe0/0x100
 [<f8b9f624>] do_dlm_unlock+0xf4/0x100 [lock_dlm]
 [<f8b9f94c>] lm_dlm_unlock+0x1c/0x70 [lock_dlm]
 [<f8a22fed>] gfs_glock_drop_th+0x5d/0x120 [gfs]
 [<f8a22697>] rq_demote+0x87/0xa0 [gfs]
 [<f8a2272f>] run_queue+0x7f/0xa0 [gfs]
 [<f8a2458b>] gfs_reclaim_glock+0x7b/0x110 [gfs]
 [<f8a16dd7>] gfs_glockd+0x107/0x120 [gfs]
 [<c0118850>] default_wake_function+0x0/0x10
 [<c0105c12>] ret_from_fork+0x6/0x14
 [<c0118850>] default_wake_function+0x0/0x10
 [<f8a16cd0>] gfs_glockd+0x0/0x120 [gfs]
 [<c010429d>] kernel_thread_helper+0x5/0x18
Code: Bad EIP value.
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
To reproduce this, I had four nodes running make_panic and another four nodes running a mount/umount loop for several hours.
If we ever hit this again, there will be a line in the dumped dlm debug log specifying the exact EINVAL condition.
Updated with the proper version and component name.
This hasn't been seen in over 5 months of recovery testing.
I think this may be the same bug. I was running a four-node cluster with link-08 running 'while :; do placemaker -d 7 -w 3; find; rm -rf place_root; done'. link-10 was running a mount/umount loop, and link-12 was looping on bonnie++. After a couple of hours link-08 hit the assertion. I will put full logfiles in ~danderso/bugs/128318.

Note: This is a mixed-arch cluster. link-10, link-11, and link-12 are i686 and link-08 is an x86_64 opteron.

Note: The placemaker tool is in the sistina-test tree if wanted/needed.

lock_dlm: Assertion failed on line 352 of file /usr/src/build/522379-x86_64/BUILD/smp/src/dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 4305526027
data1: error=-22 num=2,825c37

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lock:352
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) parport_pc lp parport autofs4 dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptbase sd_mod scsi_mod
Pid: 3380, comm: gfs_glockd Tainted: G M 2.6.9-5.ELsmp
RIP: 0010:[<ffffffffa0268804>] <ffffffffa0268804>{:lock_dlm:do_dlm_unlock+189}
RSP: 0018:000001001e9f5de8 EFLAGS: 00010212
RAX: 0000000000000001 RBX: 00000000ffffffea RCX: 0000000100000000
RDX: ffffffff803c7508 RSI: 0000000000000246 RDI: ffffffff803c7500
RBP: 000001001cbb6dc0 R08: ffffffff803c7508 R09: 00000000ffffffea
R10: 0000000000000097 R11: 0000000000000097 R12: 000001001c637c9c
R13: ffffff000016a000 R14: ffffffffa0264d20 R15: 000001001c637c70
FS: 0000002a95563b00(0000) GS:ffffffff804bf380(0000) knlGS:00000000f7ff06c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95557000 CR3: 000000001ffb2000 CR4: 00000000000006e0
Process gfs_glockd (pid: 3380, threadinfo 000001001e9f4000, task 000001001e507030)
Stack: 0000000000000000 ffffff000016a000 000001001c637c70 ffffffffa0268b7e
       0000000000000001 ffffffffa023051f 0000000000000001 ffffffffa022709c
       000001001fdd4500 000001001c637c70
Call Trace:
 <ffffffffa0268b7e>{:lock_dlm:lm_dlm_unlock+15}
 <ffffffffa023051f>{:gfs:gfs_lm_unlock+41}
 <ffffffffa022709c>{:gfs:gfs_glock_drop_th+290}
 <ffffffffa0225845>{:gfs:run_queue+314}
 <ffffffffa0225a9a>{:gfs:unlock_on_glock+37}
 <ffffffffa0225b90>{:gfs:gfs_reclaim_glock+234}
 <ffffffffa021a61a>{:gfs:gfs_glockd+61}
 <ffffffff8013176a>{default_wake_function+0}
 <ffffffff8013176a>{default_wake_function+0}
 <ffffffff80110c23>{child_rip+8}
 <ffffffffa021a5dd>{:gfs:gfs_glockd+0}
 <ffffffff80110c1b>{child_rip+0}
Code: 0f 0b 13 cf 26 a0 ff ff ff ff 60 01 48 c7 c7 18 cf 26 a0 31
RIP <ffffffffa0268804>{:lock_dlm:do_dlm_unlock+189} RSP <000001001e9f5de8>
<0>Kernel panic - not syncing: Oops
When node link-08 was power cycled and was rejoining the cluster, this happened:

Starting cups: [ OK ]
Starting sshd: [ OK ]
Starting xinetd: [ OK ]
Starting sendmail:
clvmd move flags 0,1,0 ids 0,2,0
clvmd move use event 2
clvmd recover event 2 (first)
clvmd add nodes
clvmd total nodes 2
clvmd rebuild resource directory
clvmd rebuilt 0 resources
clvmd recover event 2 done
clvmd move flags 0,0,1 ids 0,2,2
clvmd process held requests
clvmd processed 0 requests
clvmd recover event 2 finished
clvmd move flags 1,0,0 ids 2,2,2
clvmd move flags 0,1,0 ids 2,3,2
clvmd move use event 3
clvmd recover event 3
clvmd add node 11
clvmd add_to_requestq cmd 3 fr 11
clvmd total nodes 3
clvmd rebuild resource directory
clvmd rebuilt 0 resources
clvmd purge requests
clvmd purged 0 requests
clvmd mark waiting requests
clvmd marked 0 requests
clvmd recover event 3 done
clvmd move flags 0,0,1 ids 2,3,3
clvmd process held requests
clvmd process_requestq cmd 3 fr 11
DLM: Assertion failed on line 1129 of file /usr/src/build/522362-x86_64/BUILD/smp/src/lockqueue.c
DLM: assertion: "lkb"
DLM: time = 4295181729
dlm: request rh_cmd 3 rh_lkid 103d1 remlkid 103b4 flags 0 status 0 rqmode 255 nodeid 11
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lockqueue:1129
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: parport_pc lp parport autofs4 dlm(U) cman(U) md5 ipv6 sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptbase sd_mod scsi_mod
Pid: 2282, comm: dlm_recoverd Tainted: G M 2.6.9-5.ELsmp
RIP: 0010:[<ffffffffa01d7e70>] <ffffffffa01d7e70>{:dlm:process_cluster_request+4355}
RSP: 0018:000001001ef53dd8 EFLAGS: 00010212
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000246
RDX: 0000000000004b32 RSI: 0000000000000246 RDI: ffffffff803c7520
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 000001003ffd1400
R13: 000001001f3d8cd4 R14: 0000000000000000 R15: 000001003ffd1400
FS: 0000002a95563b00(0000) GS:ffffffff804bf380(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000552ad55cd8 CR3: 000000001ffb2000 CR4: 00000000000006e0
Process dlm_recoverd (pid: 2282, threadinfo 000001001ef52000, task 000001001fac9030)
Stack: 0000000000000000 0000000000000000 0000000b00000000 0000000000000002
       000001001f1fa030 0000000000000069 0000010020712bc0 0000000180130205
       000001001fac9030 0000000000003da6
Call Trace:
 <ffffffffa01d813c>{:dlm:process_requestqueue+189}
 <ffffffffa01e1b42>{:dlm:dlm_recoverd+3086}
 <ffffffffa01e0f34>{:dlm:dlm_recoverd+0}
 <ffffffff80148300>{keventd_create_kthread+0}
 <ffffffff801482d7>{kthread+200}
 <ffffffff80110c23>{child_rip+8}
 <ffffffff80148300>{keventd_create_kthread+0}
 <ffffffff8014820f>{kthread+0}
 <ffffffff80110c1b>{child_rip+0}
Code: 0f 0b 85 3c 1e a0 ff ff ff ff 69 04 e9 e9 00 00 00 8b 00 a9
RIP <ffffffffa01d7e70>{:dlm:process_cluster_request+4355} RSP <000001001ef53dd8>
<0>Kernel panic - not syncing: Oops
I've seen this now too, here's everything that I could grab above the assert: dlm: dlm_unlock: lkid 50264 lockspace not found ror -105 1a0019 gfs0 remote_stage error -105 1e0298 gfs1 remote_stage error -105 160189 gfs0 remote_stage error -105 25004d gfs1 remote_stage error -105 1801e2 gfs5 remote_stage error -105 1900db gfs3 remote_stage error -105 170275 gfs4 remote_stage error -105 1801ac gfs4 remote_stage error -105 160310 gfs9 remote_stage error -105 1d01dd gfs7 remote_stage error -105 1f00cb gfs9 remote_stage error -105 140104 gfs0 remote_stage error -105 1f018f gfs5 remote_stage error -105 1202a1 gfs3 remote_stage error -105 1703d5 gfs3 remote_stage error -105 1800c6 gfs5 remote_stage error -105 180295 gfs2 remote_stage error -105 170117 gfs6 remote_stage error -105 130274 gfs4 remote_stage error -105 1603cf gfs7 remote_stage error -105 1503a0 gfs9 remote_stage error -105 160071 gfs1 remote_stage error -105 1c0216 gfs2 remote_stage error -105 1b00f7 gfs7 remote_stage error -105 140136 gfs2 remote_stage error -105 2403d5 gfs5 remote_stage error -105 1902e1 gfs3 remote_stage error -105 1401ce gfs4 remote_stage error -105 13025d 3be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex 2ec187-2ed3d9 lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 1f0358 sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9293 en punlock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 1f0358 5 0 7131 qc 7,37 5,5 id 1f0358 sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex 2ed3d9-2ed7dd lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 2403a4 sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9293 en punlock 7,37 
9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 2403a4 5 0 7131 qc 7,37 5,5 id 2403a4 sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex 2ed7de-2eddd4 lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 1c018a sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9293 en punlock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 1c018a 5 0 7131 qc 7,37 5,5 id 1c018a sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex 2eddd5-2edffc lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 250113 sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9293 en punlock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 250113 5 0 7131 qc 7,37 5,5 id 250113 sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex e12a9-26b17f lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 200044 sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9334 en punlock 7,2d 9334 lk 11,2d id 201d9 0,5 4 7131 qc 11,2d 0,5 id 201d9 sts 0 0 9334 remove 7,2d 9334 un 7,2d 160147 5 0 7131 qc 7,2d 5,5 id 160147 sts -65538 0 9334 lk 11,2d id 201d9 5,0 4 7131 qc 11,2d 5,0 id 201d9 sts 0 0 9334 ex punlock 0 9334 en plock 7,2d 9334 lk 11,2d id 201d9 0,5 4 7131 qc 11,2d 0,5 id 201d9 sts 0 0 9334 req 7,2d ex 0-64da lkf 2000 wait 1 9334 lk 7,2d id 0 -1,5 2000 9334 lk 11,2d id 201d9 5,0 4 7131 qc 7,2d -1,5 id 14038e 
sts 0 0 7131 qc 11,2d 5,0 id 201d9 sts 0 0 9334 ex plock 0 9293 en punlock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 200044 5 0 7131 qc 7,37 5,5 id 200044 sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9293 en plock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 req 7,37 ex 2ecc8b-2ed161 lkf 2000 wait 1 9293 lk 7,37 id 0 -1,5 2000 9293 lk 11,37 id 603be 5,0 4 7131 qc 7,37 -1,5 id 1e0271 sts 0 0 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex plock 0 9293 en punlock 7,37 9293 lk 11,37 id 603be 0,5 4 7131 qc 11,37 0,5 id 603be sts 0 0 9293 remove 7,37 9293 un 7,37 1e0271 5 0 7131 qc 7,37 5,5 id 1e0271 sts -65538 0 9293 lk 11,37 id 603be 5,0 4 7131 qc 11,37 5,0 id 603be sts 0 0 9293 ex punlock 0 9334 en punlock 7,2d 9334 lk 11,2d id 201d9 0,5 4 9336 en punlock 7,1ffe9 9336 lk 11,1ffe9 id 2006e 0,5 4 8494 un 2,20242 e012f 5 0 8506 un 8,0 100263 5 8 9339 lk 2,20282 id 0 -1,5 0 8429 lk 8,0 id 0 -1,5 8 9315 en punlock 7,10081 9315 lk 11,10081 id 10045 0,5 4 8186 un 2,4ff55 c03b5 5 0 8109 un 2,204a7 12028a 5 0 7964 un 2,400aa a03e0 5 0 8263 un 2,20144 c03c7 5 0 7887 un 2,100a9 200e0 5 0 8340 un 2,20281 50264 5 0 lock_dlm: Assertion failed on line 352 of file /usr/src/build/522381-i686/BUILD/gfs-kernel-2.6.9-23/src/dlm/lock.c lock_dlm: assertion: "!error" lock_dlm: time = 1744300 gfs6: error=-22 num=2,20281 ------------[ cut here ]------------ kernel BUG at /usr/src/build/522381-i686/BUILD/gfs-kernel-2.6.9-23/src/dlm/lock.c:352! 
invalid operand: 0000 [#13]
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc button battery ac uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8962ce9>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-5.EL)
EIP is at do_dlm_unlock+0xa2/0xb7 [lock_dlm]
eax: 00000001   ebx: ffffffea   ecx: f896854a   edx: c4599f44
esi: f5e4b680   edi: f5e4b680   ebp: f8bc9000   esp: c4599f40
ds: 007b   es: 007b   ss: 0068
Process gfs_glockd (pid: 8340, threadinfo=c4599000 task=c3ffd320)
Stack: f896854a f8bc9000 00000001 f8962ff4 f8b964c0 c67cbd7c f8bc9000 f8bc6640
       f8b89447 c67cbd7c f8bc6640 c4599fb4 f8b87ef9 c67cbd7c 00000001 f8b880b6
       c67cbd7c c67cbd7c f8b8836f c67cbe20 f8b8b9b4 c4599000 c4599fc0 f8b7bbf2
Call Trace:
 [<f8962ff4>] lm_dlm_unlock+0xe/0x16 [lock_dlm]
 [<f8b964c0>] gfs_lm_unlock+0x2b/0x40 [gfs]
 [<f8b89447>] gfs_glock_drop_th+0x17a/0x1b0 [gfs]
 [<f8b87ef9>] rq_demote+0x15c/0x1da [gfs]
 [<f8b880b6>] run_queue+0x5a/0xc1 [gfs]
 [<f8b8836f>] unlock_on_glock+0x6e/0xc8 [gfs]
 [<f8b8b9b4>] gfs_reclaim_glock+0x257/0x2ae [gfs]
 [<f8b7bbf2>] gfs_glockd+0x38/0xde [gfs]
 [<c011b9ea>] default_wake_function+0x0/0xc
 [<c0301b1a>] ret_from_fork+0x6/0x14
 [<c011b9ea>] default_wake_function+0x0/0xc
 [<f8b7bbba>] gfs_glockd+0x0/0xde [gfs]
 [<c01041d9>] kernel_thread_helper+0x5/0xb
Code: e8 72 d3 7b c7 ff 76 08 8b 06 ff 76 04 ff 76 0c 53 ff 70 18 68 6a 86 96 f8 e8 59 d3 7b c7 83 c4 2c 68 4a 85 96 f8 e8 4c d3 7b c7 <0f> 0b 60 01 dc 83 96 f8 68 4c 85 96 f8 e8 98 c7 7b c7 5b 5e c3
Comments 5 and 7 pertain to a cman bug where cman shuts down while gfs/dlm are still running; there have been various bugs dealing with this. Comment 6 looks new and interesting, but has nothing to do with the other information here.
I'm adding Dave's email to this bug for future reference. Should this be closed, or what state should this bug have, given that the original issue is fixed but people will continue to see the assert message?
We need to clear something up that might be confusing folks. Whenever you see the following (there are two forms, one for dlm_lock and another for dlm_unlock):

lock_dlm: Assertion failed on line 352 of file cluster/gfs-kernel/src/dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 38903631
a: error=-22 num=2,70650

realize that you don't yet know anything useful about what went wrong. This assert/panic is not the real problem; it's just a signal that something else went wrong earlier in the dlm. I know it's simpler to panic right when something goes wrong, but our approach with the dlm has been different: we tend not to panic in the dlm, but instead print an error message and return the error to the caller. That means nearly anything that goes wrong in the dlm will end up returning an error to lock_dlm, which does this assert [1]. So, when you get a panic like the one above, you'll need to scroll back a bit to identify the real problem. Capture the dlm debug log dump (it's pretty short and comes before the lock_dlm debug dump), and prior to the debug dumps there are often errors/warnings that were printed to the console. Thanks, this should help us resolve these bz's a lot quicker.
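To illustrate the convention described above, here is a minimal userspace sketch (not the actual kernel code; the function names and the lockspace_found flag are hypothetical stand-ins): the dlm side prints a message and hands a negative error back to its caller, and the lock_dlm side is the one that turns any returned error into the visible "!error" assert, far from the original fault.

```c
#include <stdio.h>

#define MY_EINVAL 22  /* mirrors the kernel's EINVAL; seen as error=-22 in the panics */

/* Hypothetical stand-in for the dlm side: on an internal problem it
 * prints a message and returns a negative error instead of panicking. */
static int dlm_unlock_sketch(unsigned int lkid, int lockspace_found)
{
    if (!lockspace_found) {
        printf("dlm: dlm_unlock: lkid %x lockspace not found\n", lkid);
        return -MY_EINVAL;  /* error handed back to the caller */
    }
    return 0;
}

/* Hypothetical stand-in for lock_dlm's do_dlm_unlock(): it expects
 * success, so any error that propagated up from the dlm trips the
 * "!error" assertion here rather than at the real point of failure. */
static int do_dlm_unlock_sketch(unsigned int lkid, int lockspace_found)
{
    int error = dlm_unlock_sketch(lkid, lockspace_found);

    if (error) {
        /* in the real module this is where the assert/panic fires */
        printf("lock_dlm: assertion \"!error\" fails, error=%d num=%x\n",
               error, lkid);
    }
    return error;
}
```

This is why the advice above is to scroll back past the assert: the interesting message is the one printed where the error originated, not the one printed at the assertion site.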
This is a duplicate of bug 139738, so it should also be fixed.
Closing this, as the original problem is fixed and all issues in the later comments are tracked in bug 139738.