(gdb) p $4->locks[0]
$5 = {lock = 0x7f3da4abc1d8, fop = 0x7f3d74317e18, owner_list = {next = 0x7f3d74317ed0, prev = 0x7f3d74317ed0}, wait_list = {next = 0x7f3da4abc208, prev = 0x7f3da4abc208}, update = {false, false}, dirty = {false, false}, optimistic_changelog = false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 9223372036854775807}
(gdb) p $4->locks[0].lock
$6 = (ec_lock_t *) 0x7f3da4abc1d8
(gdb) p *$4->locks[0].lock
$7 = {ctx = 0x7f3db7cbff70, timer = 0x0, owners = {next = 0x7f3da4abc1e8, prev = 0x7f3da4abc1e8}, waiting = {next = 0x7f3da4abc1f8, prev = 0x7f3da4abc1f8}, frozen = {next = 0x7f3d74317ee0, prev = 0x7f3d74317ee0}, mask = 0, good_mask = 18446744073709551615, healing = 0, refs_owners = 0, refs_pending = 0, waiting_flags = 0, acquired = false, unlock_now = false, release = true, query = true, fd = 0x0, loc = {path = 0x7f3d75084a40 "/IOs/kernel/rhs-client45.lab.eng.blr.redhat.com/dir.2/linux-5.2.7/Documentation/devicetree/bindings/rtc", name = 0x7f3d75084aa4 "rtc", inode = 0x7f3d98014768, parent = 0x7f3d99faad38, gfid = "\310\a\376|-\205K\v\215\000\b\363>\241\021i", pargfid = "\345\330}\212\242{Nr\233\064\373\030MD\361", <incomplete sequence \360>}, {type = ENTRYLK_WRLCK, flock = {l_type = 1, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}}}
(gdb) p &$4->locks[0].lock->owners
$8 = (struct list_head *) 0x7f3da4abc1e8
(gdb) p &$4->locks[0].lock->waiting
$9 = (struct list_head *) 0x7f3da4abc1f8
(gdb) p &$4->locks[0].lock->frozen
$10 = (struct list_head *) 0x7f3da4abc208

This seems to suggest that the fop is stuck in the frozen list: owners and waiting are both empty (each points back at its own address, $8 and $9), while the link's wait_list points at 0x7f3da4abc208, which is the address of lock->frozen ($10). That can only happen if lock->release is set to true.
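The address comparison used to read the dump can be sketched in C. This is only an illustrative model of a circular doubly linked list with next/prev pointers, as seen in the gdb output. The `sketch_list` type and the `sketch_*` helpers are hypothetical names made up for this example and are not the GlusterFS list API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal list node, modelled on the next/prev pairs in the dump. */
struct sketch_list {
    struct sketch_list *next;
    struct sketch_list *prev;
};

/* An empty list head points at itself in both directions. */
void sketch_list_init(struct sketch_list *head)
{
    head->next = head;
    head->prev = head;
}

/* Same test as done by eye for owners ($8) and waiting ($9). */
bool sketch_list_empty(const struct sketch_list *head)
{
    return head->next == head;
}

/* Append node at the tail of the list anchored at head. */
void sketch_list_add_tail(struct sketch_list *node, struct sketch_list *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* True when node is the only entry on head: head points at node and
 * node points back at head, as with wait_list and frozen ($10). */
bool sketch_list_is_sole_entry(const struct sketch_list *node,
                               const struct sketch_list *head)
{
    return head->next == node && head->prev == node &&
           node->next == head && node->prev == head;
}
```

With one node added to a `frozen` head and nothing added to `owners`, `sketch_list_empty(&owners)` is true and `sketch_list_is_sole_entry(&node, &frozen)` is true, which is the same pattern the pointers in $5, $7 and $10 show.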
Problem:

Mount-1                                  Mount-2
1) Tries to acquire lock on 'dir1'       1) Tries to acquire lock on 'dir1'
2) Lock is granted on brick-0            2) Lock gets EAGAIN on brick-0 and
                                            leads to blocking lock on brick-0
3) Gets a lock-contention                3) Doesn't matter what happens on
   notification, marks lock->release        mount-2 from here on.
   to true.
4) New fop comes on 'dir1' which will
   be put in frozen list as
   lock->release is set to true.
5) Lock acquisition from step-2 fails
   because 3 bricks went down in 4+2
   setup.

Fop on mount-1 which is put in frozen list will hang because no codepath will move it from frozen list to any other list and the lock will not be retried.
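The sequence on mount-1 can be reduced to a small C model. This is a sketch under assumptions, not the actual cluster/ec code. The `model_*` names are hypothetical. The `only_if_acquired` switch is a guess at the behaviour implied by the patch title "Mark release only when it is acquired". Only the `acquired` and `release` flags from the gdb dump are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for ec_lock_t: just the two flags seen in the dump. */
typedef struct {
    bool acquired; /* lock has been fully acquired */
    bool release;  /* lock is marked for release; new fops get frozen */
} model_lock_t;

typedef enum {
    MODEL_QUEUE_ACTIVE, /* owners or waiting list */
    MODEL_QUEUE_FROZEN  /* frozen list */
} model_queue_t;

/* Step 3: a lock-contention notification arrives.
 * only_if_acquired == false models the reported behaviour,
 * only_if_acquired == true models the assumed fix. */
void model_on_contention(model_lock_t *lock, bool only_if_acquired)
{
    assert(lock != NULL);
    if (only_if_acquired && !lock->acquired)
        return; /* nothing is held yet, so there is nothing to release */
    lock->release = true;
}

/* Step 4: a new fop on the same inode is queued. */
model_queue_t model_queue_new_fop(const model_lock_t *lock)
{
    return lock->release ? MODEL_QUEUE_FROZEN : MODEL_QUEUE_ACTIVE;
}

/* Steps 1 to 4 on mount-1: the lock is granted on brick-0 only, so it is
 * not yet acquired when the contention notification comes in. */
model_queue_t model_scenario(bool only_if_acquired)
{
    model_lock_t lock = { .acquired = false, .release = false };
    model_on_contention(&lock, only_if_acquired);
    return model_queue_new_fop(&lock);
}
```

In this model `model_scenario(false)` leaves the new fop on the frozen list with `acquired = false` and `release = true`, matching the state in the dump, while `model_scenario(true)` keeps it off the frozen list. What happens to that fop when the acquisition in step 5 later fails is outside the scope of the sketch.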
REVIEW: https://review.gluster.org/23272 (cluster/ec: Mark release only when it is acquired) posted (#1) for review on master by Pranith Kumar Karampuri
REVIEW: https://review.gluster.org/23272 (cluster/ec: Mark release only when it is acquired) merged (#5) on master by Pranith Kumar Karampuri