1743573 – fuse client hung when issued a lookup "ls" on an ec volume

Summary: fuse client hung when issued a lookup "ls" on an ec volume

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1731896
Blocks:

Reported:	2019-08-20 08:56 UTC by Pranith Kumar K
Modified:	2019-09-12 06:38 UTC
CC List:	11 users
Fixed In Version:
Clone Of:	1731896
Environment:
Last Closed:	2019-09-12 06:38:01 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	23272	0	None	Merged	cluster/ec: Mark release only when it is acquired	2019-09-12 06:38:00 UTC

Comment 1 Pranith Kumar K 2019-08-20 08:58:46 UTC

(gdb) p $4->locks[0]
$5 = {lock = 0x7f3da4abc1d8, fop = 0x7f3d74317e18, owner_list = {next = 0x7f3d74317ed0, prev = 0x7f3d74317ed0}, wait_list = {next = 0x7f3da4abc208, prev = 0x7f3da4abc208}, update = {false, false}, dirty = { false, false}, optimistic_changelog = false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 9223372036854775807}
(gdb) p $4->locks[0].lock
$6 = (ec_lock_t *) 0x7f3da4abc1d8
(gdb) p *$4->locks[0].lock
$7 = {ctx = 0x7f3db7cbff70, timer = 0x0, owners = {next = 0x7f3da4abc1e8, prev = 0x7f3da4abc1e8}, waiting = {next = 0x7f3da4abc1f8, prev = 0x7f3da4abc1f8}, frozen = {next = 0x7f3d74317ee0, prev = 0x7f3d74317ee0}, mask = 0, good_mask = 18446744073709551615, healing = 0, refs_owners = 0, refs_pending = 0, waiting_flags = 0, acquired = false, unlock_now = false, release = true, query = true, fd = 0x0, loc = {path = 0x7f3d75084a40 "/IOs/kernel/rhs-client45.lab.eng.blr.redhat.com/dir.2/linux-5.2.7/Documentation/devicetree/bindings/rtc", name = 0x7f3d75084aa4 "rtc", inode = 0x7f3d98014768, parent = 0x7f3d99faad38, gfid = "\310\a\376|-\205K\v\215\000\b\363>\241\021i", pargfid = "\345\330}\212\242{Nr\233\064\373\030MD\361", <incomplete sequence \360>}, {type = ENTRYLK_WRLCK, flock = { l_type = 1, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' <repeats 1023 times>}}}}
(gdb) p &$4->locks[0].lock->owners
$8 = (struct list_head *) 0x7f3da4abc1e8
(gdb) p &$4->locks[0].lock->waiting
$9 = (struct list_head *) 0x7f3da4abc1f8
(gdb) p &$4->locks[0].lock->frozen
$10 = (struct list_head *) 0x7f3da4abc208

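In this dump the lock's owners and waiting lists are empty: their next and prev pointers equal their own addresses (0x7f3da4abc1e8 and 0x7f3da4abc1f8). The frozen list at 0x7f3da4abc208 is not empty: its next pointer is 0x7f3d74317ee0 rather than its own address, and the wait_list of the fop's locks[0] points back to 0x7f3da4abc208. The lock also shows acquired = false and release = true. This suggests that the fop is stuck in the frozen list, which can only happen if lock->release is set to true.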
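The pointer comparison used to read the gdb output above can be sketched in C. This is a minimal, simplified model of an intrusive circular list, not the GlusterFS list implementation; `struct list_node`, `list_init`, `list_is_empty`, `list_append` and `list_holds_only` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an intrusive circular list node (hypothetical,
 * modelled on the next/prev pairs visible in the gdb output). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head whose next and prev point back at itself. */
static inline void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static inline bool list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Append 'item' at the tail of the list anchored at 'head'. */
static inline void list_append(struct list_node *item, struct list_node *head)
{
    assert(item != NULL && head != NULL);
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* True when 'item' is the only entry on 'head': the head points at the item
 * and the item points back at the head. This is the pattern seen between
 * lock->frozen and the fop's wait_list in the dump. */
static inline bool list_holds_only(const struct list_node *head,
                                   const struct list_node *item)
{
    return head->next == item && head->prev == item &&
           item->next == head && item->prev == head;
}
```

With this model, a head that prints `next == prev == &head` in gdb is empty (as owners and waiting are here), while a head and an entry that point at each other form a one-element list (as frozen and the fop's wait_list do).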



    Problem:
    Mount-1                                   Mount-2
    1) Tries to acquire lock on 'dir1'.       1) Tries to acquire lock on 'dir1'.
    2) Lock is granted on brick-0.            2) Lock gets EAGAIN on brick-0, which
                                                 leads to a blocking lock on brick-0.
    3) Gets a lock-contention                 3) It doesn't matter what happens on
       notification and sets lock->release       mount-2 from here on.
       to true.
    4) A new fop comes on 'dir1'; it is
       put in the frozen list because
       lock->release is set to true.
    5) The lock acquisition started in
       step 2 fails because 3 bricks went
       down in a 4+2 setup.

    The fop on mount-1 that was put in the frozen list will hang, because no code
    path moves it from the frozen list to any other list and the lock is not
    retried.
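The sequence on mount-1 can be replayed with a small toy model. This is only a sketch under stated assumptions, not the cluster/ec code: the fields `acquired` and `release` mirror the gdb output, while `struct toy_lock`, the counters, and all `toy_*` functions are hypothetical. The `only_if_acquired` switch models the idea in the patch title ("Mark release only when it is acquired"); how the merged patch actually implements it is not shown in this report. The model also assumes that fops waiting on an in-progress acquisition are failed when the acquisition fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lock state discussed above (hypothetical). */
struct toy_lock {
    bool acquired;   /* lock has been acquired */
    bool release;    /* lock is marked for release */
    int frozen;      /* fops parked until the lock is released */
    int waiting;     /* fops waiting for the acquisition to finish */
};

/* Lock-contention notification (step 3). With only_if_acquired == false
 * release is always set; with true it is set only on an acquired lock. */
static inline void toy_contention(struct toy_lock *lk, bool only_if_acquired)
{
    if (!only_if_acquired || lk->acquired)
        lk->release = true;
}

/* A new fop arrives for the same lock (step 4). */
static inline void toy_new_fop(struct toy_lock *lk)
{
    if (lk->release)
        lk->frozen++;   /* parked behind the pending release */
    else
        lk->waiting++;  /* follows the in-progress acquisition */
}

/* Acquisition fails (step 5). Model assumption: waiting fops are failed,
 * and nothing touches the frozen ones. Returns the number of hung fops. */
static inline int toy_acquire_failed(struct toy_lock *lk)
{
    assert(!lk->acquired);
    lk->waiting = 0;
    return lk->frozen;
}

/* Replays steps 3 to 5 and returns how many fops are left hanging. */
static inline int toy_run(bool only_if_acquired)
{
    struct toy_lock lk = {false, false, 0, 0};
    toy_contention(&lk, only_if_acquired);
    toy_new_fop(&lk);
    return toy_acquire_failed(&lk);
}
```

In this model `toy_run(false)` leaves one fop in the frozen count, matching the hang described above, while `toy_run(true)` leaves none because release is never set on a lock that was not acquired.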

Comment 2 Worker Ant 2019-08-20 09:02:47 UTC

REVIEW: https://review.gluster.org/23272 (cluster/ec: Mark release only when it is acquired) posted (#1) for review on master by Pranith Kumar Karampuri

Comment 3 Worker Ant 2019-09-12 06:38:01 UTC

REVIEW: https://review.gluster.org/23272 (cluster/ec: Mark release only when it is acquired) merged (#5) on master by Pranith Kumar Karampuri
