Bug 1707195
Summary: | VM stuck in a shutdown because of a pending fuse request |
---|---|
Product: | [Community] GlusterFS |
Reporter: | Raghavendra G <rgowdapp> |
Component: | write-behind |
Assignee: | bugs <bugs> |
Status: | CLOSED UPSTREAM |
Severity: | medium |
Priority: | medium |
Version: | 6 |
CC: | bugs, nravinas, rhinduja, rhs-bugs |
Target Milestone: | --- |
Target Release: | --- |
Hardware: | Unspecified |
OS: | Linux |
Doc Type: | If docs needed, set a value |
Story Points: | --- |
Clone Of: | 1705865 |
: | 1707200 |
Last Closed: | 2020-03-12 15:00:57 UTC |
Type: | --- |
Regression: | --- |
Mount Type: | --- |
Documentation: | --- |
Category: | --- |
oVirt Team: | --- |
Cloudforms Team: | --- |
Bug Depends On: | 1702686, 1705865, 1707198 |
Bug Blocks: | 1707200 |
Comment 1
Raghavendra G
2019-05-07 02:48:00 UTC
I do see a write request hung in write-behind. Details of the write request from the state-dump:

[xlator.performance.write-behind.wb_inode]
path=/e5dd645f-88bb-491c-9145-38fa229cbc4d/images/8e84c1ed-48ba-4b82-9882-c96e6f260bab/29bba0a1-6c7b-4358-9ef2-f8080405778d
inode=0x7f6e40060888
gfid=6348d15d-7b17-4993-9da9-3f588c2ad5a8
window_conf=1048576
window_current=0
transit-size=0
dontsync=0

[.WRITE]
unique=5518502
refcount=1
wound=no
generation-number=0
req->op_ret=131072
req->op_errno=0
sync-attempts=0
sync-in-progress=no
size=131072
offset=4184756224
lied=0
append=0
fulfilled=0
go=0

I'll go through this and will try to come up with an RCA.

--- Additional comment from Raghavendra G on 2019-04-29 07:21:50 UTC ---

There is a race in the way O_DIRECT writes are handled. Assume two overlapping write requests w1 and w2.

* w1 is issued and is in the wb_inode->wip queue, as the response from the bricks is still pending. Also, wb_request_unref in wb_do_winds has not yet been invoked:

    list_for_each_entry_safe (req, tmp, tasks, winds) {
        list_del_init (&req->winds);

        if (req->op_ret == -1) {
            call_unwind_error_keep_stub (req->stub, req->op_ret,
                                         req->op_errno);
        } else {
            call_resume_keep_stub (req->stub);
        }

        wb_request_unref (req);
    }

* w2 is issued and wb_process_queue is invoked. w2 is not picked up for winding, as w1 is still in wb_inode->wip. w2 is added to the todo list and wb_writev for w2 returns.

* The response to w1 is received and wb_writev_cbk invokes wb_request_unref. Assume wb_request_unref in wb_do_winds (see point 1) has still not been invoked. Since one more reference is held, the wb_request_unref in wb_writev_cbk of w1 doesn't remove w1 from wip.

* wb_process_queue is invoked as part of wb_writev_cbk of w1, but it fails to wind w2 as w1 is still in wip.

* wb_request_unref is invoked on w1 as part of wb_do_winds. w1 is removed from all queues, including wip.

* After this point there is no invocation of wb_process_queue unless a new request is issued by the application, so w2 stays hung until the next request.
This bug is similar to bz 1626780 and bz 1379655. Though the issue is similar, the fixes for those two bzs won't fix the current bug, and hence this bug is not a duplicate. This bug will require a new fix and I'll post a patch to gerrit shortly.

REVIEW: https://review.gluster.org/22668 (performance/write-behind: remove request from wip list in wb_writev_cbk) posted (#1) for review on release-6 by Raghavendra G

This bug is moved to https://github.com/gluster/glusterfs/issues/1080, and will be tracked there from now on. See the GitHub issue for further details.