Bug 1636902
Summary: | "gluster vol heal <vol name> info" is hung on Distributed-Replicated ( Arbiter ) | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vijay Avuthu <vavuthu> | |
Component: | arbiter | Assignee: | Ravishankar N <ravishankar> | |
Status: | CLOSED ERRATA | QA Contact: | Vijay Avuthu <vavuthu> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | rhgs-3.4 | CC: | anepatel, apaladug, atumball, bkunal, bmekala, dwojslaw, nbalacha, nchilaka, pkarampu, rcyriac, rhs-bugs, sanandpa, sankarshan, sheggodu, storage-qa-internal, vavuthu | |
Target Milestone: | --- | Keywords: | Automation, ZStream | |
Target Release: | RHGS 3.4.z Batch Update 1 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.12.2-23 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, a flaw in the self-heal code caused an inode lock to be taken twice on the file that needed heal but released only once. As a result, a stale lock was left behind on the brick, and any further operation that needed the lock (such as a heal or a write from the client) hung. With this update, the inode locks are released correctly and no stale locks are left behind on the brick, so subsequent heals and client writes no longer hang.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1638026 (view as bug list) | Environment: | ||
Last Closed: | 2018-10-31 08:46:58 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1637802, 1637953, 1637989, 1638159 | |||
Bug Blocks: | 1638026 |
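Illustration of the Doc Text above: a minimal sketch, assuming a toy counted-lock table, of how "taken twice, released once" leaves a stale lock that blocks other owners. This is not GlusterFS code; `ToyBrickLocks`, `buggy_heal`, `fixed_heal` and the owner names are hypothetical, and real inode locks are range locks held in lock domains on the brick.

```python
class ToyBrickLocks:
    """Hypothetical stand-in for a brick's inode-lock table (not GlusterFS code)."""

    def __init__(self):
        # (inode, owner) -> number of locks that owner currently holds
        self.held = {}

    def lock(self, inode, owner):
        key = (inode, owner)
        self.held[key] = self.held.get(key, 0) + 1

    def unlock(self, inode, owner):
        key = (inode, owner)
        if self.held.get(key, 0) == 0:
            raise RuntimeError("unlock without a matching lock")
        self.held[key] -= 1
        if self.held[key] == 0:
            del self.held[key]

    def would_block(self, inode, owner):
        """True if some other owner still holds a lock on this inode."""
        return any(i == inode and o != owner for (i, o) in self.held)


def buggy_heal(locks, inode):
    # Assumed shape of the flaw: two acquisitions, one release.
    locks.lock(inode, "self-heal")
    locks.lock(inode, "self-heal")
    locks.unlock(inode, "self-heal")  # one lock is left behind


def fixed_heal(locks, inode):
    # Every acquisition is paired with exactly one release.
    locks.lock(inode, "self-heal")
    try:
        pass  # heal work would happen here
    finally:
        locks.unlock(inode, "self-heal")
```

In this model, after `buggy_heal` runs, `would_block(inode, "client")` stays true indefinitely, which corresponds to the hung heal-info and client writes reported here. After `fixed_heal` the table is empty.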
Description
Vijay Avuthu
2018-10-08 08:39:33 UTC
Also note, this was hit by a user in the community, and I would surely propose it as 'Blocker', as this is a common activity in all scenarios: OCS, RHHI, RHGS.

Vijay was telling me that manual testing did not find any issues with the scratch build. While he is running the automated tests, moving the BZ to POST. Upstream patch link is https://review.gluster.org/#/c/21380/

*** Bug 1638947 has been marked as a duplicate of this bug. ***

Can this be prevented by using the 'sdfs' feature (serializing directory entry ops)?

Verified the fix on build glusterfs-libs-3.12.2-23.el7rhgs.x86_64. There is no heal hang issue observed anymore, but the heal is pending and is tracked in bug 1640148, hence setting this BZ to verified state.

(In reply to Amar Tumballi from comment #17)
> Can this be prevented by using 'sdfs' feature? (serializing directory entry
> ops) ?

Without serializing all entry ops irrespective of the parent directory on which the fop comes, I don't think it is possible. But doing this will lead to very bad performance. So at the moment I will try to fix it in AFR/EC, as the xlators are doing things that posix is not well equipped to do.

Vijay,
For all upgrade/healing tests, can we have an extra step after each upgrade completes, where we add a fresh mount and a way to create a new file in an existing directory and add data to existing files? This is the only way to ensure that this bug doesn't repeat in future.

Pranith

*** Bug 1635967 has been marked as a duplicate of this bug. ***

(In reply to Pranith Kumar K from comment #21)
> Vijay,
> For all upgrade/healing tests, can we have an extra step after each
> upgrade completes, where we add a fresh mount and a way to create a new file
> in existing directory and add data to existing files? This is the only way
> to ensure that this bug doesn't repeat in future.
>
> Pranith

Sure, Pranith. We will include that in our upgrade testing.
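The extra post-upgrade step requested in comment #21 could look roughly like the sketch below. It assumes the volume has already been mounted afresh at `mount_point` after the upgrade. The function name, arguments and file contents are hypothetical and are not taken from the actual test suite.

```python
import os
import uuid
from pathlib import Path


def post_upgrade_io_check(mount_point, existing_dir, existing_files):
    """Create a new file in an existing directory and append to existing files.

    Assumption: mount_point is a fresh mount made after the upgrade completed.
    existing_dir and existing_files are paths relative to that mount.
    """
    root = Path(mount_point)
    target_dir = root / existing_dir
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} is not an existing directory")

    # Step 1: create a brand-new file inside a pre-existing directory.
    new_file = target_dir / f"post-upgrade-{uuid.uuid4().hex}.txt"
    with open(new_file, "x") as fh:
        fh.write("created after upgrade\n")
        fh.flush()
        os.fsync(fh.fileno())

    # Step 2: add data to files that existed before the upgrade.
    for rel in existing_files:
        path = root / rel
        if not path.is_file():
            raise FileNotFoundError(f"{path} is not an existing file")
        with open(path, "a") as fh:
            fh.write("appended after upgrade\n")
            fh.flush()
            os.fsync(fh.fileno())

    return new_file
```

A hang of the kind reported in this bug would show up as these calls never returning rather than as an exception, so a real test would need to run the check under a timeout.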
(In reply to Vijay Avuthu from comment #23)
> (In reply to Pranith Kumar K from comment #21)
> > Vijay,
> > For all upgrade/healing tests, can we have an extra step after each
> > upgrade completes, where we add a fresh mount and a way to create a new file
> > in existing directory and add data to existing files? This is the only way
> > to ensure that this bug doesn't repeat in future.
> >
> > Pranith
>
> sure pranith. We include that in our upgrade testing.

Forgot to mention, even for EC volumes the tests should be modified in similar fashion.

Changing doc text to be identical to BZ 1638026

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days