Bug 1393758
Summary: | I/O errors on FUSE mount point when reading and writing from 2 clients | | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Karan Sandha <ksandha>
Component: | md-cache | Assignee: | Poornima G <pgurusid>
Status: | CLOSED ERRATA | QA Contact: | Karan Sandha <ksandha>
Severity: | high | Docs Contact: |
Priority: | unspecified
Version: | rhgs-3.2 | CC: | pkarampu, rhinduja, rhs-bugs, rjoseph, storage-qa-internal, vdas
Target Milestone: | ---
Target Release: | RHGS 3.2.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | glusterfs-3.8.4-6 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: |
: | 1396952 | Environment: |
Last Closed: | 2017-03-23 06:18:16 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1351528, 1396952, 1399446
Attachments: |

Description
Karan Sandha
2016-11-10 09:40:58 UTC
Poornima and Ravi, updating the bug with new findings as asked. This bug also hits on a 1x3 replica volume. Tested on:

[root@dhcp47-141 /]# gluster --version
glusterfs 3.8.4 built on Oct 24 2016 11:13:47
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

It's a good finding.

RCA: When a brick is brought down, the event generation in AFR changes, and the next read calls afr_inode_refresh to choose a read subvolume. As a result of the brick going down, a pending xattr is set on the bricks that are still up, which triggers an upcall to AFR to reset the read subvolume. Consider a race between afr_inode_refresh_done() and the upcall resetting the read subvolume to NULL. The following sequence of execution can lead to EIO:

1. CHILD_DOWN - as a result of brick down ...
2. read() ....
3. afr_read()
4. afr_refresh_inode() - Because of the brick down, the event generation has changed and the inode needs a refresh before the read fop.
5. afr_refresh_done()
6. upcall - reset the read_subvol
7. afr_read_txn_refresh_done()

In the above case, the read fails with an EIO error.

The fix may go in either AFR or md-cache; this still needs to be decided.

Fix posted upstream: http://review.gluster.org/#/c/15892/

While creating deep directories and simultaneously running the ll command on the mount point continuously, I am not seeing any I/O errors with glusterfs-3.8.4-6.

Verified this bug on glusterfs-3.8.4-6 with the same steps as in the description; it is not reproducible. Hence marking it as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
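For readers following the RCA in this report, the seven-step interleaving can be modelled with a small deterministic sketch. This is not GlusterFS code: InodeCtx, refresh_inode, upcall_invalidate, and the between_steps hook are hypothetical names invented for illustration, and the real AFR logic for choosing a read subvolume is considerably more involved. The sketch only shows why an upcall that lands between the refresh and the read transaction's completion callback can leave no read subvolume set, which surfaces as EIO.

```python
import errno


class InodeCtx:
    """Hypothetical stand-in for the per-inode state AFR keeps."""

    def __init__(self):
        self.read_subvol = 0  # index of the brick currently chosen for reads
        self.event_gen = 1    # event generation that choice was made under


def refresh_inode(ctx, current_gen, up_bricks):
    """Re-pick a readable brick after the event generation changed."""
    ctx.read_subvol = up_bricks[0] if up_bricks else None
    ctx.event_gen = current_gen


def upcall_invalidate(ctx):
    """Model of the upcall caused by the pending-xattr update: forget the choice."""
    ctx.read_subvol = None


def read(ctx, current_gen, up_bricks, between_steps=None):
    """Return the brick index to read from, or -EIO if none is set."""
    if ctx.event_gen != current_gen:
        refresh_inode(ctx, current_gen, up_bricks)  # steps 4 and 5
    if between_steps is not None:
        between_steps(ctx)                          # step 6: the racing upcall
    if ctx.read_subvol is None:                     # step 7
        return -errno.EIO
    return ctx.read_subvol


# Brick 0 went down (step 1), so the generation moved from 1 to 2.
ok = read(InodeCtx(), current_gen=2, up_bricks=[1, 2])
raced = read(InodeCtx(), current_gen=2, up_bricks=[1, 2],
             between_steps=upcall_invalidate)
print(ok, raced == -errno.EIO)
```

Without the injected upcall the read lands on a surviving brick; with it, the freshly chosen subvolume is wiped before it is used and the read returns -errno.EIO, matching the failure seen on the FUSE mount.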