Bug 1393758

Summary: I/O errors on FUSE mount point when reading and writing from 2 clients
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Karan Sandha <ksandha>
Component: md-cache
Assignee: Poornima G <pgurusid>
Status: CLOSED ERRATA
QA Contact: Karan Sandha <ksandha>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: pkarampu, rhinduja, rhs-bugs, rjoseph, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1396952
Environment:
Last Closed: 2017-03-23 06:18:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528, 1396952, 1399446
Attachments: ERROR and warning (flags: none)

Description Karan Sandha 2016-11-10 09:40:58 UTC
Created attachment 1219268 [details]
ERROR and warning

Description of problem:
Read a file from one client while writing to it from the other client. After a while, I/O errors are seen on the mount points, and errors and warnings appear in the mount logs.

Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.4 built on Oct 24 2016 11:13:47
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
100%
Logs placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>
Steps to Reproduce:
1. Create an arbiter volume (casey), mount it on two different clients at /mnt/casey, and enable the md-cache options (see the consolidated sketch after the log excerpts below).

2. On one of the clients run:
while true; do cat abc; done
and on the other:
while true; do echo 21 > abc; done

3. After a few minutes, I/O errors are seen on the mount point, and the errors and warnings below appear in the mount log.

[2016-11-10 08:02:54.878847] E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-casey-replicate-0: Failing READ on gfid e739b56e-ce19-4056-9795-6b03681654b5: split-brain observed. [Input/output error]

[2016-11-10 08:02:54.881998] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 516544: READ => -1 gfid=e739b56e-ce19-4056-9795-6b03681654b5 fd=0x7f3b2430c06c (Input/output error)
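
For reference, here are steps 1 and 2 consolidated into commands (a sketch; the hostnames, brick paths, and volume name are taken from the gluster volume info output in Additional info below, and the md-cache options are set as shown after that output):

# on one of the servers: create and start the arbiter volume
gluster volume create casey replica 3 arbiter 1 \
    dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick0/casey \
    dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick0/casey \
    dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick0/casey
gluster volume start casey

# on client 1: mount and read in a loop
mount -t glusterfs dhcp47-141.lab.eng.blr.redhat.com:/casey /mnt/casey
while true; do cat /mnt/casey/abc; done

# on client 2: mount and write in a loop
mount -t glusterfs dhcp47-141.lab.eng.blr.redhat.com:/casey /mnt/casey
while true; do echo 21 > /mnt/casey/abc; done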

Actual results:
I/O errors are observed on the mount point, and the mount logs show the "split-brain observed" error.

Expected results:
I/O should run smoothly and no errors should be observed.

Additional info:
[root@dhcp47-141 ~]# gluster volume info
 
Volume Name: casey
Type: Replicate
Volume ID: 919084d3-561f-4874-be74-e349ad0b23a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick0/casey
Brick2: dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick0/casey
Brick3: dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick0/casey (arbiter)
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
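
For reference, the md-cache related options in "Options Reconfigured" above map to these commands (a sketch using the standard gluster volume set syntax; this is what step 1's "enable the md-cache options" refers to):

# server-side upcall cache invalidation
gluster volume set casey features.cache-invalidation on
gluster volume set casey features.cache-invalidation-timeout 600

# client-side md-cache settings
gluster volume set casey performance.stat-prefetch on
gluster volume set casey performance.cache-invalidation on
gluster volume set casey performance.md-cache-timeout 600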

Comment 2 Karan Sandha 2016-11-11 10:23:36 UTC
Poornima and Ravi,

Updating the bug with the new findings, as asked. This bug is also hitting on a plain 1x3 replica volume.

Tested on:

[root@dhcp47-141 /]# gluster --version
glusterfs 3.8.4 built on Oct 24 2016 11:13:47
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Comment 3 Poornima G 2016-11-14 11:39:35 UTC
It's a good finding.

RCA:
When a brick is brought down, the event generation in AFR changes, and the next read calls afr_inode_refresh() to choose a read subvolume. As a result of the brick going down, a pending xattr is set on the bricks that are up, which results in an upcall to AFR to reset the read subvolume.
Consider a race between afr_inode_refresh_done() and the upcall unsetting the read subvol to NULL. Below is the sequence of execution that can lead to EIO:

1. CHILD_DOWN - as a result of the brick going down
...
2. read()
...
3. afr_read()
4. afr_inode_refresh() - because of the brick-down event the generation has changed, and the inode needs a refresh before the read fop
5. afr_inode_refresh_done()
6. upcall - resets the read_subvol
7. afr_read_txn_refresh_done()

In the above case, the read will fail with an EIO error.
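
Since, per this RCA, the EIO comes from the refresh/upcall race rather than from real data divergence, one way to sanity-check that the file is not in genuine split-brain is the heal info command (a sketch; the volume name is from this setup):

# check for real split-brain entries on the volume
gluster volume heal casey info split-brain
# if the EIO is the spurious one from this race, each brick should
# report zero entries in split-brain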

The fix may go in either AFR or md-cache; this is yet to be concluded.

Comment 6 Poornima G 2016-11-21 09:37:00 UTC
Fix posted upstream:  http://review.gluster.org/#/c/15892/

Comment 10 Vivek Das 2016-12-01 15:37:39 UTC
While creating deep directories and simultaneously running the ll command continuously on the mount point, I am not seeing any I/O errors with glusterfs-3.8.4-6. A rough sketch of the workload follows.
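
A sketch of that verification workload (the mount path and directory depth are illustrative; ll is the usual ls -l alias):

# terminal 1: keep creating deep directory trees on the mount
while true; do mkdir -p /mnt/casey/a/b/c/d/e/$RANDOM; done

# terminal 2: keep listing the mount point
while true; do ls -l /mnt/casey; done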

Comment 11 Karan Sandha 2016-12-05 10:02:46 UTC
Verified this bug on 3.8.4-6 with the same steps as in the description; it is not reproducible. Hence marking it as verified.

Comment 13 errata-xmlrpc 2017-03-23 06:18:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html