Bug 1387501 - Asynchronous Unsplit-brain still causes Input/Output Error on system calls
Summary: Asynchronous Unsplit-brain still causes Input/Output Error on system calls
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: RHGS 3.2.0
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1378547
Blocks: 1351528 1386188 1403121
TreeView+ depends on / blocked
 
Reported: 2016-10-21 06:09 UTC by Ravishankar N
Modified: 2017-03-23 06:13 UTC (History)
5 users (show)

Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1378547
Environment:
Last Closed: 2017-03-23 06:13:44 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Ravishankar N 2016-10-21 06:09:28 UTC
+++ This bug was initially created as a clone of Bug #1378547 +++

Description of problem:

The unsplit-brain mechanism is triggered along the self-healing mechanism. Since the self-healing mechanism is asynchronous, so is the unsplit-brain mechanism. Therefore, even tough the split-brain is resolved eventually, all system calls made before this happens causes an IOE to occur. This pushes the responsibility back to the client application, which needs to retry the system call, which in turn cause a waste of resources.

The self-heal mechanism should still be asynchronous, but the right version of the favorite child policy should be resolved synchronously to prevent the Input/Output exception to occur.

Version-Release number of selected component (if applicable):
3.8.4-1

How reproducible:
Create a split-brained file and assert that the first read still always causes an Input/Output Error.

Steps to Reproduce:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on, cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1sec later -> Ensure that the file was healed

Actual results:
[root@host vol]# cat test
cat: test: Input/output error
[root@host vol]# cat test
[root@host vol]#

Expected results:
[root@host vol]# cat test
[root@host vol]#


Additional info:

Comment 4 Ravishankar N 2016-11-08 01:10:37 UTC
Upstream patch: http://review.gluster.org/#/c/15673/4

Comment 5 Ravishankar N 2016-11-28 08:49:28 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/91354/

Comment 7 Nag Pavan Chilakam 2017-01-16 10:35:50 UTC
on_qa verification:



steps run:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on, cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1sec later -> Ensure that the file was healed

the files are getting healed based on latest mtime
hence moving to verified

test version:3.8.4-11

Comment 9 errata-xmlrpc 2017-03-23 06:13:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html


Note You need to log in before you can comment on or make changes to this bug.