Bug 1387501 - Asynchronous Unsplit-brain still causes Input/Output Error on system calls
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Ravishankar N
QA Contact: nchilaka
Keywords: Triaged
Depends On: 1378547
Blocks: 1351528 1386188 1403121
Reported: 2016-10-21 02:09 EDT by Ravishankar N
Modified: 2017-03-23 02:13 EDT
CC List: 5 users

See Also:
Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1378547
Environment:
Last Closed: 2017-03-23 02:13:44 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers:
Tracker: Red Hat Product Errata RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Description Ravishankar N 2016-10-21 02:09:28 EDT
+++ This bug was initially created as a clone of Bug #1378547 +++

Description of problem:

The unsplit-brain mechanism is triggered alongside the self-healing mechanism. Since self-healing is asynchronous, so is the unsplit-brain mechanism. Therefore, even though the split-brain is eventually resolved, every system call made before that happens fails with an Input/Output Error (EIO). This pushes the responsibility back to the client application, which has to retry the system call, which in turn wastes resources.

The self-heal itself should remain asynchronous, but the correct copy according to the favorite-child policy should be chosen synchronously, so that the Input/Output Error does not occur.
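To make the requested behaviour concrete, here is a minimal Python model of resolving a data split-brain with an `mtime` favorite-child policy inline, on the read path, before any data is returned. It is an illustration only, not the AFR (replicate) translator's code: the `Replica` type, the `resolve_by_mtime` and `read_file` helpers, and the choice to leave exact mtime ties unresolved are all assumptions made for this sketch.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical view of one brick's copy of a file."""
    brick: str
    mtime: float  # modification time, seconds since the epoch
    data: bytes


def resolve_by_mtime(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the copy with the newest mtime; return None on a tie (assumption)."""
    if not replicas:
        return None
    newest = max(r.mtime for r in replicas)
    winners = [r for r in replicas if r.mtime == newest]
    return winners[0] if len(winners) == 1 else None


def read_file(replicas: List[Replica], split_brain: bool) -> bytes:
    """Serve a read, choosing the source synchronously instead of failing."""
    if not split_brain:
        return replicas[0].data
    source = resolve_by_mtime(replicas)
    if source is None:
        # No policy decision possible: behave like the unfixed case.
        raise OSError(errno.EIO, "Input/output error")
    # Copying the source to the other bricks (the heal proper) can still
    # happen asynchronously; only the choice of source is made inline here.
    return source.data
```

For example, with one copy at mtime 100.0 holding `b"old"` and another at 200.0 holding `b"new"`, `read_file(..., split_brain=True)` returns `b"new"` on the first call instead of raising EIO.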

Version-Release number of selected component (if applicable):
3.8.4-1

How reproducible:
Always: create a split-brained file and confirm that the first read fails with an Input/Output Error.

Steps to Reproduce:
1. Set cluster.entry-self-heal, cluster.data-self-heal and cluster.metadata-self-heal to on, and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Run cat on the split-brained file -> an Input/Output Error is raised
4. Run cat on the file again ~1 sec later -> the file has been healed
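Step 1 corresponds to four invocations of the `gluster volume set <VOLNAME> <KEY> <VALUE>` CLI. The Python sketch below only builds and prints those argument lists; it does not run them. The volume name `testvol` is a placeholder, and actually applying the options (via `subprocess.run`, left as a comment) assumes a live trusted storage pool.

```python
import shlex
from typing import Dict, List

# Volume options from step 1 of the reproduction steps.
STEP1_OPTIONS: Dict[str, str] = {
    "cluster.entry-self-heal": "on",
    "cluster.data-self-heal": "on",
    "cluster.metadata-self-heal": "on",
    "cluster.favorite-child-policy": "mtime",
}


def volume_set_commands(volname: str, options: Dict[str, str]) -> List[List[str]]:
    """Build one `gluster volume set` argv per option, in insertion order."""
    return [["gluster", "volume", "set", volname, key, value]
            for key, value in options.items()]


if __name__ == "__main__":
    # "testvol" is a placeholder volume name.
    for argv in volume_set_commands("testvol", STEP1_OPTIONS):
        print(shlex.join(argv))
        # To apply on a real cluster: subprocess.run(argv, check=True)
```

Keeping the options in a dictionary makes it easy to reuse the same list for the reproduction run and for the later verification run.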

Actual results:
[root@host vol]# cat test
cat: test: Input/output error
[root@host vol]# cat test
[root@host vol]#

Expected results:
[root@host vol]# cat test
[root@host vol]#
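The Actual results show the cost described in the problem statement: before the fix, an application had to catch the error and retry on its own. A minimal Python sketch of that client-side workaround follows, retrying a read after a short delay only when it fails with `EIO`. The function name, the single retry, and the 1-second delay (taken from step 4) are assumptions; the fix in this bug is meant to make such retries unnecessary.

```python
import errno
import time
from typing import Callable


def read_with_retry(path: str,
                    retries: int = 1,
                    delay: float = 1.0,
                    opener: Callable = open,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Read `path`, retrying up to `retries` times if the read fails with EIO."""
    attempt = 0
    while True:
        try:
            with opener(path, "rb") as f:
                return f.read()
        except OSError as e:
            if e.errno != errno.EIO or attempt >= retries:
                raise  # not an EIO, or out of retries
            attempt += 1
            sleep(delay)  # give the asynchronous self-heal time to run
```

`opener` and `sleep` are injectable so the retry logic can be exercised without a split-brained volume.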


Additional info:
Comment 4 Ravishankar N 2016-11-07 20:10:37 EST
Upstream patch: http://review.gluster.org/#/c/15673/4
Comment 5 Ravishankar N 2016-11-28 03:49:28 EST
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/91354/
Comment 7 nchilaka 2017-01-16 05:35:50 EST
on_qa verification:

Steps run:
1. Set cluster.entry-self-heal, cluster.data-self-heal and cluster.metadata-self-heal to on, and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Run cat on the split-brained file -> an Input/Output Error is raised
4. Run cat on the file again ~1 sec later -> the file has been healed

The files are healed based on the latest mtime,
hence moving to VERIFIED.

Test version: 3.8.4-11
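The QA criterion ("healed based on the latest mtime") can be phrased as a small predicate. This is a hypothetical helper, assuming the tester has captured each brick's mtime and content for the file before and after the heal (for example by reading the backend brick paths); the snapshot format and the function name are illustrative, and on an mtime tie it simply takes the first brick.

```python
from typing import Dict, Tuple

# brick name -> (mtime, content): a hypothetical snapshot of one file's copies
Snapshot = Dict[str, Tuple[float, bytes]]


def healed_by_latest_mtime(before: Snapshot, after: Snapshot) -> bool:
    """True if every brick now holds the content that had the newest mtime."""
    newest_brick = max(before, key=lambda b: before[b][0])
    expected = before[newest_brick][1]
    # Same set of bricks, and each one now carries the winning content.
    return set(after) == set(before) and all(
        content == expected for _, content in after.values())
```

Only content is compared, since whether the heal also propagates the winning mtime is not something this report states.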
Comment 9 errata-xmlrpc 2017-03-23 02:13:44 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
