+++ This bug was initially created as a clone of Bug #1378547 +++

Description of problem:
The unsplit-brain mechanism is triggered along with the self-healing mechanism. Since the self-healing mechanism is asynchronous, so is the unsplit-brain mechanism. Therefore, even though the split-brain is eventually resolved, every system call made before that point fails with an Input/Output Error (EIO). This pushes the responsibility back to the client application, which has to retry the system call, which in turn wastes resources. The self-heal mechanism should remain asynchronous, but the version to use according to the favorite child policy should be resolved synchronously so that the Input/Output Error does not occur.

Version-Release number of selected component (if applicable):
3.8.4-1

How reproducible:
Create a split-brained file and assert that the first read still always causes an Input/Output Error.

Steps to Reproduce:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on, cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1 sec later -> Ensure that the file was healed

Actual results:
[root@host vol]# cat test
cat: test: Input/output error
[root@host vol]# cat test
[root@host vol]#

Expected results:
[root@host vol]# cat test
[root@host vol]#

Additional info:
Upstream patch: http://review.gluster.org/#/c/15673/4
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/91354/
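For illustration only, the client-side workaround that the description says applications are currently forced into can be sketched as a read that retries on EIO. This is a minimal sketch, not code from either patch: the function name and the `retries`, `delay` and `opener` parameters are hypothetical, and it assumes the heal completes within the retry window (about 1 second in the reproduction steps).

```python
import errno
import time


def read_with_eio_retry(path, retries=3, delay=1.0, opener=open):
    """Read a file, retrying when the attempt fails with EIO.

    retries, delay and opener are illustrative parameters of this sketch,
    not GlusterFS settings.
    """
    for attempt in range(retries + 1):
        try:
            with opener(path, "rb") as handle:
                return handle.read()
        except OSError as exc:
            # Only EIO is treated as the transient split-brain symptom.
            # Any other error, or EIO on the last attempt, is re-raised.
            if exc.errno != errno.EIO or attempt == retries:
                raise
            # Give the asynchronous heal time to finish before retrying.
            time.sleep(delay)
```

Every caller that reads from the volume would need a wrapper like this, which is the wasted effort the report refers to. Resolving the favorite child synchronously makes the first read succeed and the wrapper unnecessary.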
on_qa verification:

Steps run:
1. Set cluster.entry-self-heal to on, cluster.data-self-heal to on, cluster.metadata-self-heal to on and cluster.favorite-child-policy to mtime
2. Create a split-brained file
3. Cat the split-brained file -> Ensure that an Input/Output Error is raised
4. Cat the file again ~1 sec later -> Ensure that the file was healed

The files are being healed based on the latest mtime, hence moving to verified.

Test version: 3.8.4-11
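Step 1 of the verification can be scripted roughly as follows. This is a sketch under assumptions: the option names and values are the ones listed in the steps, applied with `gluster volume set`, while the volume name passed in, the helper names and the injectable `run` parameter are hypothetical.

```python
import subprocess

# Options and values exactly as listed in step 1 of the verification.
SPLIT_BRAIN_TEST_OPTIONS = {
    "cluster.entry-self-heal": "on",
    "cluster.data-self-heal": "on",
    "cluster.metadata-self-heal": "on",
    "cluster.favorite-child-policy": "mtime",
}


def volume_set_commands(volume):
    """Build one `gluster volume set` argv list per option."""
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in SPLIT_BRAIN_TEST_OPTIONS.items()
    ]


def apply_options(volume, run=subprocess.run):
    """Run each command in order, stopping at the first failure (check=True)."""
    for command in volume_set_commands(volume):
        run(command, check=True)
```

Calling `apply_options("testvol")` on a node of the trusted pool would apply all four settings; `testvol` is a placeholder for the volume under test.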
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html