Bug 1378547
Summary: | Asynchronous Unsplit-brain still causes Input/Output Error on system calls | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Simon Turcotte-Langevin <simon.turcotte-langevin> | |
Component: | replicate | Assignee: | Ravishankar N <ravishankar> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 3.8 | CC: | bugs, pkarampu, ravishankar, simon.turcotte-langevin | |
Target Milestone: | --- | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.8.8 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1386188 1387501 | Environment: | |
Last Closed: | 2017-01-16 12:26:06 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1386188, 1387501, 1403121, 1403577 |
Description
Simon Turcotte-Langevin
2016-09-22 17:33:05 UTC
REVIEW: http://review.gluster.org/16091 (afr: allow I/O when favorite-child-policy is enabled) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Hello Simon,

Would it be possible for you to test the patch (Comment #1) and see if you find any problems with it? The 3.8 maintainer is concerned about taking the patch in since it is relatively large: http://www.gluster.org/pipermail/maintainers/2016-December/001866.html

Thanks,
Ravi

Hello Ravi,

Firstly, thank you very much for your efforts on this issue; they are much appreciated. We will test the patch on the latest 3.8 sources and run our benchmark to see if there are any issues. We will also test whether synchronous heals happen as expected.

Thanks,
Simon

Hello again Ravi,

We're currently testing 3.8.5-1 with the patch you've given us applied to it. When all self-heals are on, the file is unsplit-brained synchronously as expected. However, if the self-heal options are set to off and the favorite child policy is set, a deadlock occurs.
Steps to reproduce:

1) gluster volume set vol cluster.entry-self-heal off
   gluster volume set vol cluster.data-self-heal off
   gluster volume set vol cluster.metadata-self-heal off
   gluster volume set vol cluster.favorite-child-policy mtime

2) node 1:
   setfattr --name=trusted.afr.gv0-client-0 --value=0x100000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-1 --value=0x000000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-2 --value=0x000000000000000000000000 /brick/test

3) node 2:
   setfattr --name=trusted.afr.gv0-client-0 --value=0x000000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-1 --value=0x010000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-2 --value=0x000000000000000000000000 /brick/test

4) node 3:
   setfattr --name=trusted.afr.gv0-client-0 --value=0x000000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-1 --value=0x000000000000000000000000 /brick/test
   setfattr --name=trusted.afr.gv0-client-2 --value=0x001000000000000000000000 /brick/test

5) cat /vol/test # Never resolves.

Expected:
a) The unsplit-brain mechanism triggers and returns the right version; or
b) The unsplit-brain mechanism does not trigger because self-heal is toggled off.

For our use case, if possible, a) is preferred. This might not be possible, however, in which case b) should be honored.

Thanks,
Simon

Hi Simon,

Thanks a lot for testing! While the steps you described do cause a hang due to an infinite inode-refresh loop, the values you set for the xattrs on the back end are not a valid scenario. You have set them in such a way that each brick blames itself (i.e. trusted.afr.gv0-client-0 for the 1st brick, trusted.afr.gv0-client-1 for the 2nd brick, etc.). This is not possible in AFR-v2 (i.e. glusterfs-3.6 onwards), where each brick can have xattrs blaming only the other bricks if some I/O fails.
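To make the self-blaming pattern in the reproduction steps easier to see, here is a minimal sketch that decodes one trusted.afr.* value. It assumes the usual AFR changelog layout of 12 bytes holding three 32-bit big-endian pending counters (data, metadata, entry); the function name decode_afr_xattr is hypothetical and is not part of any Gluster tool.

```shell
#!/bin/sh
# Hypothetical helper (not a Gluster command): decode a trusted.afr.* value.
# Assumption: the value is 24 hex digits = three 32-bit big-endian counters,
# in the order data, metadata, entry.
decode_afr_xattr() {
    hex="${1#0x}"
    # Reject empty input and anything that is not purely hexadecimal.
    case "$hex" in
        ""|*[!0-9a-fA-F]*)
            echo "not a hex value: $1" >&2
            return 1
            ;;
    esac
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}

# Values that nodes 1, 2 and 3 set on their *own* client index in the steps.
decode_afr_xattr 0x100000000000000000000000   # data=268435456 metadata=0 entry=0
decode_afr_xattr 0x010000000000000000000000   # data=16777216 metadata=0 entry=0
decode_afr_xattr 0x001000000000000000000000   # data=1048576 metadata=0 entry=0
```

Under that assumption, each node in the steps carries a non-zero data counter only on the xattr that names its own brick, which is the "each brick blames itself" state described in the reply.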
You could retest by setting something like this:

(1) 1st brick: set trusted.afr.gv0-client-1 and trusted.afr.gv0-client-2
    2nd brick: set trusted.afr.gv0-client-0 and trusted.afr.gv0-client-2
    3rd brick: set trusted.afr.gv0-client-0 and trusted.afr.gv0-client-1
    Then things should work.

(2) Alternatively, you can also bring bricks up/down while I/O is going on. But for replica-3 it is difficult to cause split-brain by the up/down method (works fine for replica 2).

I'm leaving a need-info for you to test and see if you find any issues with approaches (1) or (2). Also, if you are able to hit the state where each brick blames itself (like in comment #4) without manually setting the xattrs, please raise a bug for it.

Thanks again,
Ravi

Hello Ravishankar,

This is very interesting indeed; thanks for the insight into how the mechanism works under the hood. I've tested it with the right xattrs, and it works perfectly, without the need for the self-healing options. This is exactly the behaviour we were looking for.

As for the tests, I've sent you an email that details the issues we've found so far. However, the backport patch for this fix does not seem to raise any additional issues for 3.8.

Thanks,
Simon

Many thanks for the testing, Simon! I've marked this to be included in the next (3.8.8) minor release. The release is targeted for ~10th of January.
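For anyone repeating the retest, approach (1) above can be sketched as a dry run that only prints the setfattr commands for each brick. The function name blame_others_cmds and the pending value 0x000000010000000000000000 (read here as one pending data operation) are assumptions for illustration; the thread does not say which values were actually used.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, but do not execute, the setfattr
# commands that make one brick blame every *other* brick and never itself.
# Arguments: own client index, number of bricks, volume name, file on brick.
blame_others_cmds() {
    self="$1"
    count="$2"
    vol="$3"
    path="$4"
    # Assumption: a non-zero data counter is enough to mark the peer as pending.
    pending="0x000000010000000000000000"
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "$i" -ne "$self" ]; then
            printf 'setfattr --name=trusted.afr.%s-client-%d --value=%s %s\n' \
                "$vol" "$i" "$pending" "$path"
        fi
        i=$((i + 1))
    done
}

# Replica-3 layout from the thread: volume gv0, file /brick/test.
for brick in 0 1 2; do
    echo "# brick $((brick + 1))"
    blame_others_cmds "$brick" 3 gv0 /brick/test
done
```

Printing rather than running the commands keeps the sketch safe to try on any machine; the output for a given brick would still need to be reviewed and run on that brick's node.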
REVIEW: http://review.gluster.org/16091 (afr: allow I/O when favorite-child-policy is enabled) posted (#2) for review on release-3.8 by Ravishankar N (ravishankar)

REVIEW: http://review.gluster.org/16322 (afr: Ignore event_generation checks post inode refresh for write txns) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

COMMIT: http://review.gluster.org/16091 committed in release-3.8 by Niels de Vos (ndevos)
------
commit c539e23023abe743770287439ebe81989a732728
Author: Ravishankar N <ravishankar>
Date: Fri Dec 9 07:14:17 2016 +0000

afr: allow I/O when favorite-child-policy is enabled

Problem:
Currently, I/O on a split-brained file fails even when the favorite-child-policy is set until the self-heal is complete.

Fix:
If a valid 'source' is found using the set favorite-child-policy, inspect and reset the afr pending xattrs on the 'sinks' (inside appropriate locks), refresh the inode and then proceed with the read or write transaction.

The resetting itself happens in the self-heal code and hence can also happen in the client side background-heal or by the shd's index-heal in addition to the txn code path explained above. When it happens via heal, we also add checks in undo-pending to not reset the sink xattrs again.
> Reviewed-on: http://review.gluster.org/15673
> Tested-by: Pranith Kumar Karampuri <pkarampu>
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>

Change-Id: Ic8c1317720cb26bd114b6fe6af4e58c73b864626
BUG: 1378547
Signed-off-by: Ravishankar N <ravishankar>
Reported-by: Simon Turcotte-Langevin <simon.turcotte-langevin>
Reviewed-on: http://review.gluster.org/16091
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Niels de Vos <ndevos>

REVIEW: http://review.gluster.org/16322 (afr: Ignore event_generation checks post inode refresh for write txns) posted (#2) for review on release-3.8 by Niels de Vos (ndevos)

COMMIT: http://review.gluster.org/16322 committed in release-3.8 by Niels de Vos (ndevos)
------
commit 268a1c1100ca661095d5606d0248e038bdbefd49
Author: Ravishankar N <ravishankar>
Date: Wed Jan 4 17:21:35 2017 +0530

afr: Ignore event_generation checks post inode refresh for write txns

Backport of http://review.gluster.org/#/c/16205/

Before http://review.gluster.org/#/c/16091/, after inode refresh, we failed read txns in case of EIO or event_generation being zero. For write transactions, the check was only for EIO. 16091 re-factored the code to fail both read and write when event_generation=0. This seems to have caused a regression as explained in the BZ. This patch restores that behaviour in afr_txn_refresh_done().
Change-Id: Id763ed2d420b6d045d4505893a18959d998c91a3
BUG: 1378547
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: http://review.gluster.org/16322
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Niels de Vos <ndevos>
Smoke: Gluster Build System <jenkins.org>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.8, please open a new bug report.

glusterfs-3.8.8 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-January/000064.html
[2] https://www.gluster.org/pipermail/gluster-users/