+++ This bug was initially created as a clone of Bug #1404982 +++

Description of problem:
In an arbiter volume, when one of the data bricks is killed while I/O is being written, the VM goes to a paused state and the following is seen in the mount logs.

[2016-12-15 09:47:16.357700] E [MSGID: 108008] [afr-transaction.c:2557:afr_write_txn_refresh_done] 0-data-replicate-0: Failing FXATTROP on gfid 883a5c0a-e16e-4937-83b5-5d90df1ec956: split-brain observed.
[2016-12-15 09:47:16.357724] E [MSGID: 133016] [shard.c:631:shard_update_file_size_cbk] 0-data-shard: Update to file size xattr failed on 883a5c0a-e16e-4937-83b5-5d90df1ec956 [Input/output error]
[2016-12-15 09:47:16.357998] W [fuse-bridge.c:2312:fuse_writev_cbk] 0-glusterfs-fuse: 15170: WRITE => -1 gfid=883a5c0a-e16e-4937-83b5-5d90df1ec956 fd=0x7fd0f000f0f8 (Input/output error)

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-8.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install HC with three nodes.
2. Inside the VM, mount the disk and start writing I/O.
3. While I/O is going on, kill one of the bricks.

Actual results:
The VM goes to a paused state with Input/output error.

Expected results:
The VM should not go to a paused state, as only one of the data bricks is down.

Additional info:
Following is seen in the mount logs:
===========================================
[2016-12-15 09:47:16.357700] E [MSGID: 108008] [afr-transaction.c:2557:afr_write_txn_refresh_done] 0-data-replicate-0: Failing FXATTROP on gfid 883a5c0a-e16e-4937-83b5-5d90df1ec956: split-brain observed.
[2016-12-15 09:47:16.357724] E [MSGID: 133016] [shard.c:631:shard_update_file_size_cbk] 0-data-shard: Update to file size xattr failed on 883a5c0a-e16e-4937-83b5-5d90df1ec956 [Input/output error]
[2016-12-15 09:47:16.357998] W [fuse-bridge.c:2312:fuse_writev_cbk] 0-glusterfs-fuse: 15170: WRITE => -1 gfid=883a5c0a-e16e-4937-83b5-5d90df1ec956 fd=0x7fd0f000f0f8 (Input/output error)

--- Additional comment from Ravishankar N on 2016-12-19 11:25:20 EST ---

Thanks Kasturi for providing the setup for testing and thanks Satheesaran for providing virsh based commands for re-creating the issue.

The issue is due to a race between inode_refresh_done() and __afr_set_in_flight_sb_status() that occurs when I/O is going on and a brick is brought down or up. When the brick comes up or goes down, inode refresh is triggered in the write transaction and sets the correct data/metadata readable and event_generation in inode_refresh_done(). But before the transaction can proceed to the write FOP, __afr_set_in_flight_sb_status() from another writev cbk resets the event_generation. When the first write (the one that follows the inode refresh) fetches the event_gen in afr_inode_get_readable(), it gets zero, and because of that it fails the write with EIO.
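The interleaving described above can be replayed deterministically in a small single-threaded model. This is only a sketch under stated assumptions: every name ending in _sketch is hypothetical, the struct is a drastically simplified stand-in for the per-inode AFR state, and the check mirrors only the `ret == -EIO || !event_generation` condition, not the real AFR code paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-inode AFR state. */
struct inode_ctx_sketch {
    int  event_generation; /* 0 is treated as "reset" in this model */
    bool readable;         /* a readable subvolume is known */
};

/* Models inode_refresh_done(): publishes readable info and a fresh
 * event generation. */
static void refresh_done_sketch(struct inode_ctx_sketch *ctx, int current_gen)
{
    assert(current_gen > 0);
    ctx->readable = true;
    ctx->event_generation = current_gen;
}

/* Models __afr_set_in_flight_sb_status() running from another writev
 * callback: it resets the event generation. */
static void in_flight_sb_reset_sketch(struct inode_ctx_sketch *ctx)
{
    ctx->event_generation = 0;
}

/* Models the post-refresh check that fails the transaction when there is
 * no readable subvolume OR the event generation is zero. */
static int post_refresh_check_sketch(const struct inode_ctx_sketch *ctx)
{
    int ret = ctx->readable ? 0 : -EIO;

    if (ret == -EIO || !ctx->event_generation)
        return -EIO;
    return 0;
}

/* Replays one write transaction: refresh completes, optionally another
 * write callback interleaves, then the post-refresh check runs. */
static int replay_race_sketch(bool other_write_cbk_interleaves)
{
    struct inode_ctx_sketch ctx = { 0, false };

    refresh_done_sketch(&ctx, 2);
    if (other_write_cbk_interleaves)
        in_flight_sb_reset_sketch(&ctx);
    return post_refresh_check_sketch(&ctx);
}
```

In this model, replay_race_sketch(false) returns 0, while replay_race_sketch(true) returns -EIO even though a readable subvolume exists, which is the spurious write failure reported here.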
Ignoring event_generation seems to fix the issue:
-----------------------------------------------------------
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 60bae18..2f32e44 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1089,7 +1089,7 @@ afr_txn_refresh_done (call_frame_t *frame, xlator_t *this, int err)
                                        &event_generation,
                                        local->transaction.type);
 
-        if (ret == -EIO || !event_generation) {
+        if (ret == -EIO){
                 /* No readable subvolume even after refresh ==> splitbrain.*/
                 if (!priv->fav_child_policy) {
                         err = -EIO;
-----------------------------------------------------------
However, I need to convince myself that ignoring event_gen in afr_txn_refresh_done() for reads does not have any repercussions (there is no problem in ignoring it for writes).
Before http://review.gluster.org/#/c/15673/, after inode refresh, we failed read txns in case of EIO or event_generation being zero. For write transactions, the check was only for EIO. 15673 refactored the code to fail both read and write txns when event_generation=0. This seems to have caused the regression explained in the description above.

While we could restore the earlier behaviour, it seems we don't need to check the event_gen value for read transactions either, because event_gen could very well be reset to zero after we checked it to be non-zero (post inode refresh) but just before we did a stack wind for that read txn. Sending a patch to see if this breaks any upstream regression test.
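The three behaviours discussed in this bug can be laid out side by side as a small decision function. This is a sketch, assuming only what the comments and the commit message for 16205 state; the enum and function names are hypothetical, and the fav_child_policy handling visible in the diff is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels, not AFR's own types. */
enum txn_kind_sketch { TXN_READ_SKETCH, TXN_WRITE_SKETCH };

enum check_era_sketch {
    ERA_BEFORE_15673, /* reads: EIO or event_gen == 0; writes: EIO only */
    ERA_AFTER_15673,  /* reads and writes: EIO or event_gen == 0 */
    ERA_AFTER_16205   /* restores the pre-15673 behaviour */
};

/* Returns true when the post-refresh check would fail the transaction.
 * `ret` is 0 when a readable subvolume was found, -EIO otherwise. */
static bool fails_after_refresh_sketch(enum check_era_sketch era,
                                       enum txn_kind_sketch kind,
                                       int ret, int event_generation)
{
    assert(ret == 0 || ret == -EIO);

    if (ret == -EIO)
        return true;  /* no readable subvolume: fails in every era */
    if (event_generation != 0)
        return false; /* readable and generation valid: proceeds */

    /* Readable subvolume exists but event_generation is zero. */
    switch (era) {
    case ERA_AFTER_15673:
        return true;
    case ERA_BEFORE_15673:
    case ERA_AFTER_16205:
        return kind == TXN_READ_SKETCH;
    }
    return false;
}
```

The only case that differs between eras is a zero event_generation with a readable subvolume present: after 15673 a write in that state fails, while before 15673 and after 16205 it proceeds.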
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks post inode refresh) posted (#1) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks post inode refresh) posted (#2) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks post inode refresh) posted (#3) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16205 (afr: Ignore event_generation checks post inode refresh for write txns) posted (#4) for review on master by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/16205 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 7ee998b9041d594d93a4e2ef369892c185e80def
Author: Ravishankar N <ravishankar>
Date:   Tue Dec 20 07:05:02 2016 +0530

    afr: Ignore event_generation checks post inode refresh for write txns

    Before http://review.gluster.org/#/c/15673/, after inode refresh, we
    failed read txns in case of EIO or event_generation being zero. For
    write transactions, the check was only for EIO. 15673 re-factored the
    code to fail both read and write when event_generation=0. This seems to
    have caused a regression as explained in the BZ.

    This patch restores that behaviour in afr_txn_refresh_done().

    Change-Id: Ib8e116506badce6f58b55827dbe403d95069d744
    BUG: 1406224
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16205
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/