Bug 1244165

Summary: [RHEV-RHGS] App VMs paused due to IO error caused by split-brain, after initiating remove-brick operation
Product: [Community] GlusterFS
Component: distribute
Version: mainline
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Keywords: Reopened, Triaged
Hardware: x86_64
OS: Linux
CC: bugs, gluster-bugs, nbalacha, ravishankar, rcyriac, rgowdapp, sasundar, ssampat
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Type: Bug
Clone Of: 1243542
Environment: RHEL 6.7 as hypervisor, RHEVM 3.5.4, RHGS 3.1 nightly build (based on RHEL 7.1)
Last Closed: 2016-06-16 13:24:44 UTC
Bug Depends On: 1243542
Bug Blocks: 1245934

Description Ravishankar N 2015-07-17 10:58:14 UTC
+++ This bug was initially created as a clone of Bug #1243542 +++

Description of problem:
------------------------
On an 8x2 distributed-replicate volume, initiated a remove-brick with data migration. After a few minutes, all the application VMs with their disk images on that gluster volume went into a paused state.

Noticed split-brain error messages in the fuse mount log.
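A quick way to check whether afr actually flags any files as split-brained is the heal-info command; a minimal sketch, assuming the volume name vol1 seen in the mount logs quoted further down:

# Run on any node of the trusted storage pool
gluster volume heal vol1 info split-brain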

Version
--------
RHEL 6.7 as hypervisor
RHGS 3.1 based on RHEL 7.1

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Create a 2x2 distributed-replicate volume
2. Use this gluster volume as the 'Data Domain' for RHEV
3. Create a few App VMs and install an OS on them
4. Remove the bricks on which the disk images of the App VMs reside (see the CLI sketch below)
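A minimal CLI sketch of these steps; host names (h1..h4) and brick paths are placeholders, not values from this report:

# Step 1: 2x2 distributed-replicate volume, i.e. two replica pairs
gluster volume create vol1 replica 2 \
    h1:/bricks/b1 h2:/bricks/b1 h3:/bricks/b2 h4:/bricks/b2
gluster volume start vol1

# Step 4: remove the replica pair holding the VM disk images;
# 'start' triggers data migration off the removed bricks
gluster volume remove-brick vol1 h3:/bricks/b2 h4:/bricks/b2 start
gluster volume remove-brick vol1 h3:/bricks/b2 h4:/bricks/b2 status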

Actual results:
----------------
App VMs went into a paused state

Expected results:
-----------------
App VMs should remain healthy

--- Additional comment from SATHEESARAN on 2015-07-15 14:17:43 EDT ---

The following error messages are seen in the fuse mount logs:


[2015-07-15 17:49:42.709088] E [MSGID: 114031] [client-rpc-fops.c:1673:client3_3_finodelk_cbk] 6-vol1-client-0: remote operation failed [Transport endpoint is not connected]

[2015-07-15 17:49:42.710849] W [MSGID: 114031] [client-rpc-fops.c:1028:client3_3_fsync_cbk] 6-vol1-client-0: remote operation failed [Transport endpoint is not connected]
[2015-07-15 17:49:42.710874] W [MSGID: 108035] [afr-transaction.c:1614:afr_changelog_fsync_cbk] 6-vol1-replicate-0: fsync(b7d21675-6fd8-472a-b7d9-71d7436c614d) failed on subvolume vol1-client-0. Transaction was WRITE [Transport endpoint is not connected]
[2015-07-15 17:49:42.710897] W [MSGID: 108001] [afr-transaction.c:686:afr_handle_quorum] 6-vol1-replicate-0: b7d21675-6fd8-472a-b7d9-71d7436c614d: Failing WRITE as quorum is not met

[2015-07-15 18:12:15.544061] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 12-vol1-replicate-5: Failing WRITE on gfid b7d21675-6fd8-472a-b7d9-71d7436c614d: split-brain observed. [Input/output error]
[2015-07-15 18:12:15.737906] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293197: WRITE => -1 (Input/output error)
[2015-07-15 18:12:17.022070] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 12-vol1-client-5: remote operation failed. Path: /c29ec775-c933-4109-87bf-0b7c4373d0a0/images/9ddffb02-b804-4f28-a8fb-df609eaa884a/c7637ade-9c78-4bd7-a9e4-a14913f9060b (d83a3f9a-7625-4872-b61f-0e4b63922a75) [No such file or directory]
[2015-07-15 18:12:17.022073] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 12-vol1-client-4: remote operation failed. Path: /c29ec775-c933-4109-87bf-0b7c4373d0a0/images/9ddffb02-b804-4f28-a8fb-df609eaa884a/c7637ade-9c78-4bd7-a9e4-a14913f9060b (d83a3f9a-7625-4872-b61f-0e4b63922a75) [No such file or directory]
[2015-07-15 18:12:22.952290] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293304: WRITE => -1 (Input/output error)
[2015-07-15 18:12:22.952550] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293306: WRITE => -1 (Input/output error)

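For anyone reproducing this, a hedged note on where these messages come from: the fuse client logs to /var/log/glusterfs/, in a file named after the mount point with slashes replaced by dashes (the exact file name is not given in this report):

# e.g. a mount at /mnt/vol1 logs to /var/log/glusterfs/mnt-vol1.log
grep -E 'split-brain|quorum|Transport endpoint' /var/log/glusterfs/mnt-vol1.log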

--- Additional comment from Ravishankar N on 2015-07-16 07:33:59 EDT ---

Able to reproduce the issue by running a continuous `dd` into a file from the fuse mount on a 2x2 volume and reducing it to 1x2, making sure to remove the replica pair on which the file resides. dd terminated with EIO.

[root@vm2 fuse_mnt]# dd if=/dev/urandom of=file
dd: writing to ‘file’: Input/output error
dd: closing output file ‘file’: Input/output error
[root@vm2 fuse_mnt]# 
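Spelled out as a hedged sketch (volume and brick names are placeholders; trusted.glusterfs.pathinfo is the virtual xattr that reports which bricks hold a file):

# terminal 1: keep a continuous write going on the fuse mount
dd if=/dev/urandom of=/mnt/fuse_mnt/file

# terminal 2: find the replica pair the file resides on ...
getfattr -n trusted.glusterfs.pathinfo /mnt/fuse_mnt/file
# ... and remove exactly that pair, shrinking the volume 2x2 -> 1x2
gluster volume remove-brick vol1 h3:/bricks/b2 h4:/bricks/b2 start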


The EIO is returned by afr_transaction(), which is not able to find a readable subvolume for the inode. I need to debug further to see why.

FWIW, there was no data corruption/loss and the migration completed successfully. New reads/writes to the file were successful.

[root@vm2 fuse_mnt]# echo append>>file
[root@vm2 fuse_mnt]# echo $?
0
[root@vm2 fuse_mnt]# tail -1 file
��_�d�!��aappend
[root@vm2 fuse_mnt]# 
[root@vm2 fuse_mnt]# echo $?
0
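To confirm that the migration completed before making the shrink permanent, the usual status/commit sequence applies; a sketch with the same placeholder bricks as above:

# wait until status shows 'completed' for the removed bricks
gluster volume remove-brick vol1 h3:/bricks/b2 h4:/bricks/b2 status
# only then commit the removal
gluster volume remove-brick vol1 h3:/bricks/b2 h4:/bricks/b2 commit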


Comment 1 Anand Avati 2015-07-17 10:59:43 UTC
REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 2 Anand Avati 2015-07-17 17:04:54 UTC
REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 3 Anand Avati 2015-07-19 09:24:44 UTC
REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#3) for review on master by Ravishankar N (ravishankar)

Comment 4 Anand Avati 2015-07-23 06:45:22 UTC
COMMIT: http://review.gluster.org/11713 committed in master by Raghavendra G (rgowdapp) 
------
commit 94372373ee355e42dfe1660a50315adb4f019d64
Author: Ravishankar N <ravishankar>
Date:   Fri Jul 17 16:04:01 2015 +0530

    dht: send lookup even for fd based operations during rebalance
    
    Problem:
    dht_rebalance_inprogress_task() was not sending lookups to the
    destination subvolume for a file undergoing writes during rebalance. Due to
    this, afr was not able to populate the read_subvol and failed the write
    with EIO.
    
    Fix:
    Send lookup for fd based operations as well.
    
    Thanks to Raghavendra G for helping with the RCA.
    
    Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19
    BUG: 1244165
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/11713
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Raghavendra G <rgowdapp>
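For illustration, the shape of a regression test for this fix, sketched in the style of Gluster's shell-based test framework (tests/*.t). This is not the test added by the patch; the TEST/cleanup helpers and the $CLI, $V0, $H0, $B0 and $M0 variables come from the framework's include.rc, and a real test would also have to ensure the file hashes to the removed replica pair:

#!/bin/bash
. $(dirname $0)/../../include.rc
cleanup;

TEST glusterd
TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}{0..3}
TEST $CLI volume start $V0
TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 $M0

# keep writing while a replica pair is removed; before the fix this
# write stream could fail with EIO once rebalance picked up the file
dd if=/dev/urandom of=$M0/file bs=1k count=10000 &
dd_pid=$!
TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}2 $H0:$B0/${V0}3 start
TEST wait $dd_pid

cleanup;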

Comment 5 Nagaprasad Sathyanarayana 2015-10-25 15:09:49 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ, fixed in a GlusterFS release, has been closed. Hence closing this mainline BZ as well.

Comment 6 Niels de Vos 2016-06-16 13:24:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user