Bug 1245934 - [RHEV-RHGS] App VMs paused due to IO error caused by split-brain, after initiating remove-brick operation
Summary: [RHEV-RHGS] App VMs paused due to IO error caused by split-brain, after initiating remove-brick operation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1243542 1244165
Blocks:
 
Reported: 2015-07-23 07:13 UTC by Ravishankar N
Modified: 2015-07-30 09:48 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.7.3
Doc Type: Bug Fix
Doc Text:
Clone Of: 1244165
Environment:
RHEL 6.7 as Hypervisor, RHEVM 3.5.4, RHGS 3.1 Nightly build (based on RHEL 7.1)
Last Closed: 2015-07-30 09:48:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2015-07-23 07:13:26 UTC
Description of problem:
------------------------
On an 8x2 distributed-replicate volume, initiated a remove-brick operation with data migration. After a few minutes, all the application VMs with their disk images on that gluster volume went into a paused state.

Split-brain error messages were noticed in the fuse mount log.

Version
--------
RHEL 6.7 as hypervisor
RHGS 3.1 based on RHEL 7.1

How reproducible:
-----------------
Tried only once

Steps to Reproduce:
-------------------
1. Create a 2x2 distributed-replicate volume
2. Use this gluster volume as the 'Data Domain' for RHEV
3. Create a few App VMs and install an OS on them
4. Remove the bricks on which the disk images of the App VMs reside (see the command sketch below)
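
A minimal command-line sketch of these steps, assuming two hosts (h1, h2) and brick paths under /bricks; the volume name and brick paths are illustrative, not taken from the report:

# Create the 2x2 distributed-replicate volume
gluster volume create datavol replica 2 \
    h1:/bricks/b1 h2:/bricks/b1 \
    h1:/bricks/b2 h2:/bricks/b2
gluster volume start datavol
# ... attach datavol as the RHEV Data Domain and create the App VMs ...
# Remove (with data migration) the replica pair holding the VM disk images
gluster volume remove-brick datavol h1:/bricks/b2 h2:/bricks/b2 start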

Actual results:
----------------
App VMs went into a **paused** state

Expected results:
-----------------
App VMs should remain healthy

--- Additional comment from SATHEESARAN on 2015-07-15 14:17:43 EDT ---

Following error messages are seen in the fuse mount logs:


[2015-07-15 17:49:42.709088] E [MSGID: 114031] [client-rpc-fops.c:1673:client3_3_finodelk_cbk] 6-vol1-client-0: remote operation failed [Transport endpoint is not connected]

[2015-07-15 17:49:42.710849] W [MSGID: 114031] [client-rpc-fops.c:1028:client3_3_fsync_cbk] 6-vol1-client-0: remote operation failed [Transport endpoint is not connected]
[2015-07-15 17:49:42.710874] W [MSGID: 108035] [afr-transaction.c:1614:afr_changelog_fsync_cbk] 6-vol1-replicate-0: fsync(b7d21675-6fd8-472a-b7d9-71d7436c614d) failed on subvolume vol1-client-0. Transaction was WRITE [Transport endpoint is not connected]
[2015-07-15 17:49:42.710897] W [MSGID: 108001] [afr-transaction.c:686:afr_handle_quorum] 6-vol1-replicate-0: b7d21675-6fd8-472a-b7d9-71d7436c614d: Failing WRITE as quorum is not met

[2015-07-15 18:12:15.544061] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 12-vol1-replicate-5: Failing WRITE on gfid b7d21675-6fd8-472a-b7d9-71d7436c614d: split-brain observed. [Input/output error]
[2015-07-15 18:12:15.737906] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293197: WRITE => -1 (Input/output error)
[2015-07-15 18:12:17.022070] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 12-vol1-client-5: remote operation failed. Path: /c29ec775-c933-4109-87bf-0b7c4373d0a0/images/9ddffb02-b804-4f28-a8fb-df609eaa884a/c7637ade-9c78-4bd7-a9e4-a14913f9060b (d83a3f9a-7625-4872-b61f-0e4b63922a75) [No such file or directory]
[2015-07-15 18:12:17.022073] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 12-vol1-client-4: remote operation failed. Path: /c29ec775-c933-4109-87bf-0b7c4373d0a0/images/9ddffb02-b804-4f28-a8fb-df609eaa884a/c7637ade-9c78-4bd7-a9e4-a14913f9060b (d83a3f9a-7625-4872-b61f-0e4b63922a75) [No such file or directory]
[2015-07-15 18:12:22.952290] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293304: WRITE => -1 (Input/output error)
[2015-07-15 18:12:22.952550] W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 293306: WRITE => -1 (Input/output error)
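
For anyone triaging similar messages, it is worth checking whether the files are actually in split-brain (as the later analysis shows, here they were not); a minimal check from any server, using the volume name vol1 from the logs above:

# List entries AFR currently flags as split-brain
gluster volume heal vol1 info split-brain
# Overall pending-heal summary for the volume
gluster volume heal vol1 info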


--- Additional comment from Ravishankar N on 2015-07-16 07:33:59 EDT ---

Able to reproduce the issue by running a continuous `dd` into a file from the fuse mount on a 2x2 volume and reducing it to 1x2, making sure to remove the replica pair on which the file resides. dd terminated with EIO.

[root@vm2 fuse_mnt]# dd if=/dev/urandom of=file
dd: writing to ‘file’: Input/output error
dd: closing output file ‘file’: Input/output error
[root@vm2 fuse_mnt]# 
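
The shrink itself was driven with remove-brick; a minimal sketch of the sequence, assuming a 2x2 volume named testvol on hosts h1/h2 (names illustrative):

# Start migrating data off the replica pair that holds the file
gluster volume remove-brick testvol h1:/bricks/b2 h2:/bricks/b2 start
# Poll until the migration reports 'completed'
gluster volume remove-brick testvol h1:/bricks/b2 h2:/bricks/b2 status
# Finalize the shrink to 1x2
gluster volume remove-brick testvol h1:/bricks/b2 h2:/bricks/b2 commit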


The EIO is returned by afr_transaction(), which is not able to find a readable subvolume for the inode. I need to debug further to see why.

FWIW, there was no data corruption/loss, and the migration completed successfully. New reads/writes to the file were successful.

[root@vm2 fuse_mnt]# echo append>>file
[root@vm2 fuse_mnt]# echo $?
0
[root@vm2 fuse_mnt]# tail -1 file
��_�d�!��aappend
[root@vm2 fuse_mnt]# 
[root@vm2 fuse_mnt]# echo $?
0


--- Additional comment from Anand Avati on 2015-07-17 06:59:43 EDT ---

REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Anand Avati on 2015-07-17 13:04:54 EDT ---

REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Anand Avati on 2015-07-19 05:24:44 EDT ---

REVIEW: http://review.gluster.org/11713 (dht: send lookup even for fd based operations during rebalance) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Anand Avati on 2015-07-23 02:45:22 EDT ---

COMMIT: http://review.gluster.org/11713 committed in master by Raghavendra G (rgowdapp) 
------
commit 94372373ee355e42dfe1660a50315adb4f019d64
Author: Ravishankar N <ravishankar>
Date:   Fri Jul 17 16:04:01 2015 +0530

    dht: send lookup even for fd based operations during rebalance
    
    Problem:
    dht_rebalance_inprogress_task() was not sending lookups to the
    destination subvolume for a file undergoing writes during rebalance. Due to
    this, afr was not able to populate the read_subvol and failed the write
    with EIO.
    
    Fix:
    Send lookup for fd based operations as well.
    
    Thanks to Raghavendra G for helping with the RCA.
    
    Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19
    BUG: 1244165
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/11713
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Raghavendra G <rgowdapp>
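
As a quick smoke test of the fix, the reproducer above can be scripted; a sketch under the same illustrative names (this is not a test shipped with the patch):

#!/bin/bash
# Write continuously while the replica pair holding the file is removed.
# Assumes 'file' lands on the h1/h2 b2 pair; adjust bricks accordingly.
MNT=/mnt/fuse_mnt
dd if=/dev/urandom of=$MNT/file bs=1M count=2048 &
DD_PID=$!
gluster volume remove-brick testvol h1:/bricks/b2 h2:/bricks/b2 start
wait $DD_PID
echo "dd exit status: $? (0 expected on a fixed build, EIO before the fix)"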

Comment 1 Anand Avati 2015-07-23 09:31:54 UTC
REVIEW: http://review.gluster.org/11744 (dht: send lookup even for fd based operations during rebalance) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 2 Kaushal 2015-07-30 09:48:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

