Bug 871727
| Field | Value |
|---|---|
| Summary | [RHEV-RHS] Bringing down one storage node in a pure replicate volume (1x2) moved one of the VMs to paused state. |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | replicate |
| Version | 2.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | spandura |
| Assignee | Pranith Kumar K <pkarampu> |
| QA Contact | SATHEESARAN <sasundar> |
| Docs Contact | |
| CC | bbuckley, bfoster, bmohanra, cmosher, grajaiya, jdarcy, nsathyan, pkarampu, ravishankar, rcyriac, rhs-bugs, rwheeler, sasundar, spandura, ssaha, storage-qa-internal, vagarwal, vbellur |
| Target Milestone | --- |
| Target Release | RHGS 3.1.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | glusterfs-3.7.0-1.el7rhs |
| Doc Type | Bug Fix |
| Doc Text | Previously, when self-heal was triggered by shd, it did not update the read-children. Because of this, if the other brick died, the VMs went into a paused state, as the mount assumed all read-children were down. With this fix, the read-children are repopulated using getxattr and the issue no longer occurs. |
| Story Points | --- |
| Clone Of | |
| | 1095112 (view as bug list) |
| Environment | virt rhev integration |
| Last Closed | 2015-07-29 04:27:54 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 957769, 1095112, 1202842 |
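
As a purely illustrative toy model of the Doc Text above (not GlusterFS/AFR source code; the class, field, and method names are invented for the example), the sketch below shows why a stale read-children cache pauses the VM once the remaining good brick dies, and how a getxattr-style refresh avoids it:

```python
# Toy illustration only of the failure mode described in the Doc Text.
# Names such as Mount, refresh_on_miss and the brick dictionaries are invented.

class Mount:
    def __init__(self, bricks):
        self.bricks = bricks                      # brick name -> {"up": bool, "clean": bool}
        self.readable = self._query_readable()    # cached "read children"

    def _query_readable(self):
        # Stand-in for the getxattr-based query: a brick can serve reads only
        # if it is up and its copy is clean (no pending heal).
        return {b for b, s in self.bricks.items() if s["up"] and s["clean"]}

    def read(self, refresh_on_miss=False):
        live = {b for b in self.readable if self.bricks[b]["up"]}
        if not live and refresh_on_miss:          # the fix: repopulate read-children
            self.readable = self._query_readable()
            live = {b for b in self.readable if self.bricks[b]["up"]}
        return "ok" if live else "EIO -> VM pauses"

bricks = {"brick0": {"up": True, "clean": True},
          "brick1": {"up": True, "clean": False}}   # brick1 still needs heal
m = Mount(bricks)                 # mount caches brick0 as the only read child
bricks["brick1"]["clean"] = True  # shd heals brick1, but the cache is not updated
bricks["brick0"]["up"] = False    # the other brick dies
print(m.read())                       # old behaviour: EIO -> VM pauses
print(m.read(refresh_on_miss=True))   # with the fix: refresh finds brick1, "ok"
```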
Description
spandura
2012-10-31 08:33:51 UTC
Yet to start work on this bug.

How reproducible is this bug in your testing? 100%? 50%? If it is reproducible enough to discern, does it always occur at the same point in the test case? This appears similar to a recent report suspected to be related to selinux, but ultimately not reproducible. I've attempted to reproduce this a couple of times on a local rhev/rhs setup without success. I've run through the entire test case sequence a couple of times, as well as repeated the final recovery step (step 15, boot up a node into recovery while the VMs are being updated) a couple more times independently. One of the latter tests is still running at the moment. I've reproduced some hung task messages, but no paused VMs thus far. I didn't see anything obvious in the logs that would explain the failure, though there is a lot of data (including expected failure output), so it's very possible I've missed something. I'll take a second look when I have a chance, but in the meantime I'd suggest we try to get access to an environment in this state if at all possible.

Per the 31/01 tiger team bug triage meeting, reducing priority because we can resume from the paused state.

Tried to recreate the problem. The test case passed this time with no VMs getting paused.

Dropping blocker tag as per the program meeting on 03/11. Targeting for 2.1.z (Big Bend) U1.

Can we run a round of tests? It's been 7 months since we last ran this test case.

Per triage 12/13, removing from corbett list.

To add to this bug, this issue was filed when server-side quorum and client-side quorum were not yet defaults in the virt profile. From RHSS 2.1 Update 2, we have enabled server-side quorum and client-side quorum in the virt profile, i.e. for virt-store volumes. Client-side quorum has a design constraint that the first brick of the replica group should be up. So, in that case (first brick down) with client-side quorum enabled, the VMs on that volume go to a paused state, failing fault-tolerance; but the failure of the second brick doesn't affect the VMs on that virt-store. Also tested the behavior without quorums enabled: when one of the bricks/nodes goes down, the other replica pair remains available and the App VMs stay up and running healthy.

What is the status of the bug fix for this problem? My customer has just filed support ticket #01079904 for the same problem. Thanks, Jin

More info. Customer is using:
RHEV 3.3.1-0.48.el6ev
RHS glusterfs 3.4.0.59rhs

He has a replicated volume on a two-node RHSS cluster:
gluster-node-0
gluster-node-1

He is using a "GlusterFS" Storage Domain configured as below:
Path: gluster-node-0.example.com:/TCC-RHEV
VFS Type: glusterfs
Mount options: backup-volfile-servers=gluster-node-1.example.com

He reported that the VM got paused when he manually took down the "gluster-node-0" node, but the VM works fine when he manually takes down node "gluster-node-1". In addition, I know that when "PosixFS" is used, RHEV uses the Gluster FUSE client to mount the Gluster volume on RHEV-H, but in this case he is using the "GlusterFS" type Storage Domain. Do the "Mount options" actually do anything in a "GlusterFS" Storage Domain (is it using libgfapi?)? Should the customer use "PosixFS" or "GlusterFS" in RHEV?

Hi Jin Zhou, did the customer enable client-quorum by any chance? If it is enabled, then this behavior is expected. Could you please check the gluster volume info output to confirm the same. Pranith

Customer is using the default "virt" profile, so I think by default client quorum is set to auto, and server quorum is set to "server".

But the part I don't understand is what caused the difference between the failure on gluster-node-0 and on gluster-node-1. I would expect client quorum to be enforced regardless of which brick/node goes down. Why is the VM suspended only when gluster-node-0 goes offline, not gluster-node-1? SATHEESARAN's note above seems to indicate this behavior, but it is not detailed enough for me. Lastly, since we only officially support replica=2 today, what is the rationale for enabling client quorum as "auto"? It seems useless to me, but I could be wrong. Thanks

Hi Jin Zhou, the client-quorum calculation happens the following way: in general, quorum is met when n/2 + 1 bricks of the replica set are available, but if the number of bricks is even and exactly n/2 bricks in the replica set are up, then quorum is met only if the first brick in the set is up. The reason client-quorum is enabled by default is that an image going into split-brain is much worse than losing availability when the first brick goes down. Without any quorum, VMs are accessible when 1) both bricks are up, 2) only the first brick in the replica set goes down, or 3) only the second brick in the replica set goes down. With client quorum, the VMs are accessible in cases 1) and 3). Pranith
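
The client-quorum rule described in the previous comment reduces to a few lines of arithmetic. The sketch below is illustrative only, not GlusterFS code; the function name and the boolean encoding of brick state are invented for the example:

```python
# Toy check of the client-quorum rule for quorum-type "auto", as described above.

def client_quorum_met(bricks_up):
    """bricks_up: one boolean per brick in the replica set, in brick order
    (index 0 is the 'first' brick of the set)."""
    n = len(bricks_up)
    up = sum(bricks_up)
    if up >= n // 2 + 1:               # strict majority of bricks is up
        return True
    if n % 2 == 0 and up == n // 2:    # even set, exactly half up:
        return bricks_up[0]            # the first brick breaks the tie
    return False

# Replica 2 (1x2), as in this bug:
print(client_quorum_met([True, True]))    # both bricks up     -> True  (VMs run)
print(client_quorum_met([False, True]))   # first brick down   -> False (VMs pause)
print(client_quorum_met([True, False]))   # second brick down  -> True  (VMs run)
```

Assuming gluster-node-0 hosts the first brick of the replica pair, this reproduces what the customer saw: taking down gluster-node-0 loses quorum and pauses the VMs, while taking down gluster-node-1 does not.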
Tested with RHGS 3.1 Nightly build (glusterfs-3.7.1-11.el7rhgs) with the following test:
1. Used a replica 2 volume to back the RHEV Data domain.
2. Powered off one of the nodes abruptly and observed that the VMs are still accessible and available.
Marking this bug as VERIFIED.

Hi Pranith, the doc text is updated. Please review the same and share your technical review comments. If it looks ok, then sign off on the same. Regards, Bhavana

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.