Bug 1333406 - [HC]: After bringing down and up of the bricks VM's are getting paused
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: Unspecified   OS: Unspecified
Priority: high   Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Krutika Dhananjay
QA Contact: SATHEESARAN
Docs Contact:
Depends On:
Blocks: Gluster-HC-2 1351522 1363721 1367270 1367272
 
Reported: 2016-05-05 08:32 EDT by RajeshReddy
Modified: 2017-03-23 01:29 EDT
CC List: 8 users

See Also:
Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1363721
Environment:
Last Closed: 2017-03-23 01:29:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID: Red Hat Product Errata RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Description RajeshReddy 2016-05-05 08:32:06 EDT
Description of problem:
=====================
After bringing bricks down and back up, VMs are getting paused

Version-Release number of selected component (if applicable):
=============
glusterfs-server-3.7.9-2.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
=====================
1. Create a 1x3 replicate volume and host a few VMs on the gluster volumes
2. Log in to the VMs and run a script that populates data (using dd)
3. While I/O is going on, bring down one of the bricks; after some time, bring that brick back up and bring down another brick
4. After some time, bring up the downed brick and bring down yet another brick. During this bring-down/bring-up cycling, a few VMs were observed getting paused (see the command sketch after these steps)
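
A minimal sketch of the brick down/up cycle described above, assuming the volume is named data (as in the vol info output further down); the guest file path and the brick PID placeholders are illustrative, not taken from this bug:

# inside a guest: keep writing data while the bricks are cycled (illustrative path)
while true; do dd if=/dev/urandom of=/var/tmp/fill.$RANDOM bs=1M count=512 oflag=direct; done

# on a gluster node: find the brick process and kill it to take the brick down
gluster volume status data        # note the PID of the brick to bring down
kill -9 <brick-pid>

# later, restart the downed brick and take another one down
gluster volume start data force   # restarts any offline brick processes of the volume
gluster volume status data
kill -9 <next-brick-pid>

# watch pending heals while the bricks are cycled
gluster volume heal data info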

Actual results:
==================
Virtual machines are getting paused 


Expected results:
=================
VMs should not be paused

Additional info:
===================
[root@zod ~]# gluster vol info
 
Volume Name: data
Type: Replicate
Volume ID: 5021c1f8-0b2f-4b34-92ea-a087afe84ce3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sulphur.lab.eng.blr.redhat.com:/rhgs/data/data-brick1
Brick2: tettnang.lab.eng.blr.redhat.com:/rhgs/data/data-brick2
Brick3: zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick3
Options Reconfigured:
diagnostics.client-log-level: INFO
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
nfs.disable: on
cluster.shd-max-threads: 16
 
Volume Name: engine
Type: Replicate
Volume ID: 5e14889a-0ffc-415f-8fbd-259451972c46
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sulphur.lab.eng.blr.redhat.com:/rhgs/engine/engine-brick1
Brick2: tettnang.lab.eng.blr.redhat.com:/rhgs/engine/engine-brick2
Brick3: zod.lab.eng.blr.redhat.com:/rhgs/engine/engine-brick3
Options Reconfigured:
cluster.shd-max-threads: 16
nfs.disable: on
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
 
Volume Name: vmstore
Type: Replicate
Volume ID: edd3e117-138e-437b-9e65-319084fecc4b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sulphur.lab.eng.blr.redhat.com:/rhgs/vmstore/vmstore-brick1
Brick2: tettnang.lab.eng.blr.redhat.com:/rhgs/vmstore/vmstore-brick2
Brick3: zod.lab.eng.blr.redhat.com:/rhgs/vmstore/vmstore-brick3
Options Reconfigured:
cluster.shd-max-threads: 16
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
nfs.disable: on
[root@zod ~]#
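
For reference, the options shown above are the kind of option set usually applied to volumes that store VM images. A sketch of how such a volume could be created and tuned, with illustrative host names and brick paths (not the reporter's exact commands):

gluster volume create data replica 3 \
    host1:/rhgs/data/data-brick1 host2:/rhgs/data/data-brick2 host3:/rhgs/data/data-brick3
gluster volume set data group virt             # applies the virt option group, if present on the node
gluster volume set data features.shard on
gluster volume set data features.shard-block-size 512MB
gluster volume set data cluster.quorum-type auto
gluster volume set data cluster.server-quorum-type server
gluster volume set data storage.owner-uid 36
gluster volume set data storage.owner-gid 36
gluster volume start data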
Comment 2 RajeshReddy 2016-05-05 09:34:48 EDT
sosreports are available at rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1333406
Comment 3 Sahina Bose 2016-05-19 05:42:59 EDT
This bug is related to a cyclic network outage test causing a file to end up in split-brain. As this is not a likely scenario, removing it from the 3.1.3 target
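
For reference, whether any file on a replica volume has actually gone into split-brain can be checked with the heal commands (a sketch, assuming the data volume from the description):

gluster volume heal data info split-brain   # lists entries currently in split-brain
gluster volume heal data info               # lists all entries still pending heal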
Comment 6 Pranith Kumar K 2016-07-18 06:14:41 EDT
You are correct, we can't prevent VMs from getting paused. We only need to make sure that split-brains won't happen. Please note that this case may leave the VM image in a very bad state; all we can guarantee is that the file does not go into split-brain.
Comment 7 Atin Mukherjee 2016-08-09 00:24:57 EDT
Upstream mainline patch http://review.gluster.org/15080 posted for review.
Comment 9 Atin Mukherjee 2016-09-17 10:47:10 EDT
Upstream mainline : http://review.gluster.org/15080
                    http://review.gluster.org/15145

Upstream 3.8 : http://review.gluster.org/15221
               http://review.gluster.org/15164
               

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
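
To confirm that a node carries the rebased build or later, the installed package and CLI version can be checked (a sketch):

rpm -q glusterfs-server       # expect glusterfs-server-3.8.4-1.el7rhgs or newer
gluster --version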
Comment 12 SATHEESARAN 2017-01-31 01:59:42 EST
Tested with an RHGS 3.2.0 interim build (glusterfs-3.8.4-12.el7rhgs) with the following steps:

1. Created a replica 3 volume and used it as a data domain in RHV
2. While continuous I/O was happening on the VMs, killed the first brick
3. After some time, brought the downed brick back up, and a few minutes later killed the second brick
4. After some time, brought the downed brick back up, and a few minutes later killed the third brick
5. After some time, brought the downed brick back up, and a few minutes later killed the first brick again

After all these steps, I haven't seen any hiccups with the VMs; the VMs are healthy post reboot and there are no problems (see the check sketch below).
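
A sketch of checks that back up such a statement, assuming the data volume from the description and libvirt-managed guests on the RHV hosts (host names are not shown in this bug):

# on each hypervisor: confirm no guest was left paused during the brick cycling
virsh -r list --all                         # read-only mode; State column should read 'running', not 'paused'

# on a gluster node: confirm heals have drained and nothing is in split-brain
gluster volume heal data info
gluster volume heal data info split-brain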
Comment 14 errata-xmlrpc 2017-03-23 01:29:33 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
