+++ This bug was initially created as a clone of Bug #1363721 +++
+++ This bug was initially created as a clone of Bug #1333406 +++

Description of problem:
=====================
After bringing the bricks down and up, VMs are getting paused

Version-Release number of selected component (if applicable):
=============
glusterfs-server-3.7.9-2.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
=====================
1. Create a 1x3 volume and host a few VMs on the gluster volumes
2. Log in to the VMs and run a script to populate data (using dd)
3. While IO is going on, bring down one of the bricks; after some time, bring that brick back up and bring down another brick
4. After some time, bring up the down brick and bring down another brick. During this brick down/up cycle, a few VMs were observed to pause

Actual results:
==================
Virtual machines are getting paused

Expected results:
=================
VMs should not be paused

Additional info:
===================
[root@zod ~]# gluster vol info

Volume Name: data
Type: Replicate
Volume ID: 5021c1f8-0b2f-4b34-92ea-a087afe84ce3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/data/data-brick1
Brick2: server2:/rhgs/data/data-brick2
Brick3: server3:/rhgs/data/data-brick3
Options Reconfigured:
diagnostics.client-log-level: INFO
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
nfs.disable: on
cluster.shd-max-threads: 16

Volume Name: engine
Type: Replicate
Volume ID: 5e14889a-0ffc-415f-8fbd-259451972c46
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/engine/engine-brick1
Brick2: server2:/rhgs/engine/engine-brick2
Brick3: server3:/rhgs/engine/engine-brick3
Options Reconfigured:
cluster.shd-max-threads: 16
nfs.disable: on
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on

Volume Name: vmstore
Type: Replicate
Volume ID: edd3e117-138e-437b-9e65-319084fecc4b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/vmstore/vmstore-brick1
Brick2: server2:/rhgs/vmstore/vmstore-brick2
Brick3: server3:/rhgs/vmstore/vmstore-brick3
Options Reconfigured:
cluster.shd-max-threads: 16
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
nfs.disable: on
[root@zod ~]#

--- Additional comment from Sahina Bose on 2016-05-19 05:42:59 EDT ---

This bug is related to the cyclic network outage test causing a file to end up in split-brain, and is not a very likely scenario.
--- Additional comment from Krutika Dhananjay on 2016-07-18 01:39:27 EDT ---

(In reply to RajeshReddy from comment #0)
> Description of problem:
> =====================
> After bringing down and up of the bricks, VMs are getting paused
>
> Version-Release number of selected component (if applicable):
> =============
> glusterfs-server-3.7.9-2.el7rhgs.x86_64
>
> How reproducible:
>
> Steps to Reproduce:
> =====================
> 1. Create 1x3 volume and host few VMs on the gluster volumes
> 2. Login to the VMs and run script to populate data (using DD)
> 3. While IO is going on bring down one of the bricks and after some time
> bring up the brick and bring down another brick
> 4. After some time bring up the down brick and bring down another brick;
> during the brick down and bring up process observed few VMs are getting
> paused
>
> Actual results:
> ==================
> Virtual machines are getting paused
>
> Expected results:
> =================
> VMs should not be paused

Just wondering whether it is possible at all to keep the VM from pausing in this scenario. The best we can do is to prevent the shard/VM image from going into split-brain when bricks are brought offline and back online in cyclic order. That means the VM(s) will _still_ pause (with EROFS?) at some point; only this time, after the particular file/shard is healed, IO may be resumed from inside the VM without requiring manual intervention to fix the split-brain.

@Pranith: Are the above statements correct? Or is there a way to actually keep the VM from pausing?

-Krutika

--- Additional comment from Pranith Kumar K on 2016-07-18 06:14:41 EDT ---

You are correct, we can't prevent the VMs from getting paused. We only need to make sure that split-brains won't happen. Please note that this case may still leave the VM image in a very bad state; all we can guarantee is that the file does not go into split-brain.
--- Additional comment from Vijay Bellur on 2016-08-03 09:06:18 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought on and off in cyclic order) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-03 09:07:12 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-04 07:46:41 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-04 22:33:30 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-09 03:30:40 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#5) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-09 04:22:24 EDT ---

REVIEW: http://review.gluster.org/15080 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#6) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-11 05:42:06 EDT ---

REVIEW: http://review.gluster.org/15145 (cluster/afr: Bug fixes in txn codepath) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-11 21:23:15 EDT ---

REVIEW: http://review.gluster.org/15145 (cluster/afr: Bug fixes in txn codepath) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-08-15 06:40:58 EDT ---

COMMIT: http://review.gluster.org/15145 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 79b9ad3dfa146ef29ac99bf87d1c31f5a6fe1fef
Author: Krutika Dhananjay <kdhananj>
Date:   Fri Aug 5 12:18:05 2016 +0530

    cluster/afr: Bug fixes in txn codepath

    AFR sets the transaction.pre_op[] array even before actually doing
    the pre-op on-disk. Therefore, AFR must not only consider the
    pre_op[] array but also the failed_subvols[] information before
    setting the pre_op_done[] flag. This patch fixes that.

    Change-Id: I78ccd39106bd4959441821355a82572659e3affb
    BUG: 1363721
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15145
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Anuradha Talur <atalur>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
REVIEW: http://review.gluster.org/15164 (cluster/afr: Bug fixes in txn codepath) posted (#1) for review on release-3.8 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/15164 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu)
------
commit 62ad0bf74a97c20fb06161df3b2dd89f100bf617
Author: Krutika Dhananjay <kdhananj>
Date:   Fri Aug 5 12:18:05 2016 +0530

    cluster/afr: Bug fixes in txn codepath

    Backport of: http://review.gluster.org/15145

    AFR sets the transaction.pre_op[] array even before actually doing
    the pre-op on-disk. Therefore, AFR must not only consider the
    pre_op[] array but also the failed_subvols[] information before
    setting the pre_op_done[] flag. This patch fixes that.

    Change-Id: I726b2acd4025e2e75a87dea547ca6e088bc82c00
    BUG: 1367272
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15164
    Reviewed-by: Ravishankar N <ravishankar>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Anuradha Talur <atalur>
    CentOS-regression: Gluster Build System <jenkins.org>
There is one more patch that needs to go in before this can be moved to MODIFIED. Changing the state to POST.
REVIEW: http://review.gluster.org/15221 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#1) for review on release-3.8 by Krutika Dhananjay (kdhananj)
REVIEW: http://review.gluster.org/15221 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#2) for review on release-3.8 by Oleksandr Natalenko (oleksandr)
REVIEW: http://review.gluster.org/15221 (cluster/afr: Prevent split-brain when bricks are brought off and on in cyclic order) posted (#3) for review on release-3.8 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/15221 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu)
------
commit d99f72842595306e9f26a275804bf0f310caba53
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Jul 28 21:29:59 2016 +0530

    cluster/afr: Prevent split-brain when bricks are brought off and on
    in cyclic order

    Backport of: http://review.gluster.org/15080

    When the bricks are brought offline and then online in cyclic order
    while writes are in progress on a file, thanks to inode refresh in
    write txns, AFR will mostly fail the write attempt when the only
    good copy is offline. However, there is still a remote possibility
    that the file will run into split-brain if the brick that has the
    lone good copy goes offline *after* the inode refresh but *before*
    the write txn completes (I call it in-flight split-brain in the
    patch for ease of reference), requiring intervention from the admin
    to resolve the split-brain before IO can resume normally on the
    file.

    To get around this, the patch does the following things:
    i) retains the dirty xattrs on the file
    ii) avoids marking the last of the good copies as bad (or accused)
        in case it is the one to go down during the course of a write.
    iii) fails that particular write with the appropriate errno.

    This way, we still have one good copy left despite the split-brain
    situation which, when it is back online, will be chosen as source
    to do the heal.

    > Change-Id: I9ca634b026ac830b172bac076437cc3bf1ae7d8a
    > BUG: 1363721
    > Signed-off-by: Krutika Dhananjay <kdhananj>
    > Reviewed-on: http://review.gluster.org/15080
    > Tested-by: Pranith Kumar Karampuri <pkarampu>
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Ravishankar N <ravishankar>
    > Reviewed-by: Oleksandr Natalenko <oleksandr>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    (cherry picked from commit fcb5b70b1099d0379b40c81f35750df8bb9545a5)

    Change-Id: I157f1025aebd6624fa3d412abc69a4ae6f2fe9e0
    BUG: 1367272
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Signed-off-by: Oleksandr Natalenko <oleksandr>
    Reviewed-on: http://review.gluster.org/15221
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.3, please open a new bug report.

glusterfs-3.8.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000059.html
[2] https://www.gluster.org/pipermail/gluster-users/