1439753 – Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	distribute
Sub Component:
Version:	rhgs-3.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.2.0 Async
Assignee:	Krutika Dhananjay
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:	1434653
Blocks:	Gluster-HC-2

Reported:	2017-04-06 13:20 UTC by Rejy M Cyriac
Modified:	2017-06-08 09:34 UTC
CC List:	10 users
Fixed In Version:	glusterfs-3.8.4-18.1
Doc Type:	Bug Fix
Doc Text:	Previously, a layout change on the /.shard directory could race with the creation of shards under it by parallel, ongoing I/O operations. As a result, the same shard could exist on multiple subvolumes, with each copy having witnessed different writes from the application. Because no single copy held the complete data, the image was corrupted and the VM became unbootable. With this fix, shard sends a LOOKUP on a shard before trying to create it, so that DHT identifies any already existing shard. This ensures that there is always exactly one copy of every shard and that writes are always directed to it. Now, the VMs operate correctly when I/O and rebalance operations run in parallel.
Clone Of:	1434653
Clones:	1440051
Environment:
Last Closed:	2017-06-08 09:34:33 UTC
Embargoed:
Dependent Products:
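
The race described in the Doc Text field can be illustrated with a toy model. The sketch below is not GlusterFS code: the names `hashed_subvol`, `write_shard` and `run`, the `gfid.N` shard names, and the modulo-based placement are all hypothetical simplifications (real DHT assigns hash ranges per directory layout). It only shows why creating a shard from a stale layout view can leave two partial copies, and why looking for an existing copy first avoids that.

```python
import zlib


def hashed_subvol(shard_name, subvol_count):
    """Toy placement: derive a subvolume index from a stable hash of the name."""
    return zlib.crc32(shard_name.encode()) % subvol_count


def write_shard(subvols, shard_name, data, visible_subvols, lookup_first):
    """Apply one write to shard_name and return the subvolume index used.

    subvols         -- list of dicts mapping shard name -> list of writes seen
    visible_subvols -- number of subvolumes in the writer's layout view
    lookup_first    -- if True, search for an existing copy before creating
    """
    target = None
    if lookup_first:
        # Simplified stand-in for a LOOKUP: scan every subvolume for the shard.
        for index, subvol in enumerate(subvols):
            if shard_name in subvol:
                target = index
                break
    if target is None:
        # No known copy: create the shard where this layout view places it.
        target = hashed_subvol(shard_name, visible_subvols)
        subvols[target].setdefault(shard_name, [])
    subvols[target][shard_name].append(data)
    return target


def run(lookup_first):
    """Two writers hit the same shard, one with a stale two-subvolume layout."""
    # Pick a toy shard name that the old and new layouts place differently.
    shard = next(
        name
        for name in (f"gfid.{i}" for i in range(1000))
        if hashed_subvol(name, 2) != hashed_subvol(name, 3)
    )
    subvols = [{}, {}, {}]
    # Writer A still sees the layout from before the third subvolume was added.
    write_shard(subvols, shard, "write-1", 2, lookup_first)
    # Writer B already sees the post-rebalance layout.
    write_shard(subvols, shard, "write-2", 3, lookup_first)
    return subvols


if __name__ == "__main__":
    for mode in (False, True):
        copies = [s for s in run(mode) if s]
        print(f"lookup_first={mode}: {len(copies)} copy/copies -> {copies}")
```

In this model, the run without a prior lookup ends with two copies of the shard that each hold one of the two writes, while the run with a prior lookup ends with a single copy holding both.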

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:1418	0	normal	SHIPPED_LIVE	glusterfs bug fix update	2017-06-08 13:33:58 UTC

Comment 1 Atin Mukherjee 2017-04-07 08:39:21 UTC

upstream patch : https://review.gluster.org/#/c/17010/

Comment 3 Atin Mukherjee 2017-04-10 07:00:56 UTC

(In reply to Atin Mukherjee from comment #1)
> upstream patch : https://review.gluster.org/#/c/17010/

One more patch https://review.gluster.org/#/c/17014 is needed.

Comment 6 SATHEESARAN 2017-04-28 03:42:59 UTC

A few more patches have been sent upstream for the fix:

https://review.gluster.org/#/c/17085/

All discussion about this bug and its fixes is available as part of the RHGS 3.3.0 bug [1]

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1434653

Comment 13 SATHEESARAN 2017-05-19 02:20:31 UTC

Tested with glusterfs-3.8.4-18.1 with the following tests:

1. Ran a rebalance operation on the gluster volume while VMs were being installed
2. Triggered a rebalance operation while VMs were under active load
3. Rebooted VMs after rebalance
4. Removed a brick with data migration while VMs had migration active

With all of the above tests, the VMs remained healthy.

Comment 14 Divya 2017-05-29 09:17:19 UTC

Krutika,

Please review and sign off on the edited doc text.

Comment 15 Krutika Dhananjay 2017-05-29 09:25:25 UTC

Looks good, Divya!

Comment 17 errata-xmlrpc 2017-06-08 09:34:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1418
