Bug 1439753 - Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: RHGS 3.2.0 Async
Assignee: Krutika Dhananjay
Depends On: 1434653
Blocks: Gluster-HC-2
Reported: 2017-04-06 13:20 UTC by Rejy M Cyriac
Modified: 2017-06-08 09:34 UTC

Fixed In Version: glusterfs-3.8.4-18.1
Doc Type: Bug Fix
Doc Text:
Previously, there was a race between a layout change on the /.shard directory and the creation of shards under it by parallel, ongoing I/O operations. As a result, the same shard could exist on multiple subvolumes, with each copy having received different writes from the application. Because no single copy held the complete data, the image was corrupted and the VM became unbootable. With this fix, the shard translator sends a LOOKUP on a shard before trying to create it, so that DHT identifies any existing copy. This ensures that there is always exactly one copy of every shard and that writes are always directed to it. VMs now operate correctly when I/O and rebalance operations run in parallel.
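The lookup-before-create idea in the doc text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not GlusterFS code: the class `ToyDHT`, the functions `write_shard` and `run`, the additive hash, and the `layout_offset` knob are all hypothetical, and the model serializes what is a true race in the real system. It also assumes that a lookup falls back to checking every subvolume when the shard is not on the hashed one.

```python
# Toy model only: this is NOT GlusterFS code. All names are hypothetical.

class ToyDHT:
    """A few 'subvolumes' (dicts) plus a layout that picks one for new files."""

    def __init__(self, num_subvols):
        self.subvols = [dict() for _ in range(num_subvols)]
        # Changing this value stands in for a layout change during rebalance.
        self.layout_offset = 0

    def hashed_subvol(self, name):
        # Deterministic stand-in for a hash-to-subvolume mapping (assumption).
        return (sum(name.encode()) + self.layout_offset) % len(self.subvols)

    def lookup(self, name):
        # Check the hashed subvolume first, then every other subvolume.
        first = self.hashed_subvol(name)
        order = [first] + [i for i in range(len(self.subvols)) if i != first]
        for idx in order:
            if name in self.subvols[idx]:
                return idx
        return None

    def create(self, name):
        # Create on whatever subvolume the current layout points at.
        idx = self.hashed_subvol(name)
        self.subvols[idx].setdefault(name, bytearray())
        return idx

    def copies(self, name):
        return sum(1 for s in self.subvols if name in s)


def write_shard(dht, name, data, lookup_first):
    """Append data to a shard, creating the shard if it is not found."""
    idx = dht.lookup(name) if lookup_first else None
    if idx is None:
        idx = dht.create(name)
    dht.subvols[idx][name] += data
    return idx


def run(lookup_first):
    """Write, change the layout, write again; report copies and contents."""
    dht = ToyDHT(num_subvols=2)
    shard = "shard.1"
    write_shard(dht, shard, b"AAAA", lookup_first)
    dht.layout_offset += 1  # layout of the parent directory changes mid-I/O
    write_shard(dht, shard, b"BBBB", lookup_first)
    contents = [bytes(s[shard]) for s in dht.subvols if shard in s]
    return dht.copies(shard), contents
```

In this toy, `run(lookup_first=False)` ends with two copies of the shard, each holding only part of the data, while `run(lookup_first=True)` ends with a single copy that has received both writes.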
Clone Of: 1434653
: 1440051 (view as bug list)
Last Closed: 2017-06-08 09:34:33 UTC
Target Upstream Version:

System: Red Hat Product Errata
ID: RHBA-2017:1418
Priority: normal
Status: SHIPPED_LIVE
Summary: glusterfs bug fix update
Last Updated: 2017-06-08 13:33:58 UTC

Comment 1 Atin Mukherjee 2017-04-07 08:39:21 UTC
upstream patch : https://review.gluster.org/#/c/17010/

Comment 3 Atin Mukherjee 2017-04-10 07:00:56 UTC
(In reply to Atin Mukherjee from comment #1)
> upstream patch : https://review.gluster.org/#/c/17010/

One more patch https://review.gluster.org/#/c/17014 is needed.

Comment 6 SATHEESARAN 2017-04-28 03:42:59 UTC
A few more patches have been sent upstream for the fix.

All discussion about this bug and its fixes is available in the RHGS 3.3.0 bug [1].

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1434653

Comment 13 SATHEESARAN 2017-05-19 02:20:31 UTC
Tested with glusterfs-3.8.4-18.1 with the following tests:

1. Ran a rebalance operation on the gluster volume while VMs were being installed
2. Triggered a rebalance operation while VMs were under active load
3. Rebooted the VMs after rebalance
4. Removed a brick with data migration while VMs were undergoing active migration

With all of the above tests, the VMs remained healthy.

Comment 14 Divya 2017-05-29 09:17:19 UTC

Please review and sign off on the edited doc text.

Comment 15 Krutika Dhananjay 2017-05-29 09:25:25 UTC
Looks good, Divya!

Comment 17 errata-xmlrpc 2017-06-08 09:34:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

