Summary: Application VMs with their disk images on a sharded replica 3 volume are unable to boot after performing rebalance
Product: Red Hat Gluster Storage        Reporter: Rejy M Cyriac <rcyriac>
Component: distribute                   Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED ERRATA                   QA Contact: SATHEESARAN <sasundar>
Version: rhgs-3.2                       CC: amukherj, divya, kdhananj, knarra, rcyriac, rgowdapp, rhinduja, rhs-bugs, sasundar, storage-qa-internal
Target Release: RHGS 3.2.0 Async
Fixed In Version: glusterfs-3.8.4-18.1  Doc Type: Bug Fix
Previously, there was a race between a layout change on the /.shard directory and the creation of shards under it as part of parallel ongoing I/O operations. This could cause the same shard to exist on multiple subvolumes, with each copy having witnessed different writes from the application. As a consequence, because no single copy held complete data, the disk image was corrupted, making the VM unbootable. With this fix, the shard translator sends a LOOKUP for a shard before trying to create it, so that DHT identifies any already existing copy; this ensures there is always exactly one copy of every shard and that writes are always directed to it. Now, the VMs operate correctly when I/O and rebalance operations run in parallel.
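The doc text above can be illustrated with a minimal simulation. This is not GlusterFS code: the subvolume objects, the two cached-layout writers, and the function names are all hypothetical, and only model the idea that without a prior LOOKUP two clients holding different layouts can create the same shard on different subvolumes, while a lookup-first create always converges on one copy.

```python
# Illustrative sketch only (assumed model, not the GlusterFS implementation).
# Two "subvolumes"; a rebalance changes which one the layout maps a shard to.

class Subvol:
    def __init__(self, name):
        self.name = name
        self.shards = {}              # shard name -> list of writes received

SUBVOLS = [Subvol("subvol-0"), Subvol("subvol-1")]

def create_without_lookup(layout_idx, shard, data):
    # Buggy behaviour: trust the cached layout and create/write blindly.
    sv = SUBVOLS[layout_idx]
    sv.shards.setdefault(shard, []).append(data)
    return sv

def create_with_lookup(layout_idx, shard, data):
    # Fixed behaviour: LOOKUP across subvolumes first; reuse an existing copy.
    for sv in SUBVOLS:
        if shard in sv.shards:
            sv.shards[shard].append(data)
            return sv
    return create_without_lookup(layout_idx, shard, data)

# Writer A holds the pre-rebalance layout (subvol-0); writer B the new one.
create_without_lookup(0, "shard.1", "write-A")
create_without_lookup(1, "shard.1", "write-B")
dup = [sv.name for sv in SUBVOLS if "shard.1" in sv.shards]
print(dup)   # ['subvol-0', 'subvol-1'] -> two partial copies, i.e. corruption

# Replay the same racing writers with the lookup-first fix.
create_with_lookup(0, "shard.2", "write-A")
create_with_lookup(1, "shard.2", "write-B")
one = [sv.name for sv in SUBVOLS if "shard.2" in sv.shards]
print(one)   # ['subvol-0'] -> a single copy that saw both writes
```

The point of the fix is visible in the second run: writer B's LOOKUP finds the copy writer A already created, so its write lands on the same file instead of spawning a divergent duplicate.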
: 1440051 (view as bug list)            Environment:
Last Closed: 2017-06-08 09:34:33 UTC    Type: Bug
oVirt Team: ---                         RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---                    Target Upstream Version:
Bug Depends On: 1434653
Comment 1 Atin Mukherjee 2017-04-07 08:39:21 UTC
upstream patch : https://review.gluster.org/#/c/17010/
Comment 3 Atin Mukherjee 2017-04-10 07:00:56 UTC
(In reply to Atin Mukherjee from comment #1)
> upstream patch : https://review.gluster.org/#/c/17010/

One more patch https://review.gluster.org/#/c/17014 is needed.
Comment 6 SATHEESARAN 2017-04-28 03:42:59 UTC
A few more patches were sent upstream for the fix: https://review.gluster.org/#/c/17085/. All the discussion about this bug and its fixes is available as part of the RHGS 3.3.0 bug: https://bugzilla.redhat.com/show_bug.cgi?id=1434653
Comment 13 SATHEESARAN 2017-05-19 02:20:31 UTC
Tested with glusterfs-3.8.4-18.1 with the following tests:
1. Triggered a rebalance operation on the gluster volume while VMs were being installed
2. Triggered a rebalance operation while VMs were under active load
3. Rebooted VMs after rebalance
4. Removed a brick with data migration while VMs were under active load
With all the above tests, the VMs remained healthy.
Comment 14 Divya 2017-05-29 09:17:19 UTC
Krutika, Please review and sign-off the edited doc text.
Comment 15 Krutika Dhananjay 2017-05-29 09:25:25 UTC
Looks good, Divya!
Comment 17 errata-xmlrpc 2017-06-08 09:34:33 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1418