Bug 1439753
| Field | Value |
|---|---|
| Summary | Application VMs with their disk images on a sharded replica 3 volume are unable to boot after performing rebalance |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | Rejy M Cyriac <rcyriac> |
| Component | distribute |
| Assignee | Krutika Dhananjay <kdhananj> |
| Status | CLOSED ERRATA |
| QA Contact | SATHEESARAN <sasundar> |
| Severity | high |
| Docs Contact | |
| Priority | unspecified |
| Version | rhgs-3.2 |
| CC | amukherj, divya, kdhananj, knarra, rcyriac, rgowdapp, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone | --- |
| Keywords | ZStream |
| Target Release | RHGS 3.2.0 Async |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | glusterfs-3.8.4-18.1 |
| Doc Type | Bug Fix |
| Doc Text | Previously, there was a race between a layout change on the /.shard directory and the creation of shards under it as part of parallel ongoing I/O. As a result, the same shard could come to exist on multiple subvolumes, with the different copies having witnessed different writes from the application. Because no copy held complete data, the image was corrupted and the VM became unbootable. With this fix, the shard translator sends a LOOKUP on a shard before trying to create it, so that DHT identifies any already existing copy; this ensures there is always exactly one copy of every shard and that writes are always directed to it. VMs now operate correctly while I/O and rebalance operations run in parallel. (A conceptual sketch of this lookup-before-create flow follows the table.) |
| Story Points | --- |
| Clone Of | 1434653 |
| Clones | 1440051 (view as bug list) |
| Environment | |
| Last Closed | 2017-06-08 09:34:33 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | 1434653 |
| Bug Blocks | 1277939 |
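
The Doc Text above describes a lookup-before-create change in the shard translator. The following is a minimal conceptual sketch, not GlusterFS code: the `Subvolume` class and the `lookup_shard`/`write_shard` helpers are hypothetical stand-ins used only to illustrate why sending a LOOKUP before creating a shard keeps every shard down to a single copy.

```python
# Conceptual model of the race described in the Doc Text. This is NOT
# GlusterFS code; all names here are hypothetical and exist only to
# illustrate the lookup-before-create fix.

class Subvolume:
    """Stand-in for one DHT subvolume: maps shard name -> list of writes."""
    def __init__(self, name):
        self.name = name
        self.shards = {}

def create_and_write(subvol, shard, data):
    """Create the shard on this subvolume if needed and record a write."""
    subvol.shards.setdefault(shard, []).append(data)

def lookup_shard(subvols, shard):
    """Return the subvolume that already holds the shard, if any."""
    for sv in subvols:
        if shard in sv.shards:
            return sv
    return None

subvols = [Subvolume("subvol-0"), Subvolume("subvol-1")]
shard = ".shard/gfid.7"

# Without the fix: the layout of /.shard changes mid-flight, so two writers
# hash the same shard to different subvolumes and each creates its own copy.
create_and_write(subvols[0], shard, "write-A")   # writer 1, old layout
create_and_write(subvols[1], shard, "write-B")   # writer 2, new layout
copies = [sv for sv in subvols if shard in sv.shards]
print("without fix:", [(sv.name, sv.shards[shard]) for sv in copies])
# Two partial copies -> neither has both writes -> corrupted image.

# With the fix: LOOKUP first, create only if no copy exists anywhere, and
# always direct the write to the copy that the lookup found.
for sv in subvols:
    sv.shards.clear()

def write_shard(subvols, hashed_subvol, shard, data):
    existing = lookup_shard(subvols, shard)          # LOOKUP before create
    target = existing if existing is not None else hashed_subvol
    create_and_write(target, shard, data)

write_shard(subvols, subvols[0], shard, "write-A")   # creates on subvol-0
write_shard(subvols, subvols[1], shard, "write-B")   # lookup finds subvol-0
copies = [sv for sv in subvols if shard in sv.shards]
print("with fix:", [(sv.name, sv.shards[shard]) for sv in copies])
# Exactly one copy, holding both writes.
```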
Comment 1
Atin Mukherjee
2017-04-07 08:39:21 UTC
(In reply to Atin Mukherjee from comment #1)
> upstream patch: https://review.gluster.org/#/c/17010/

One more patch, https://review.gluster.org/#/c/17014, is needed.

A few more patches were sent upstream for the fix: https://review.gluster.org/#/c/17085/

All the discussion about this bug and its fixes is available as part of the RHGS 3.3.0 bug [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1434653

Tested with glusterfs-3.8.4-18.1 with the following tests (a scripted sketch of one of these scenarios appears after this comment):
1. Triggered a rebalance operation on the gluster volume while VMs were being installed
2. Triggered a rebalance operation while VMs were under active load
3. Rebooted VMs post rebalance
4. Removed a brick with data migration while VMs were active

With all of the above tests, the VMs remained healthy.

Krutika, please review and sign off on the edited doc text.

Looks good, Divya!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1418
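
As a rough illustration of how verification scenario 2 above might be scripted, here is a hypothetical harness sketch. It assumes a volume named `vmstore`, a host where the gluster CLI is available, and Python 3.7 or later; only the `gluster volume rebalance <VOLNAME> start|status` command syntax is taken from the standard CLI, and the volume name, polling interval, and completion check are assumptions, not part of the actual test run reported above.

```python
# Hypothetical verification harness sketch: start a rebalance while VMs on the
# volume are under active I/O, then poll until the rebalance finishes.
import subprocess
import time

VOLUME = "vmstore"   # assumed volume name

def gluster(*args):
    """Run a gluster volume subcommand and return its stdout."""
    cmd = ["gluster", "volume", *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Trigger the rebalance while guest I/O is ongoing (driven out of band).
gluster("rebalance", VOLUME, "start")

# Poll the rebalance status until it reports completion.
while "completed" not in gluster("rebalance", VOLUME, "status").lower():
    time.sleep(30)

# After rebalance, reboot the VMs (scenario 3) and verify that the guests boot
# and their filesystems are consistent before declaring the run healthy.
```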