Bug 1408426
Summary: with granular-entry-self-heal enabled i see that there is a gfid mismatch and vm goes to paused state after migrating to another host

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | RamaKasturi <knarra> |
| Component | arbiter | Assignee | Krutika Dhananjay <kdhananj> |
| Status | CLOSED ERRATA | QA Contact | RamaKasturi <knarra> |
| Severity | high | Docs Contact | |
| Priority | unspecified | | |
| Version | rhgs-3.2 | CC | amukherj, nchilaka, pkarampu, rcyriac, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone | --- | | |
| Target Release | RHGS 3.2.0 | | |
| Hardware | x86_64 | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | glusterfs-3.8.4-11 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1408712 (view as bug list) | Environment | |
| Last Closed | 2017-03-23 05:59:24 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1277939, 1351528, 1400057, 1408712, 1408785, 1408786 | | |
Description (RamaKasturi, 2016-12-23 10:56:06 UTC)
As suggested by Pranith, I disabled granular entry self-heal on the volume and I do not see the issue.

gluster volume info:
==============================
[root@rhsqa-grafton1 ~]# gluster volume info engine
Volume Name: engine
Type: Replicate
Volume ID: f0ae3c3a-44ca-4a5e-aafa-b32be8330c11
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick1/engine
Brick2: 10.70.36.80:/rhgs/brick1/engine
Brick3: 10.70.36.81:/rhgs/brick1/engine (arbiter)
Options Reconfigured:
auth.ssl-allow: 10.70.36.80,10.70.36.79,10.70.36.81
server.ssl: on
client.ssl: on
cluster.use-compound-fops: on
cluster.granular-entry-heal: on
performance.strict-o-direct: on
user.cifs: off
network.ping-timeout: 30
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

sosreports can be found in the link below:
==============================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1408426/

Note: The issue is not specific to arbiter per se. Assigning the bug to Krutika, who is working with Sas on the same issue in granular entry self-heal. Not changing the component to replicate, though, since Kasturi tested it on an arbiter configuration.

Proposing as blocker, since a VM pause means data unavailability.

With the latest update from Pranith and Krutika, the issue is caused by the behaviour explained in https://bugzilla.redhat.com/show_bug.cgi?id=1400057#c11. Although both issues (BZ 1400057 and this bug) will be solved by the same patch, both scenarios need to be re-tested with the patch in place. This bug needs to be acked as per process for RHGS 3.2.0.

Resuming from https://bugzilla.redhat.com/show_bug.cgi?id=1400057#c11 to explain why there would be a gfid mismatch, so please go through https://bugzilla.redhat.com/show_bug.cgi?id=1400057#c11 first.

... the pending xattrs on .shard are at this point erased. Now, when the brick that was down comes back online, another MKNOD on this shard's name, triggered by a shard readv fop, whenever it happens, would fail with EEXIST on the bricks that were already online; on the brick that was previously offline, the creation of this shard would succeed, although with a new gfid. This leads to the gfid mismatch.

Verified and works fine with build glusterfs-3.8.4-11.el7rhgs.x86_64. Followed the steps below to verify the bug:
========================================
1. Install HC with three nodes.
2. Create an arbiter volume and enable all the options using gdeploy.
3. Now bring down the first brick in the arbiter volume and create a vm.
4. Once the vm creation is completed, bring back the brick and wait for self-heal to happen.
5. Now migrate the vm to another host.

I see that the vm has been migrated successfully and do not see a vm pause once migration is completed. Did not observe any gfid mismatch in the client logs.
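For reference, the workaround mentioned in the first comment (disabling granular entry self-heal) and its reversal come down to a couple of gluster CLI calls. This is a minimal sketch, assuming the "engine" volume from the volume info above; option names are as listed under "Options Reconfigured".

```sh
# Show the current setting of the option on the volume.
gluster volume get engine cluster.granular-entry-heal

# Workaround described in the report: turn granular entry heal off.
gluster volume set engine cluster.granular-entry-heal off

# Once a fixed build (glusterfs-3.8.4-11 or later) is installed, turn it back on.
gluster volume set engine cluster.granular-entry-heal on
# The heal CLI exposes the same toggle on these releases:
# gluster volume heal engine granular-entry-heal enable
```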
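To see the gfid mismatch described above on disk, the trusted.gfid xattr of the affected shard can be compared across the three bricks. A minimal sketch, assuming the brick paths from the volume info above and a hypothetical shard name (real shard names are `<gfid-of-base-file>.<shard-number>` under `.shard`):

```sh
# Run on each of the three servers hosting a brick; the path uses the brick
# layout from the volume info above and a placeholder shard name.
SHARD=".shard/<gfid-of-base-file>.<shard-number>"   # hypothetical example name

# The hex value printed for trusted.gfid must be identical on all three bricks.
# A different value on the previously offline brick is the mismatch described above.
getfattr -n trusted.gfid -e hex /rhgs/brick1/engine/"$SHARD"
```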
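The brick-down / brick-up part of the verification steps maps to standard gluster CLI operations. A minimal sketch, assuming the "engine" volume and the first brick from the volume info above; the VM creation and migration themselves are performed from the RHV/oVirt side, and the client log path in the last command is an assumption based on a typical RHV gluster storage-domain mount.

```sh
# Step 3: locate the first brick's PID and take the brick down.
gluster volume status engine
kill <brick-pid>                       # placeholder: PID of 10.70.36.79:/rhgs/brick1/engine
                                       # taken from the status output above

# ... create the VM while the brick is offline ...

# Step 4: bring the brick back and wait for self-heal to finish.
gluster volume start engine force
gluster volume heal engine info        # repeat until all bricks report "Number of entries: 0"

# Step 5: after migrating the VM, check the fuse client log for gfid mismatch messages.
grep -i "gfid mismatch" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log
```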
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html