Bug 1405302
| Summary: | vm does not boot up when first data brick in the arbiter volume is killed. | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RamaKasturi <knarra> |
| Component: | arbiter | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, pkarampu, rcyriac, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-10 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-23 05:58:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1277939, 1351528 | | |
Description
RamaKasturi, 2016-12-16 07:05:53 UTC
gluster volume info on vmstore:
===============================

[root@rhsqa-grafton4 ~]# gluster volume info vmstore
Volume Name: vmstore
Type: Replicate
Volume ID: 3d67c0ad-5084-4190-a4b5-c468994ca084
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick2/vmstore
Brick2: 10.70.36.83:/rhgs/brick2/vmstore
Brick3: 10.70.36.84:/rhgs/brick2/vmstore (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
performance.strict-o-direct: on
client.ssl: on
server.ssl: on
auth.ssl-allow: 10.70.36.84,10.70.36.82,10.70.36.83
cluster.granular-entry-heal: enable
cluster.use-compound-fops: on

sosreports can be found in the link below:
==========================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1405302/

From the errors in rhev-data-center-mnt-glusterSD-10.70.36.82\:_vmstore.log on grafton5, this looks like the same problem as BZ 1404982 (comment #5). I am providing a test build to Kasturi with the same fix ("protocol/client: fix op_errno handling, was unused variable") on top of the latest downstream code (HEAD @ tag: v3.8.4-9, origin/rhgs-3.2.0, rhgs-3.2.0) to see if it fixes the issue.

Upstream mainline patch http://review.gluster.org/#/c/16205/ posted for review.

Hi Ravi,

Tested this issue with glusterfs-3.8.4-5.el7rhgs.x86_64. I tried the steps mentioned in the description three times, but I was not able to hit the issue.
Thanks,
kasturi

(In reply to RamaKasturi from comment #8)
> Hi Ravi,
>
> Tested this issue with glusterfs-3.8.4-5.el7rhgs.x86_64. I tried the
> steps mentioned in the description thrice. But i was not able to hit the
> issue.
>
> Thanks
> kasturi

Thanks Kasturi. If we are able to hit the issue with glusterfs-3.8.4-6, then we have a reasonably small number of commits over which to run a git bisect and find the offending one. Please give it a try on v3.8.4-6 as well. Thanks!
Ravi

Just for the record: after comment #9, Kasturi tried a couple of test builds (thanks a lot, Kasturi!) and we were not able to hit the issue with some modifications made to the original patch posted in comment #7. Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93560/

Will verify this bug once the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1400057 lands.

Verified and works fine with build glusterfs-3.8.4-11.el7rhgs.x86_64. Brought the first brick down in the volume, created a VM, and installed the OS. Once the VM was installed, I powered off the VM, brought the first brick up, and saw that the VM booted successfully. Moving this to verified state.
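The verification flow described above can be sketched as the following CLI sequence. This is a non-runnable sketch, not the tester's exact commands: it requires a live 2+1 arbiter cluster, the brick PID placeholder must be filled in by hand, and the VM create/install/power-off steps (done through the oVirt UI here) are only indicated as comments.

```
# Sketch only; assumes a started 2+1 arbiter volume named "vmstore".

# 1. Find and kill the glusterfsd process serving the first data brick
#    (run on the host that owns Brick1). <BRICK1-PID> comes from the
#    "Online"/"Pid" columns of the status output.
gluster volume status vmstore
kill -9 <BRICK1-PID>

# 2. With Brick1 down, create a VM whose disk lives on vmstore and
#    install the OS (done via the oVirt/RHV management UI).

# 3. Power off the VM, then bring the killed brick back online.
gluster volume start vmstore force

# 4. Optionally wait for self-heal to catch the brick up, then boot
#    the VM and confirm it comes up cleanly.
gluster volume heal vmstore info
```

The point of step 3 is that `volume start ... force` respawns only the missing brick process; the bug was that the VM failed to boot after this brick came back, which the glusterfs-3.8.4-10 fix resolves.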
[root@rhsqa-grafton4 ~]# gluster volume info vmstore
Volume Name: vmstore
Type: Replicate
Volume ID: 2f8938c2-26d3-4912-a6e0-bc12b76146d0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick1/vmstore
Brick2: 10.70.36.83:/rhgs/brick1/vmstore
Brick3: 10.70.36.84:/rhgs/brick1/vmstore (arbiter)
Options Reconfigured:
auth.ssl-allow: 10.70.36.84,10.70.36.82,10.70.36.83
server.ssl: on
client.ssl: on
cluster.granular-entry-heal: on
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html