Bug 1370350
| Summary: | Hosted Engine VM paused post replace-brick operation | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | sharding | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | pkarampu, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1392445 | Environment: | RHEV-RHGS Hyperconvergence (HCI) |
| Last Closed: | 2017-03-23 05:45:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351528, 1392445, 1392844, 1392846, 1392853 | | |
Description
SATHEESARAN
2016-08-26 02:30:51 UTC
[2016-08-25 08:16:44.373211] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:16:44.373283] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:16:44.373343] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:16:44.374685] E [MSGID: 133010] [shard.c:1582:shard_common_lookup_shards_cbk] 2-enginevol-shard: Lookup on shard 3 failed. Base file gfid = 853758b3-f79d-4114-86b4-c9e4fe1f97db [Input/output error]
[2016-08-25 08:16:44.374734] W [fuse-bridge.c:2224:fuse_readv_cbk] 0-glusterfs-fuse: 3986689: READ => -1 (Input/output error)
[2016-08-25 08:37:22.183269] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:37:22.183355] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:37:22.183414] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-enginevol-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-08-25 08:37:22.184586] E [MSGID: 133010] [shard.c:1582:shard_common_lookup_shards_cbk] 2-enginevol-shard: Lookup on shard 3 failed. Base file gfid = 853758b3-f79d-4114-86b4-c9e4fe1f97db [Input/output error]
[2016-08-25 08:37:22.184625] W [fuse-bridge.c:2224:fuse_readv_cbk] 0-glusterfs-fuse: 4023987: READ => -1 (Input/output error)

I couldn't hit the issue a second time, when Krutika asked for debug-enabled logs.

Later, Krutika also confirmed that a community user is seeing this problem too.

(In reply to SATHEESARAN from comment #5)
> I couldn't hit the issue a second time, when Krutika asked for debug-enabled
> logs.
>
> Later, Krutika also confirmed that a community user is seeing this problem too.

I stand corrected. I figured out later that the community case was with granular-entry-heal enabled, where the same brick was wiped and healed. The issue there was a combination of dated documentation and the lack of reset-brick functionality.

I did see logs of the kind you pasted in comment #3, with lookups failing with EINVAL, but no input/output errors.

-Krutika

Sas,

Do you have the logs from this run, of the bricks, shds and the clients?

-Krutika

(In reply to Krutika Dhananjay from comment #7)
> Sas,
>
> Do you have the logs from this run, of the bricks, shds and the clients?
>
> -Krutika

Hi Krutika,

I missed collecting the logs. I will try to reproduce this issue.
But can you spot any problem from the error messages in comment #3?

Not quite, Sas. The EINVAL seems to be propagated by the brick(s), since protocol/client, which is the lowest layer in the client stack, is receiving EINVAL over the network.

-Krutika

Patch posted upstream for review - http://review.gluster.org/#/c/15788/
Moving this bug to POST state.

Tested with RHGS 3.2.0 interim build ( glusterfs-3.8.4-9.el7rhgs ) with the following steps:

1. Created an HC setup with the self-hosted engine backed by a replica 3 sharded volume
2. Added the new host to the cluster and prepared bricks on the host
3. Replaced one of the bricks of the replica volume with the new brick using the 'replace-brick' utility from the RHV UI, which triggers 'gluster volume replace-brick <vol-name> <source-brick> <dest-brick> commit force'

After the brick was replaced, self-heal was triggered and completed successfully. The Engine VM was healthy.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
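
For anyone trying to confirm whether a client mount is hitting the same failure, the log lines quoted earlier in this report can be scanned with a small script. The following is a minimal sketch, not a GlusterFS tool: the line layout is inferred only from the entries pasted above, and the names `LOG_RE`, `SHARD_RE`, `parse_line` and `shard_lookup_failures` are hypothetical helpers introduced for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout ASSUMED from the client log lines quoted in this report:
#   [timestamp] LEVEL [MSGID: n] [file:line:function] xlator: message
# The MSGID block is optional (the fuse-bridge lines above have none).
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+"
    r"(?P<msg>.*)"
)

# Message text as it appears in the shard errors quoted in this report.
SHARD_RE = re.compile(
    r"Lookup on shard (?P<shard>\d+) failed\. "
    r"Base file gfid = (?P<gfid>[0-9a-f-]{36})"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    msgid: Optional[int]
    source: str
    xlator: str
    message: str


class ShardFailure(NamedTuple):
    timestamp: str
    shard: int
    gfid: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    msgid = int(m.group("msgid")) if m.group("msgid") else None
    return LogEntry(m.group("ts"), m.group("level"), msgid,
                    m.group("src"), m.group("xlator"), m.group("msg"))


def shard_lookup_failures(lines: Iterable[str]) -> List[ShardFailure]:
    """Collect error-level entries whose message reports a failed shard lookup."""
    failures = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.level != "E":
            continue
        m = SHARD_RE.search(entry.message)
        if m:
            failures.append(
                ShardFailure(entry.timestamp, int(m.group("shard")), m.group("gfid"))
            )
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python scan_shard_errors.py < client-mount.log
    for failure in shard_lookup_failures(sys.stdin):
        print(failure.timestamp, "shard", failure.shard, "gfid", failure.gfid)
```

Matching on the message text rather than on the MSGID number alone keeps the sketch tied to what is actually shown in this report; lines that do not fit the assumed layout are skipped instead of raising an error.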