Bug 1559452
| Summary: | Volume status inode is broken with brickmux | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rajesh Madaka <rmadaka> |
| Component: | glusterd | Assignee: | hari gowtham <hgowtham> |
| Status: | CLOSED ERRATA | QA Contact: | Bala Konda Reddy M <bmekala> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, moagrawa, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vbellur, vdas |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.12.2-13 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1566067 (view as bug list) | Environment: | |
| Last Closed: | 2018-09-04 06:45:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1566067, 1569336, 1569346, 1579769 | | |
| Bug Blocks: | 1503137 | | |
Description
Rajesh Madaka
2018-03-22 15:19:17 UTC
Changed to the correct version-release number.

Version-Release number of selected component (if applicable): 3.8.4-54-3

I have followed the steps from the description; the volume status inode command is failing on the 3rd volume. Changing state to ASSIGNED. Below is the error message I get for the 3rd volume:

# gluster vol status dstrbrep1 inode
Commit failed on dhcp37-174. Please check log file for details.

Build version: glusterfs-3.12.2-8

Logs pasted below:
-----------------
[2018-05-03 12:44:36.320186] I [MSGID: 106499] [glusterd-handler.c:4383:__glusterd_handle_status_volume] 0-management: Received status volume req for volume dstrbrep1
[2018-05-03 12:44:36.336651] E [MSGID: 106062] [glusterd-utils.c:10024:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Failed to get other count from rsp_dict
[2018-05-03 12:44:36.336723] E [MSGID: 106108] [glusterd-syncop.c:1146:_gd_syncop_commit_op_cbk] 0-management: Failed to aggregate response from node/brick
[2018-05-03 12:44:36.336750] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on dhcp37-174. Please check log file for details

From the machine I can see that the commit fails only on the remote node and not on the originator. other-count is set during the commit phase; because the commit failed, other-count was never set, and that caused the aggregation failure. The commit phase on the remote node is a combination of the brick op and the commit op, as it works on the op-sm framework, so a brick-op failure can cause a commit-op failure. While debugging the setup I saw glusterd_op_ac_brick_op_failed, which shows that the brick op did not complete successfully. From glusterd_brick_op_cbk it was found that the rsp was not set properly. The brick process was then debugged, and it turned out that the lock acquisition was failing because the lock was already held:

[2018-05-03 12:43:18.903779] W [MSGID: 101014] [client_t.c:857:gf_client_dump_inodes_to_dict] 0-client_t: Unable to acquire lock

The issue happens with distributed-replicated volumes and not with plain replica volumes. glusterd_brick_op sends the requests one after the other for each brick. This further confirms that the earlier request was still being processed, and before it finished, the request for the second brick tried to take the lock and failed because it was still held by the previous brick request. Since the failure depends on how long the lock is held, this explains why the commit failure reproduces only sporadically. (Illustrative sketches of the aggregation failure and of this lock contention are appended at the end of this report.)

The patch on master is https://review.gluster.org/#/c/20035/

Build: 3.12.2-13

Followed the steps in the description. On a three-node cluster, created two replica 3 volumes and ran gluster volume status <volume> inode and fd continuously while I/O was in progress. Created a new (distributed-replicate) volume and started it; on the new volume all the bricks are online. Marking it as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
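To make the aggregation failure from the glusterd log concrete, here is a minimal, self-contained C sketch of the pattern described in the analysis: the originator can only read "other-count" out of the remote node's response, so when the brick op on that node fails the key is never filled in and copying the status into the op ctx fails. This is an illustration only, not GlusterFS code; struct rsp_dict, dict_get_other_count and copy_to_op_ctx are hypothetical stand-ins for glusterd's dict_t-based helpers (judging by the log message, the real lookup happens inside glusterd_volume_status_copy_to_op_ctx_dict).

```c
/* Hypothetical stand-in for the rsp_dict aggregation; not GlusterFS code. */
#include <stdio.h>

struct rsp_dict {
    int has_other_count;   /* set only when the remote commit/brick op succeeded */
    int other_count;
};

/* Stand-in for a dict lookup of the "other-count" key. */
static int dict_get_other_count(const struct rsp_dict *rsp, int *val)
{
    if (!rsp->has_other_count)
        return -1;                 /* key was never set by the failed brick op */
    *val = rsp->other_count;
    return 0;
}

/* Stand-in for the copy-to-op-ctx step on the originator. */
static int copy_to_op_ctx(const struct rsp_dict *rsp)
{
    int other_count = 0;

    if (dict_get_other_count(rsp, &other_count) != 0) {
        fprintf(stderr, "Failed to get other count from rsp_dict\n");
        return -1;                 /* surfaces to the CLI as "Commit failed on <node>" */
    }
    printf("aggregated other-count = %d\n", other_count);
    return 0;
}

int main(void)
{
    struct rsp_dict healthy = { .has_other_count = 1, .other_count = 2 };
    struct rsp_dict failed_brick_op = { 0 };   /* brick op failed, key unset */

    copy_to_op_ctx(&healthy);
    return copy_to_op_ctx(&failed_brick_op) == 0 ? 0 : 1;
}
```

Built with a plain `cc`, the second call prints the same "Failed to get other count from rsp_dict" error that shows up in the glusterd log above.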
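The root cause itself is the lock contention on the multiplexed brick process. Below is a minimal, self-contained pthread sketch of that pattern, assuming the per-process client/inode table is guarded by a single lock and the inode-dump path uses a non-blocking try-lock, which is what the "Unable to acquire lock" warning from gf_client_dump_inodes_to_dict suggests. dump_inodes_to_dict, the 200 ms dump duration and the small startup stagger are illustrative assumptions, not the actual GlusterFS implementation.

```c
/* Minimal sketch of the brick-mux lock contention; not GlusterFS code.
 * Compile with: cc -pthread contention.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t clienttable_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for gf_client_dump_inodes_to_dict(). */
static int dump_inodes_to_dict(const char *brick)
{
    if (pthread_mutex_trylock(&clienttable_lock) != 0) {
        /* Corresponds to the "Unable to acquire lock" warning in the brick
         * log; the brick op returns without filling in its response. */
        fprintf(stderr, "%s: unable to acquire lock, dump skipped\n", brick);
        return -1;
    }
    usleep(200 * 1000);            /* pretend the inode dump takes a while */
    printf("%s: dump complete\n", brick);
    pthread_mutex_unlock(&clienttable_lock);
    return 0;
}

static void *brick_op(void *arg)
{
    dump_inodes_to_dict(arg);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* glusterd sends one brick op per brick; with brick multiplexing both
     * requests land in the same brick process almost back to back. */
    pthread_create(&t1, NULL, brick_op, "brick-1");
    usleep(10 * 1000);             /* second request arrives shortly after */
    pthread_create(&t2, NULL, brick_op, "brick-2");

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Whether the second dump fails depends entirely on how long the first one holds the lock, which matches the sporadic reproduction of the commit failure described in the analysis above.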