Bug 1298068
Summary: | GlusterD restart, starting the bricks when server quorum not met | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama>
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj>
Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | asrivast, rhinduja, rhs-bugs, sasundar, storage-qa-internal, vbellur
Target Milestone: | --- | Keywords: | ZStream
Target Release: | RHGS 3.1.3 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.7.9-1 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1298439 | Environment: |
Last Closed: | 2016-06-23 05:02:27 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1298439, 1299184, 1305256 | |

Description
Byreddy
2016-01-13 06:55:52 UTC
I tested the same scenario and found a possible hint: while restarting, glusterd finds that server quorum is not met and kills the brick. This is evident from the logs and the glusterfsd PID.

<snip>
[2016-01-13 13:53:58.238048] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume distvol. Stopping local bricks.
[2016-01-13 13:53:58.238707] D [MSGID: 0] [glusterd-utils.c:5611:glusterd_brick_stop] 0-management: About to stop glusterfs for brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1
[2016-01-13 13:53:58.238836] D [MSGID: 0] [glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653
[2016-01-13 13:53:58.238902] D [MSGID: 0] [glusterd-utils.c:4952:glusterd_set_brick_status] 0-glusterd: Setting brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1 status to stopped
[2016-01-13 13:53:58.239078] D [MSGID: 0] [glusterd-utils.c:5622:glusterd_brick_stop] 0-management: returning 0
</snip>

The snippet above shows that PID 7653 was killed. The gluster volume status output, however, shows a different PID for the same brick.

# gluster volume status distvol
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894
NFS Server on localhost                     2049      0          Y       7879
NFS Server on dhcp37-53.lab.eng.blr.redhat.
com                                         2049      0          Y       14428

Task Status of Volume distvol
------------------------------------------------------------------------------
There are no active volume tasks

This means that something started the brick again after glusterd killed it.

An upstream patch is posted: http://review.gluster.org/13236

The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.

Verified this bug using the build "glusterfs-3.7.9-1". Repeated the reproduction steps mentioned in the description section. The fix works properly: bricks are not started after a glusterd restart when server quorum is not met. Moving to verified state based on the above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
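
The PID comparison used in the analysis above (the PID glusterd logged as stopped versus the PID reported by gluster volume status) can be scripted when re-checking this behaviour. The following is a minimal Python sketch, not part of glusterd or the gluster CLI. The helper names stopped_pids and online_brick_pids are hypothetical, and the parsing assumes the debug log message and the status column layout shown in this report; other glusterfs versions may format either differently.

```python
import re

# Debug message emitted by glusterd_service_stop, as seen in the log excerpt
# in this report. Assumption: the wording is the same in the build under test.
STOP_RE = re.compile(r"Stopping gluster brick running in pid: (\d+)")

# A status row ends with "<Online Y/N> <Pid or N/A>". Long process names wrap,
# so this row may be the second line of an entry.
ROW_END_RE = re.compile(r"\s([YN])\s+(\d+|N/A)\s*$")


def stopped_pids(log_text):
    """Return the set of brick PIDs that glusterd logged as stopped."""
    return {int(pid) for pid in STOP_RE.findall(log_text)}


def online_brick_pids(status_text):
    """Return PIDs of brick entries marked online in volume status output."""
    pids = set()
    in_brick = False
    for line in status_text.splitlines():
        if line.startswith("Brick "):
            in_brick = True  # entry may continue on the next line
        match = ROW_END_RE.search(line)
        if match:
            online, pid = match.groups()
            if in_brick and online == "Y" and pid != "N/A":
                pids.add(int(pid))
            in_brick = False  # the entry ends at the row carrying the columns
    return pids


# Sample input copied from this report.
LOG = (
    "[2016-01-13 13:53:58.238836] D [MSGID: 0] "
    "[glusterd-utils.c:1531:glusterd_service_stop] 0-management: "
    "Stopping gluster brick running in pid: 7653\n"
)
STATUS = """\
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894
NFS Server on localhost                     2049      0          Y       7879
NFS Server on dhcp37-53.lab.eng.blr.redhat.
com                                         2049      0          Y       14428
"""

if __name__ == "__main__":
    print("stopped:", sorted(stopped_pids(LOG)))          # stopped: [7653]
    print("online: ", sorted(online_brick_pids(STATUS)))  # online:  [7894]
```

An online brick PID that differs from every stopped PID, as with 7894 versus 7653 here, indicates that a new brick process was started after glusterd stopped the original one. The sketch only compares PIDs; it does not determine which component started the brick.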