Bug 1298068

Summary: GlusterD restart, starting the bricks when server quorum not met
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Byreddy <bsrirama>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA QA Contact: Byreddy <bsrirama>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: asrivast, rhinduja, rhs-bugs, sasundar, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.1.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.9-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1298439 (view as bug list) Environment:
Last Closed: 2016-06-23 05:02:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1298439, 1299184, 1305256    

Description Byreddy 2016-01-13 06:55:52 UTC
Description of problem:
=======================
Had a 5-node cluster (n1, n2, n3, n4 and n5) with one distributed volume and server quorum enabled. Stopped glusterd on 3 nodes (n3, n4 and n5) and checked the volume status on node n1: the bricks were offline. Restarted glusterd on that node (n1) and checked the volume status again: this time the bricks were online.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-15


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a 5-node cluster with one distributed volume
2. Enable server quorum on the volume
3. Bring down glusterd on 3 nodes (e.g. n3, n4 and n5)
4. Check the volume status on node-1 (n1) // bricks will be offline
5. Restart glusterd on node-1
6. Check the volume status again // bricks will be online
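
For reference, the steps above can be written out as concrete commands. This is only a sketch: it prints a dry-run plan and executes nothing. It assumes the volume is named distvol (as in comment 3), that the nodes run systemd (on a non-systemd host the equivalent would be the service command), and that step 2 means setting cluster.server-quorum-type to server. The function name reproduction_plan is made up for illustration.

```python
def reproduction_plan(volume="distvol", stop_on=("n3", "n4", "n5"), check_on="n1"):
    """Return (node, command) pairs mirroring steps 2-6; nothing is executed."""
    # Step 2: enable server-side quorum for the volume (assumed meaning of the step).
    plan = [(check_on, f"gluster volume set {volume} cluster.server-quorum-type server")]
    # Step 3: stop glusterd on a majority of the nodes.
    plan += [(node, "systemctl stop glusterd") for node in stop_on]
    # Steps 4-6: check status, restart glusterd on the surviving node, check again.
    plan += [
        (check_on, f"gluster volume status {volume}"),
        (check_on, "systemctl restart glusterd"),
        (check_on, f"gluster volume status {volume}"),
    ]
    return plan


if __name__ == "__main__":
    for node, command in reproduction_plan():
        print(f"[{node}] {command}")
```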

Actual results:
===============
Bricks are online when server quorum is not met


Expected results:
=================
Bricks should be offline when server quorum is not met
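
Why quorum is not met in this setup: with n3, n4 and n5 down, only 2 of the 5 glusterd instances are running. A minimal sketch of that arithmetic, assuming the default server quorum rule (strictly more than half of the nodes in the cluster must be active) and no custom cluster.server-quorum-ratio. The name server_quorum_met is illustrative and is not a glusterd function.

```python
def server_quorum_met(active_nodes: int, total_nodes: int) -> bool:
    """Default-rule sketch: quorum holds only if a strict majority is active."""
    if total_nodes <= 0 or not 0 <= active_nodes <= total_nodes:
        raise ValueError("expected 0 <= active_nodes <= total_nodes and total_nodes > 0")
    # Integer comparison avoids rounding: active/total > 1/2  <=>  2*active > total.
    return 2 * active_nodes > total_nodes


if __name__ == "__main__":
    # Scenario from this report: 5 nodes, glusterd stopped on 3 of them.
    print(server_quorum_met(active_nodes=2, total_nodes=5))
```

For the scenario in this bug the check is False, so the bricks on n1 should stay down across a glusterd restart.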


Additional info:

Comment 3 SATHEESARAN 2016-01-13 08:41:53 UTC
I have tested the same and found a possible hint:

While restarting, glusterd finds that the server quorum is not met and kills the brick. This is evident from the logs and the glusterfsd PID.

<snip>

[2016-01-13 13:53:58.238048] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume distvol. Stopping local bricks.
[2016-01-13 13:53:58.238707] D [MSGID: 0] [glusterd-utils.c:5611:glusterd_brick_stop] 0-management: About to stop glusterfs for brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1
[2016-01-13 13:53:58.238836] D [MSGID: 0] [glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653  
[2016-01-13 13:53:58.238902] D [MSGID: 0] [glusterd-utils.c:4952:glusterd_set_brick_status] 0-glusterd: Setting brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1 status to stopped
[2016-01-13 13:53:58.239078] D [MSGID: 0] [glusterd-utils.c:5622:glusterd_brick_stop] 0-management: returning 0

</snip>

From the above snippet, you can see that the brick process with pid 7653 is killed.

From gluster volume status output, I could see a different pid.

# gluster volume status distvol
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894 
NFS Server on localhost                     2049      0          Y       7879 
NFS Server on dhcp37-53.lab.eng.blr.redhat.
com                                         2049      0          Y       14428
 
Task Status of Volume distvol
------------------------------------------------------------------------------
There are no active volume tasks

This means that something started the brick again after glusterd killed it.
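
The pid comparison above can be scripted for repeated test runs. A rough sketch, assuming the log message and the volume status layout look exactly like the excerpts in this comment. The helper names are made up for illustration, and the sample strings are trimmed copies of the output above.

```python
import re


def stopped_pids(log_text):
    """Pids from glusterd debug lines '... Stopping gluster brick running in pid: N'."""
    pattern = r"Stopping gluster brick running in pid:\s*(\d+)"
    return {int(pid) for pid in re.findall(pattern, log_text)}


def online_brick_pids(status_text):
    """Pids of brick rows reported Online = Y in 'gluster volume status' output.

    The CLI wraps long brick names, so the port/online/pid columns may sit on
    the line after the one that starts with 'Brick'.
    """
    pids = set()
    in_brick_row = False
    for line in status_text.splitlines():
        if line.startswith("Brick "):
            in_brick_row = True
        if not in_brick_row:
            continue
        fields = line.split()
        # A complete row ends with: TCP port, RDMA port, online flag, pid.
        if len(fields) >= 4 and fields[-2] in ("Y", "N"):
            if fields[-2] == "Y" and fields[-1].isdigit():
                pids.add(int(fields[-1]))
            in_brick_row = False
    return pids


def brick_restarted_after_stop(log_text, status_text):
    """True if glusterd logged a brick stop but a brick is online under another pid."""
    stopped = stopped_pids(log_text)
    return bool(stopped) and bool(online_brick_pids(status_text) - stopped)


SAMPLE_LOG = "0-management: Stopping gluster brick running in pid: 7653"
SAMPLE_STATUS = """\
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894
NFS Server on localhost                     2049      0          Y       7879
"""

if __name__ == "__main__":
    print(brick_restarted_after_stop(SAMPLE_LOG, SAMPLE_STATUS))
```

The NFS server rows are skipped on purpose, since only brick processes are governed by server quorum here.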

Comment 4 Atin Mukherjee 2016-01-14 05:55:49 UTC
An upstream patch is posted: http://review.gluster.org/13236

Comment 6 Atin Mukherjee 2016-03-22 12:06:52 UTC
The fix is now available in rhgs-3.1.3 branch, hence moving the state to Modified.

Comment 8 Byreddy 2016-04-04 05:51:38 UTC
Verified this bug using the build "glusterfs-3.7.9-1".


Repeated the reproduction steps from the description. The fix is working properly: bricks do not start after a glusterd restart when server quorum is not met.



Moving to verified state based on above info.

Comment 10 errata-xmlrpc 2016-06-23 05:02:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240