Bug 1298068 - GlusterD restart, starting the bricks when server quorum not met
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Release: RHGS 3.1.3
Assigned To: Atin Mukherjee
QA Contact: Byreddy
Keywords: ZStream
Depends On:
Blocks: 1298439 1299184 1305256
Reported: 2016-01-13 01:55 EST by Byreddy
Modified: 2016-09-17 12:46 EDT
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Clones: 1298439
Last Closed: 2016-06-23 01:02:27 EDT
Type: Bug


Description Byreddy 2016-01-13 01:55:52 EST
Description of problem:
=======================
Had a 5-node cluster (n1, n2, n3, n4 and n5) with one distributed volume and server quorum enabled. Stopped glusterd on 3 nodes (n3, n4 and n5) and checked the volume status on n1: the bricks were offline. Then restarted glusterd on that node (n1) and checked the volume status again: this time the bricks were online.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-15


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a 5-node cluster with one distributed volume
2. Enable server quorum
3. Bring down 3 nodes (e.g. n3, n4 and n5)
4. Check the volume status on node-1 (n1) // bricks will be offline
5. Restart glusterd on node-1
6. Check the volume status again // bricks will be online

Actual results:
===============
Bricks are online when server quorum is not met


Expected results:
=================
Bricks should be offline when server quorum is not met
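
As a rough illustration of the expected behaviour, the check below treats server quorum as met only when strictly more than half of the nodes are running glusterd. This is a minimal sketch assuming a plain majority rule; the function names are hypothetical and this is not glusterd's actual implementation (the real threshold can be tuned through the cluster.server-quorum-ratio option).

```python
def server_quorum_met(active_nodes: int, total_nodes: int,
                      ratio_percent: float = 50.0) -> bool:
    """Return True when strictly more than ratio_percent of nodes are active.

    Assumption: a simple "more than N percent" rule, used only to
    illustrate the scenario in this report.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return active_nodes * 100 > total_nodes * ratio_percent


def expected_brick_state(active_nodes: int, total_nodes: int) -> str:
    """Bricks are expected to stay offline while quorum is not met."""
    if server_quorum_met(active_nodes, total_nodes):
        return "online"
    return "offline"


# Scenario from this report: 5 nodes, glusterd stopped on n3, n4 and n5,
# so only n1 and n2 are active.
print(expected_brick_state(active_nodes=2, total_nodes=5))  # offline
```

With 2 of 5 nodes active the sketch reports "offline", which is the state the bricks should keep even after glusterd is restarted on n1.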


Additional info:
Comment 3 SATHEESARAN 2016-01-13 03:41:53 EST
I have tested the same and found a possible hint:

While restarting, glusterd finds that server quorum is not met and kills the brick. This is evident from the logs and the glusterfsd PID.

<snip>

[2016-01-13 13:53:58.238048] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume distvol. Stopping local bricks.
[2016-01-13 13:53:58.238707] D [MSGID: 0] [glusterd-utils.c:5611:glusterd_brick_stop] 0-management: About to stop glusterfs for brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1
[2016-01-13 13:53:58.238836] D [MSGID: 0] [glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653  
[2016-01-13 13:53:58.238902] D [MSGID: 0] [glusterd-utils.c:4952:glusterd_set_brick_status] 0-glusterd: Setting brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1 status to stopped
[2016-01-13 13:53:58.239078] D [MSGID: 0] [glusterd-utils.c:5622:glusterd_brick_stop] 0-management: returning 0

</snip>

From the above snippet, you can see that the brick process with pid 7653 is killed.

In the gluster volume status output, however, I could see a different pid.

# gluster volume status distvol
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894 
NFS Server on localhost                     2049      0          Y       7879 
NFS Server on dhcp37-53.lab.eng.blr.redhat.
com                                         2049      0          Y       14428
 
Task Status of Volume distvol
------------------------------------------------------------------------------
There are no active volume tasks

This means that the brick was somehow started again after glusterd killed it.
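
The comparison above can be repeated mechanically. The following is a minimal sketch, assuming the log message and the volume status layout look exactly like the excerpts in this comment; the helper names are hypothetical and the parsing is not guaranteed to hold for other glusterfs versions.

```python
import re

# Shortened copies of the excerpts quoted in this comment.
LOG_SNIPPET = """\
[2016-01-13 13:53:58.238836] D [MSGID: 0] [glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653
"""

STATUS_OUTPUT = """\
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894
NFS Server on localhost                     2049      0          Y       7879
"""

STOP_RE = re.compile(r"Stopping gluster brick running in pid: (\d+)")
# Last four columns of a process row: TCP port, RDMA port, Online, Pid.
ROW_TAIL_RE = re.compile(r"\s+(\S+)\s+(\S+)\s+([YN])\s+(\S+)\s*$")


def stopped_pids(log_text):
    """PIDs that glusterd reported stopping in the log."""
    return {int(m.group(1)) for m in STOP_RE.finditer(log_text)}


def online_brick_pids(status_text):
    """PIDs of brick rows marked online; long rows may wrap onto two lines."""
    pids = set()
    in_table = False
    row_start = None
    for line in status_text.splitlines():
        if not in_table:
            in_table = line.startswith("---")  # rows begin after the rule
            continue
        if not line.strip():
            break  # a blank line ends the process table
        if row_start is None:
            row_start = line
        tail = ROW_TAIL_RE.search(line)
        if tail is None:
            continue  # row continues on the next line
        online, pid = tail.group(3), tail.group(4)
        if row_start.startswith("Brick ") and online == "Y" and pid.isdigit():
            pids.add(int(pid))
        row_start = None
    return pids


# Online brick PIDs that are not the PID glusterd stopped.
print(sorted(online_brick_pids(STATUS_OUTPUT) - stopped_pids(LOG_SNIPPET)))
```

A non-empty result means a brick is running under a PID other than the one glusterd stopped, which matches the observation here (7653 stopped, 7894 online).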
Comment 4 Atin Mukherjee 2016-01-14 00:55:49 EST
An upstream patch has been posted: http://review.gluster.org/13236
Comment 6 Atin Mukherjee 2016-03-22 08:06:52 EDT
The fix is now available in rhgs-3.1.3 branch, hence moving the state to Modified.
Comment 8 Byreddy 2016-04-04 01:51:38 EDT
Verified this bug using the build "glusterfs-3.7.9-1".

Repeated the reproduction steps given in the description. The fix is working properly: bricks do not start after a glusterd restart when server quorum is not met.

Moving to verified state based on the above info.
Comment 10 errata-xmlrpc 2016-06-23 01:02:27 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
