Description of problem:
=======================
Had a 5-node cluster (n1, n2, n3, n4 and n5) with one distributed volume and server quorum enabled. Stopped glusterd on 3 nodes (n3, n4 and n5) and checked the volume status on node n1: the bricks were offline. Then restarted glusterd on that node (n1) and checked the volume status again: this time the bricks were online.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-15

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a 5-node cluster with one distributed volume
2. Enable server quorum
3. Bring down 3 nodes (e.g. n3, n4 and n5)
4. Check the volume status on node 1 (n1) // bricks will be in offline state
5. Restart glusterd on node 1
6. Check the volume status // bricks will be in online state

Actual results:
===============
Bricks are online when server quorum is not met

Expected results:
=================
Bricks should be in offline state when server quorum is not met

Additional info:
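The reproduction steps above can be sketched as a shell session. This is a hedged sketch, not the reporter's exact commands: the volume name distvol is taken from the status output later in this report, node names follow the description, and systemctl assumes a systemd-managed glusterd. The commands are wrapped in a guarded function so the sketch is a no-op on a machine without the gluster CLI.

```shell
#!/bin/sh
# Hedged sketch of the reproduction steps. Assumes a 5-node pool (n1..n5)
# already peered and a distributed volume named distvol (name taken from
# this report); glusterd is assumed to be managed by systemd.

repro() {
    # Step 2: enable server-side quorum on the volume.
    gluster volume set distvol cluster.server-quorum-type server

    # Step 3: on n3, n4 and n5 (run there, not on n1):
    #   systemctl stop glusterd

    # Step 4: on n1, bricks should now show Online "N".
    gluster volume status distvol

    # Steps 5-6: restart glusterd on n1 and re-check; with the bug
    # present, the local brick comes back with Online "Y" even though
    # quorum is still lost.
    systemctl restart glusterd
    sleep 5
    gluster volume status distvol
}

# Guard: only meaningful on a cluster node with the gluster CLI installed.
if command -v gluster >/dev/null 2>&1; then
    repro
else
    echo "gluster CLI not found; run this on a cluster node"
fi
```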
I have tested the same and found a possible hint: while restarting, glusterd finds that server quorum is not met and kills the brick. This is evident from the logs and the glusterfsd PID:

<snip>
[2016-01-13 13:53:58.238048] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume distvol. Stopping local bricks.
[2016-01-13 13:53:58.238707] D [MSGID: 0] [glusterd-utils.c:5611:glusterd_brick_stop] 0-management: About to stop glusterfs for brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1
[2016-01-13 13:53:58.238836] D [MSGID: 0] [glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653
[2016-01-13 13:53:58.238902] D [MSGID: 0] [glusterd-utils.c:4952:glusterd_set_brick_status] 0-glusterd: Setting brick dhcp37-152.lab.eng.blr.redhat.com:/rhs/brick1/b1 status to stopped
[2016-01-13 13:53:58.239078] D [MSGID: 0] [glusterd-utils.c:5622:glusterd_brick_stop] 0-management: returning 0
</snip>

From the above snippet, you can see that pid 7653 is killed. However, the gluster volume status output shows a different pid:

# gluster volume status distvol
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp37-152.lab.eng.blr.redhat.com:/rh
s/brick1/b1                                 49153     0          Y       7894
NFS Server on localhost                     2049      0          Y       7879
NFS Server on dhcp37-53.lab.eng.blr.redhat.
com                                         2049      0          Y       14428

Task Status of Volume distvol
------------------------------------------------------------------------------
There are no active volume tasks

This means the brick was somehow started again after glusterd killed it.
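The PID mismatch above can be checked mechanically. A minimal sketch: the two sample lines (the glusterd log line and the brick's status line) are embedded here verbatim from this report so the snippet is self-contained; against a live cluster you would grep the real glusterd log and the real `gluster volume status` output instead.

```shell
#!/bin/sh
# Hypothetical cross-check of the two PIDs: the PID glusterd logged
# killing vs. the PID shown by `gluster volume status`. Sample lines are
# embedded from this bug report so the snippet runs standalone.

killed_log='[glusterd-utils.c:1531:glusterd_service_stop] 0-management: Stopping gluster brick running in pid: 7653'
status_line='s/brick1/b1                                 49153     0          Y       7894'

# PID glusterd killed: everything after "pid: " in the log line.
killed_pid=$(printf '%s\n' "$killed_log" | awk -F'pid: ' '{print $2}')

# PID currently reported for the brick: last column of the status line.
status_pid=$(printf '%s\n' "$status_line" | awk '{print $NF}')

if [ "$killed_pid" != "$status_pid" ]; then
    echo "mismatch: glusterd killed pid $killed_pid but brick now runs as pid $status_pid"
fi
```

With the sample lines above, this prints a mismatch between 7653 and 7894, which is exactly the symptom: a new brick process appeared after glusterd killed the old one.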
An upstream patch is posted: http://review.gluster.org/13236
The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.
Verified this bug using the build "glusterfs-3.7.9-1". Repeated the reproduction steps mentioned in the description section; the fix works properly, and bricks do not start after a glusterd restart when server quorum is not met. Moving to Verified based on the above.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240