1340995 – Bricks are starting when server quorum not met.

Summary: Bricks are starting when server quorum not met.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Samikshan Bairagya
QA Contact:	Byreddy
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1345727 1351522 1351530 1353814

Reported:	2016-05-31 05:00 UTC by Byreddy
Modified:	2017-03-23 05:33 UTC
CC List:	8 users
Fixed In Version:	glusterfs-3.8.4-1
Doc Type:	Bug Fix
Doc Text:	Previously, when glusterd was restarted, bricks were started even when server quorum was not met. This update ensures that bricks are stopped if server quorum is no longer met, or if server quorum is disabled, to ensure that bricks in maintenance are not started incorrectly.
Clone Of:
Clones:	1345727
Environment:
Last Closed:	2017-03-23 05:33:54 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Description Byreddy 2016-05-31 05:00:44 UTC

Description of problem:
=======================
Volume bricks are started even when server quorum is not met.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-6.


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up a three-node cluster (n1, n2 and n3).
2. Create a distribute volume using one brick from each of the three nodes (a 3-brick volume).
3. Enable server-side quorum  //gluster volume set <vol_name> cluster.server-quorum-type server
4. Stop glusterd on nodes n2 and n3.
5. The setup is now in the "server quorum not met" condition  //check using volume status
6. Change cluster.server-quorum-ratio from the default to 95.
7. Start glusterd on node n2.
8. Check volume status on nodes n1 and n2  //the bricks that are part of node n2 show as online
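
To make the arithmetic behind these steps explicit, here is a minimal Python sketch. It assumes a simplified rule (quorum needs at least ceil(total * ratio / 100) active nodes); the function name and the rule are illustrative assumptions, not glusterd's actual code.

```python
import math


def server_quorum_met(active_nodes: int, total_nodes: int, ratio_percent: float) -> bool:
    """Simplified model: are enough nodes active to satisfy the ratio?

    Assumption: at least ceil(total_nodes * ratio_percent / 100) nodes
    must be active. This is an illustration, not glusterd's implementation.
    """
    required = math.ceil(total_nodes * ratio_percent / 100)
    return active_nodes >= required


# State after step 7: glusterd runs on n1 and n2, n3 is still down, ratio is 95.
print(server_quorum_met(active_nodes=2, total_nodes=3, ratio_percent=95))  # prints False
```

Under this model, two active nodes out of three cannot satisfy a 95% ratio, so the bricks on n2 would be expected to stay offline.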

Actual results:
===============
Bricks start even though server quorum is not met.

Expected results:
=================
Bricks should not start when server quorum is not met.


Additional info (from /var/log/messages):
=================
On node2:
=========
May 31 00:38:01 dhcp43-216 systemd: Starting GlusterFS, a clustered file-system server...
May 31 00:38:04 dhcp43-216 etc-glusterfs-glusterd.vol[20626]: [2016-05-31 04:38:04.330963] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume Dis. Starting local bricks.
May 31 00:38:04 dhcp43-216 systemd: Started GlusterFS, a clustered file-system server.

On Node1:
=========
May 31 00:36:01 dhcp43-215 systemd: Starting Session 6710 of user root.
May 31 00:36:54 dhcp43-215 etc-glusterfs-glusterd.vol[12032]: [2016-05-31 04:36:54.296022] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume Dis. Stopping local bricks.
May 31 00:37:01 dhcp43-215 systemd: Started Session 6711 of user root.
May 31 00:37:01 dhcp43-215 systemd: Starting Session 6711 of user root.
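
For anyone checking their own nodes, the quorum events above can be pulled out of /var/log/messages with a small script. This is only a sketch: the regular expression is derived from the two log lines quoted in this report, and the helper name parse_quorum_event is made up for illustration.

```python
import re

# Pattern inferred from the two glusterd messages quoted in this report.
QUORUM_EVENT = re.compile(
    r"\[MSGID: (?P<msgid>\d+)\].*"
    r"Server quorum (?P<state>regained|lost) for volume (?P<volume>\S+)\."
)


def parse_quorum_event(line: str):
    """Return (msgid, state, volume) for a quorum log line, else None."""
    match = QUORUM_EVENT.search(line)
    if match is None:
        return None
    return match.group("msgid"), match.group("state"), match.group("volume")


# The node2 message from this report, with the wrapped line joined.
node2_line = (
    "May 31 00:38:04 dhcp43-216 etc-glusterfs-glusterd.vol[20626]: "
    "[2016-05-31 04:38:04.330963] C [MSGID: 106003] "
    "[glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] "
    "0-management: Server quorum regained for volume Dis. Starting local bricks."
)
print(parse_quorum_event(node2_line))  # prints ('106003', 'regained', 'Dis')
```

In the lines quoted here, MSGID 106003 accompanies "quorum regained" and 106002 accompanies "quorum lost".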


I will provide the console logs.

Comment 3 Atin Mukherjee 2016-05-31 06:03:44 UTC

This is indeed a bug. Since the volume version was changed when server-side quorum was set, N2 will import the volume from the other nodes when it comes up. On that code path GlusterD invokes glusterd_start_bricks(), which never checks for quorum and therefore starts the bricks.

This doesn't look like a critical issue at this stage. Killing the brick processes on N2 and restarting GlusterD should be able to ensure that quorum is met and bricks are not started. With this explanation, moving this bug to 3.2.0.

Comment 4 Atin Mukherjee 2016-06-15 10:31:25 UTC

My analysis is incorrect here. Since cluster.server-quorum-ratio applies to all volumes, the volume's version does not get incremented, so we never hit the code path that imports the volume. We need to find a way to decide whether to start or stop the brick(s) when this global option is synced.
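
As a rough illustration of that idea (not the actual fix), the decision could be modeled like this in Python: when the global ratio is synced, every volume that has cluster.server-quorum-type set to server is re-evaluated. The function name, the dictionary layout, the second volume name, and the ceil-based quorum rule are all assumptions made for this sketch.

```python
import math


def actions_on_ratio_sync(volumes, active_nodes, total_nodes, ratio_percent):
    """Toy model: pick a brick action per volume when the global ratio is synced.

    volumes maps a volume name to its cluster.server-quorum-type value.
    Assumption: quorum needs ceil(total_nodes * ratio_percent / 100) active nodes.
    Volumes without server quorum are left out of the result.
    """
    required = math.ceil(total_nodes * ratio_percent / 100)
    quorum_met = active_nodes >= required
    return {
        name: ("start" if quorum_met else "stop")
        for name, quorum_type in volumes.items()
        if quorum_type == "server"
    }


# N2 comes up and receives ratio 95 while only 2 of 3 nodes are active.
print(actions_on_ratio_sync({"Dis": "server"}, 2, 3, 95))  # prints {'Dis': 'stop'}
```

The point of the sketch is only that the decision hangs off the option sync rather than off a volume import.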

Comment 5 Samikshan Bairagya 2016-06-23 08:06:35 UTC

http://review.gluster.org/#/c/14758 (glusterd: Don't start bricks if server quorum is not met) posted for review

Comment 6 Atin Mukherjee 2016-06-23 08:27:52 UTC

Samikshan,

Any upstream patch posted for review moves the respective downstream bug to POST state. This bug will not be moved to MODIFIED until the same fix is available in the downstream codebase.

HTH,
Atin

Comment 7 Samikshan Bairagya 2016-06-23 08:31:28 UTC

(In reply to Atin Mukherjee from comment #6)
> Samikshan,
> 
> Any upstream patch posted for review moves the respective downstream bug
> to POST state. This bug will not be moved to MODIFIED until the same
> fix is available in the downstream codebase.
> 

Yes. I hadn't realized I had selected MODIFIED by mistake. Thanks for changing it to POST.

Comment 9 Atin Mukherjee 2016-09-17 13:52:06 UTC

Upstream mainline:

http://review.gluster.org/14758
http://review.gluster.org/15183

Upstream 3.8:
http://review.gluster.org/14876
http://review.gluster.org/15186

Fixes are available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Comment 14 Byreddy 2016-09-27 05:24:44 UTC

Verified this bug using the build - glusterfs-3.8.4-1.

Fix is working well, moving to verified state.

Comment 18 errata-xmlrpc 2017-03-23 05:33:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
