Bug 1353814

Summary: Bricks are starting when server quorum not met.
Product: [Community] GlusterFS Reporter: Samikshan Bairagya <sbairagy>
Component: glusterd Assignee: Samikshan Bairagya <sbairagy>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.8.0 CC: amukherj, bsrirama, bugs, sasundar
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1345727 Environment:
Last Closed: 2016-08-12 09:46:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1340995, 1345727    
Bug Blocks:    

Comment 1 Samikshan Bairagya 2016-07-08 05:12:44 UTC
+++ This bug was initially created as a clone of Bug #1345727 +++

+++ This bug was initially created as a clone of Bug #1340995 +++

Description of problem:
=======================
Volume bricks start even when server quorum is not met.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-6.


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up a three-node cluster (n1, n2 and n3).
2. Create a distribute volume with bricks from all three nodes (a 3-brick volume).
3. Enable server-side quorum //gluster volume set <vol_name> cluster.server-quorum-type server
4. Stop glusterd on nodes n2 and n3.
5. The setup is now in the "server quorum not met" condition //check using volume status
6. Change cluster.server-quorum-ratio from the default to 95.
7. Start glusterd on node n2.
8. Check volume status on nodes n1 and n2 //you will see the bricks on node n2 online.

Actual results:
===============
Bricks start even though server quorum is not met.

Expected results:
=================
Bricks should not start when server quorum is not met.
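
The arithmetic behind the expected result can be sketched as below. This is only an illustration, not glusterd's code: the function names are hypothetical, and both the default rule (more than half of the peers) and the rounding-up of the percentage are assumptions about how the ratio is applied.

```python
from typing import Optional


def required_servers(total_peers: int, ratio_percent: Optional[int] = None) -> int:
    """Peers with a running glusterd needed for server quorum (assumed rule)."""
    if ratio_percent is None:
        # Assumed default: strictly more than half of the peers.
        return total_peers // 2 + 1
    # Assumed rule: round the percentage up to a whole number of peers.
    return (total_peers * ratio_percent + 99) // 100


def quorum_met(active_peers: int, total_peers: int,
               ratio_percent: Optional[int] = None) -> bool:
    """True if enough peers are active to satisfy server quorum."""
    return active_peers >= required_servers(total_peers, ratio_percent)


if __name__ == "__main__":
    print(quorum_met(1, 3))      # step 5: only n1 up, default ratio -> False
    print(quorum_met(2, 3))      # n1 and n2 up, default ratio -> True
    print(quorum_met(2, 3, 95))  # step 7: n1 and n2 up, ratio 95 -> False
```

Under these assumptions, a ratio of 95 on a three-node cluster requires all three nodes, so bringing n2 back should not be enough to start bricks.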


Additional info: ( info in /var/log/messages)
=================
On node2:
=========
May 31 00:38:01 dhcp43-216 systemd: Starting GlusterFS, a clustered file-system server...
May 31 00:38:04 dhcp43-216 etc-glusterfs-glusterd.vol[20626]: [2016-05-31 04:38:04.330963] C [MSGID: 106003] [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume Dis. Starting local bricks.
May 31 00:38:04 dhcp43-216 systemd: Started GlusterFS, a clustered file-system server.

On Node1:
=========
May 31 00:36:01 dhcp43-215 systemd: Starting Session 6710 of user root.
May 31 00:36:54 dhcp43-215 etc-glusterfs-glusterd.vol[12032]: [2016-05-31 04:36:54.296022] C [MSGID: 106002] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume Dis. Stopping local bricks.
May 31 00:37:01 dhcp43-215 systemd: Started Session 6711 of user root.
May 31 00:37:01 dhcp43-215 systemd: Starting Session 6711 of user root.


I will provide the console logs.


--- Additional comment from Atin Mukherjee on 2016-05-31 02:03:44 EDT ---

This is indeed a bug. Because the volume version changes when server-side quorum is set, N2 imports the volume from the other nodes when it comes up. On that code path GlusterD invokes glusterd_start_bricks(), which never checks quorum, so the bricks are started.

--- Additional comment from Atin Mukherjee on 2016-06-15 06:30:48 EDT ---

My analysis here is incorrect. Since cluster.server-quorum-ratio applies to all volumes, the volume's version is not incremented, so we never hit the code path that imports the volume. We need to find a way to decide whether to start or stop the brick(s) based on when this global option is synced.

--- Additional comment from Vijay Bellur on 2016-06-17 06:21:21 EDT ---

REVIEW: http://review.gluster.org/14758 (Make sure bricks are not started if server quorum is not met) posted (#1) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-06-17 08:13:27 EDT ---

REVIEW: http://review.gluster.org/14758 (Make sure bricks are not started if server quorum is not met) posted (#2) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-06-17 08:18:37 EDT ---

REVIEW: http://review.gluster.org/14758 (Make sure bricks are not started if server quorum is not met) posted (#3) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-06-22 01:22:17 EDT ---

REVIEW: http://review.gluster.org/14758 (glusterd: Don't start bricks if server quorum is not met) posted (#4) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-06-23 01:51:23 EDT ---

REVIEW: http://review.gluster.org/14758 (glusterd: Don't start bricks if server quorum is not met) posted (#5) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-07-04 05:07:53 EDT ---

REVIEW: http://review.gluster.org/14758 (glusterd: Don't start bricks if server quorum is not met) posted (#6) for review on master by Samikshan Bairagya (samikshan)

--- Additional comment from Vijay Bellur on 2016-07-05 07:57:02 EDT ---

COMMIT: http://review.gluster.org/14758 committed in master by Jeff Darcy (jdarcy) 
------
commit 807b9a135d697f175fc9933f1d23fb67b0cc6c7d
Author: Samikshan Bairagya <samikshan>
Date:   Tue Jun 14 10:52:27 2016 +0530

    glusterd: Don't start bricks if server quorum is not met
    
    Upon glusterd restart if it is observed that the server quorum
    isn't met anymore due to changes to the "server-quorum-ratio"
    global option, the bricks should be stopped if they are running.
    Also if glusterd has been restarted, and if server quorum is not
    applicable for a volume, do not restart the bricks corresponding
    to the volume to make sure that bricks that have been brought
    down purposely, say for maintenance, are not brought up. This
    commit moves this check that was previously inside
    "glusterd_spawn_daemons" to "glusterd_restart_bricks" instead.
    
    Change-Id: I0a44a2e7cad0739ed7d56d2d67ab58058716de6b
    BUG: 1345727
    Signed-off-by: Samikshan Bairagya <samikshan>
    Reviewed-on: http://review.gluster.org/14758
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
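
The behaviour described in the commit message above can be read as a small decision table. The sketch below is a literal reading of that message, not the C code in glusterd; the enum, the function name and its parameters are hypothetical and chosen only for illustration.

```python
from enum import Enum


class BrickAction(Enum):
    START = "start bricks"
    STOP = "stop bricks if they are running"
    LEAVE = "do not restart bricks"


def action_on_glusterd_restart(quorum_applicable: bool,
                               quorum_met: bool) -> BrickAction:
    """Pick what to do with a volume's bricks after glusterd restarts."""
    if not quorum_applicable:
        # Per the commit message: bricks brought down on purpose
        # (e.g. for maintenance) must not be brought back up.
        return BrickAction.LEAVE
    if not quorum_met:
        # Quorum was lost, e.g. because server-quorum-ratio changed.
        return BrickAction.STOP
    return BrickAction.START


if __name__ == "__main__":
    # Scenario from this report: quorum enabled, ratio 95, only 2 of 3 nodes up.
    print(action_on_glusterd_restart(quorum_applicable=True, quorum_met=False).value)
```

For the reproduction steps in this report, the sketch returns the STOP action for n2, which matches the expected result that its bricks stay offline.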

Comment 2 Vijay Bellur 2016-07-08 05:36:07 UTC
REVIEW: http://review.gluster.org/14876 (glusterd: Don't start bricks if server quorum is not met) posted (#1) for review on release-3.8 by Samikshan Bairagya (samikshan)

Comment 3 Vijay Bellur 2016-07-12 09:06:04 UTC
COMMIT: http://review.gluster.org/14876 committed in release-3.8 by Atin Mukherjee (amukherj) 
------
commit 5cdeeb9d345b24bab4d917724870f3aae89d8369
Author: Samikshan Bairagya <samikshan>
Date:   Fri Jul 8 10:59:13 2016 +0530

    glusterd: Don't start bricks if server quorum is not met
    
    Upon glusterd restart if it is observed that the server quorum
    isn't met anymore due to changes to the "server-quorum-ratio"
    global option, the bricks should be stopped if they are running.
    Also if glusterd has been restarted, and if server quorum is not
    applicable for a volume, do not restart the bricks corresponding
    to the volume to make sure that bricks that have been brought
    down purposely, say for maintenance, are not brought up. This
    commit moves this check that was previously inside
    "glusterd_spawn_daemons" to "glusterd_restart_bricks" instead.
    
    > Change-Id: I0a44a2e7cad0739ed7d56d2d67ab58058716de6b
    > BUG: 1345727
    > Signed-off-by: Samikshan Bairagya <samikshan>
    > Reviewed-on: http://review.gluster.org/14758
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Jeff Darcy <jdarcy>
    
    (cherry picked from commit 807b9a135d697f175fc9933f1d23fb67b0cc6c7d)
    
    Change-Id: I0a44a2e7cad0739ed7d56d2d67ab58058716de6b
    BUG: 1353814
    Signed-off-by: Samikshan Bairagya <samikshan>
    Reviewed-on: http://review.gluster.org/14876
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 4 Niels de Vos 2016-08-12 09:46:59 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/