Bug 1417042 - glusterd restart is starting the offline shd daemon on other node in the cluster
Summary: glusterd restart is starting the offline shd daemon on other node in the cluster
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1383893
Blocks: 1381825
 
Reported: 2017-01-27 05:05 UTC by Atin Mukherjee
Modified: 2017-03-06 17:44 UTC (History)
6 users

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1383893
Environment:
Last Closed: 2017-03-06 17:44:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Atin Mukherjee 2017-01-27 05:05:16 UTC
+++ This bug was initially created as a clone of Bug #1383893 +++

+++ This bug was initially created as a clone of Bug #1381825 +++

Description of problem:
=======================

Restarting glusterd on one of the cluster nodes restarts the offline self-heal daemon (shd) on another node in the cluster.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a 3-node cluster.
2. Create a 1*3 volume using bricks from the cluster nodes and start it.
3. Kill the shd daemon using kill -15 on one of the cluster nodes.
4. Restart glusterd on another cluster node, where step 3 was not performed.
5. Check the volume status on any cluster node: shd is shown as running on the node where it was killed in step 3.

Actual results:
===============
glusterd restart starts the offline shd daemon on another node in the cluster.

Expected results:
=================
glusterd restart should not start the offline shd daemon on another node in the cluster.




Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-10-05 02:54:14 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Atin Mukherjee on 2016-10-12 01:10:22 EDT ---

RCA:

This is not a regression; the behaviour has existed since server-side quorum was introduced. Unlike brick processes, daemon services are (re)started irrespective of the quorum state. In this particular case, when the glusterd instance on N1 was brought down and the shd service on N2 was explicitly killed, restarting the glusterd service on N1 sends N2 a friend update request, which calls glusterd_restart_bricks () and eventually ends up spawning the shd daemon. If the same reproducer is applied to one of the brick processes, the brick does not come up: for bricks, the logic is to start the processes only if quorum is regained, and to skip them otherwise. To fix this behaviour, the other daemons should follow the same logic as bricks.
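The decision described in the RCA can be sketched as follows. This is an illustrative simulation, not glusterd's actual C code; the function names, the stand-alone quorum predicate, and the assumption of a default ">50% of peers" quorum ratio are all hypothetical.

```python
# Hypothetical sketch of the restart decision described in the RCA.
# Names are illustrative, not glusterd's real API.

def quorum_met(active_peers: int, total_peers: int) -> bool:
    """Server-side quorum, assuming the default >50% ratio."""
    return 2 * active_peers > total_peers

def should_restart_daemons(quorum_was_met: bool, quorum_is_met: bool) -> bool:
    """Bricks (and, after the fix, daemons such as shd) are restarted
    only when quorum is regained, not on every incoming friend update."""
    return quorum_is_met and not quorum_was_met

def prefix_should_restart_daemons(quorum_was_met: bool,
                                  quorum_is_met: bool) -> bool:
    """Pre-fix behaviour: a friend update restarted daemons
    unconditionally, which is why the manually killed shd came back."""
    return True
```

Under this model, the reproducer (quorum held before and after the friend update) no longer restarts shd, while a genuine quorum regain still does.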

--- Additional comment from Worker Ant on 2016-10-12 03:25:42 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should adhere server side quorum) posted (#1) for review on master by Atin Mukherjee (amukherj@redhat.com)

--- Additional comment from Worker Ant on 2016-10-13 01:55:51 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should adhere server side quorum) posted (#2) for review on master by Atin Mukherjee (amukherj@redhat.com)

--- Additional comment from Worker Ant on 2017-01-27 00:04:33 EST ---

COMMIT: https://review.gluster.org/15626 committed in master by Atin Mukherjee (amukherj@redhat.com) 
------
commit a9f660bc9d2d7c87b3306a35a2088532de000015
Author: Atin Mukherjee <amukherj@redhat.com>
Date:   Wed Oct 5 14:59:51 2016 +0530

    glusterd: daemon restart logic should adhere server side quorum
    
    Just like brick processes, other daemon services should also follow the same
    logic of quorum checks to see if a particular service needs to come up if
    glusterd is restarted or the incoming friend add/update request is received
    (in glusterd_restart_bricks () function)
    
    Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
    BUG: 1383893
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/15626
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>

Comment 1 Worker Ant 2017-01-27 05:06:01 UTC
REVIEW: https://review.gluster.org/16472 (glusterd: daemon restart logic should adhere server side quorum) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj@redhat.com)

Comment 2 Worker Ant 2017-01-30 14:13:56 UTC
COMMIT: https://review.gluster.org/16472 committed in release-3.10 by Shyamsundar Ranganathan (srangana@redhat.com) 
------
commit 59aba1e739726b1a5e7d771b73c2c88d45113c88
Author: Atin Mukherjee <amukherj@redhat.com>
Date:   Wed Oct 5 14:59:51 2016 +0530

    glusterd: daemon restart logic should adhere server side quorum
    
    Just like brick processes, other daemon services should also follow the same
    logic of quorum checks to see if a particular service needs to come up if
    glusterd is restarted or the incoming friend add/update request is received
    (in glusterd_restart_bricks () function)
    
    >Reviewed-on: https://review.gluster.org/15626
    >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    >CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    >Smoke: Gluster Build System <jenkins@build.gluster.org>
    >Reviewed-by: Prashanth Pai <ppai@redhat.com>
    
    Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
    BUG: 1417042
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: https://review.gluster.org/16472
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
    Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
    Reviewed-by: Prashanth Pai <ppai@redhat.com>

Comment 3 Shyamsundar 2017-03-06 17:44:28 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

