+++ This bug was initially created as a clone of Bug #1383893 +++

+++ This bug was initially created as a clone of Bug #1381825 +++

Description of problem:
=======================
Restarting glusterd on one cluster node restarts the offline self-heal daemon (shd) on another cluster node.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a 3-node cluster.
2. Create a 1*3 volume using bricks from the cluster nodes and start it.
3. Kill the shd daemon with kill -15 on one of the cluster nodes.
4. Restart glusterd on another cluster node, one where step 3 was not performed.
5. Check the volume status on any cluster node; shd is running again on the node where it was killed in step 3.

Actual results:
===============
A glusterd restart starts the offline shd daemon on another node in the cluster.

Expected results:
=================
A glusterd restart should not start the offline shd daemon on another node in the cluster.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-10-05 02:54:14 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Atin Mukherjee on 2016-10-12 01:10:22 EDT ---

RCA:

This is not a regression; the behaviour has been present since server-side quorum was introduced. Unlike brick processes, daemon services are (re)started irrespective of the quorum state.
In this particular case, the glusterd instance on N1 was brought down and the shd service on N2 was explicitly killed. When the glusterd service on N1 is restarted, N2 receives a friend update request, which calls glusterd_restart_bricks () and eventually ends up spawning the shd daemon. If the same reproducer is applied to one of the brick processes, the brick does not come up, because bricks are started only if quorum is regained and are skipped otherwise.

To fix this behaviour, the other daemons should follow the same logic as bricks.

--- Additional comment from Worker Ant on 2016-10-12 03:25:42 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should adhere server side quorum) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2016-10-13 01:55:51 EDT ---

REVIEW: http://review.gluster.org/15626 (glusterd: daemon restart logic should adhere server side quorum) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-01-27 00:04:33 EST ---

COMMIT: https://review.gluster.org/15626 committed in master by Atin Mukherjee (amukherj)
------
commit a9f660bc9d2d7c87b3306a35a2088532de000015
Author: Atin Mukherjee <amukherj>
Date:   Wed Oct 5 14:59:51 2016 +0530

    glusterd: daemon restart logic should adhere server side quorum

    Just like brick processes, other daemon services should also follow the
    same logic of quorum checks to see if a particular service needs to come
    up if glusterd is restarted or the incoming friend add/update request is
    received (in glusterd_restart_bricks () function)

    Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
    BUG: 1383893
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/15626
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
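
Editorial illustration (not part of the patch): the RCA and commit message above describe one rule, "start a service on a friend update only if quorum is regained", that applied to bricks but not to daemons such as shd. The toy model below sketches that difference. It is not glusterd code: NodeState, on_friend_update and the daemons_follow_quorum switch are hypothetical names, and "regained" is assumed to mean a transition from quorum-not-met to quorum-met.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical stand-in for one glusterd instance (not a glusterd structure)."""
    quorum_met_before: bool   # quorum state before the friend update arrived
    quorum_met_now: bool      # quorum state after the friend update
    running: set = field(default_factory=set)  # services currently running

    def quorum_regained(self) -> bool:
        # Assumption: "regained" means a transition from not-met to met.
        return (not self.quorum_met_before) and self.quorum_met_now


def on_friend_update(node, bricks, daemons, daemons_follow_quorum):
    """Toy model of the restart pass that runs when a friend update is received.

    daemons_follow_quorum=False models the reported behaviour (daemons are
    started irrespective of quorum state); True models the requested behaviour
    (daemons obey the same check as bricks).
    """
    regained = node.quorum_regained()
    for brick in bricks:
        if regained:
            node.running.add(brick)
    for daemon in daemons:
        if regained or not daemons_follow_quorum:
            node.running.add(daemon)
    return node.running


# N2 in the reproducer, assuming quorum was never lost there and both the
# brick and shd had been killed by hand before the friend update arrived.
reported = on_friend_update(NodeState(True, True), ["brick1"], ["shd"], False)
requested = on_friend_update(NodeState(True, True), ["brick1"], ["shd"], True)
print(sorted(reported))   # ['shd']
print(sorted(requested))  # []
```

In the model, the manually killed shd comes back only when daemons ignore the quorum check, which is the symptom reported in this bug; with the check applied, it stays down just as a killed brick does.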
REVIEW: https://review.gluster.org/16472 (glusterd: daemon restart logic should adhere server side quorum) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)
COMMIT: https://review.gluster.org/16472 committed in release-3.10 by Shyamsundar Ranganathan (srangana)
------
commit 59aba1e739726b1a5e7d771b73c2c88d45113c88
Author: Atin Mukherjee <amukherj>
Date:   Wed Oct 5 14:59:51 2016 +0530

    glusterd: daemon restart logic should adhere server side quorum

    Just like brick processes, other daemon services should also follow the
    same logic of quorum checks to see if a particular service needs to come
    up if glusterd is restarted or the incoming friend add/update request is
    received (in glusterd_restart_bricks () function)

    >Reviewed-on: https://review.gluster.org/15626
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Smoke: Gluster Build System <jenkins.org>
    >Reviewed-by: Prashanth Pai <ppai>

    Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3
    BUG: 1417042
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16472
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Samikshan Bairagya <samikshan>
    Reviewed-by: Prashanth Pai <ppai>
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/