+++ This bug was initially created as a clone of Bug #1251980 +++

Description of problem:
geo-rep status shows Active/Passive for a node even when all the geo-rep related processes on that node have been killed. Active/Passive implies that the processes are running properly, but in reality they are not running. The status should be shown as faulty/defunct when those processes are not running.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create and start a geo-rep session between 2*2 dist-rep master node and 2*2 dist-rep slave node.
2. Now kill all the gsync related processes on one of the Active nodes.
   ps -aef | grep gluster | grep gluster | awk '{print $2}' | xargs kill -9
3. Run geo-rep status.

Actual results:
geo-rep status detail

MASTER NODE                  MASTER VOL    MASTER BRICK          SLAVE             STATUS     CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pythagoras.blr.redhat.com    master        /rhs/bricks/brick0    euclid::slave     Active     N/A                  Changelog Crawl    2374           0                0                0                  0
aryabhatta.blr.redhat.com    master        /rhs/bricks/brick1    gauss::slave      Passive    N/A                  N/A                0              0                0                0                  0
ramanujan.blr.redhat.com     master        /rhs/bricks/brick2    riemann::slave    Active     N/A                  Changelog Crawl    1144           0                0                0                  0
archimedes.blr.redhat.com    master        /rhs/bricks/brick3    euler::slave      Passive    N/A                  N/A                0              0                0                0                  0

But on node 'ramanujan' the gsync processes are not running:

[root@ramanujan ~]# ps -aef | grep gluster | grep gsync
[root@ramanujan ~]#

Expected results:
When the processes are not running, the status should show faulty/defunct. The user should be informed that the processes are not running. Otherwise the user may wrongly believe that geo-rep is syncing data without any issues.

Additional info:
Easy to reproduce.
--- Additional comment from Aravinda VK on 2015-08-04 01:55:39 EDT ---

If the monitor process is killed, no running process is available to update the status files. The status command just picks up the content of the status file to show the output.

The monitor process should not be killed manually if Geo-rep is to work effectively. If workers are killed, the monitor process takes care of updating the status files and restarting the workers. When the monitor itself is killed, showing the previous state is expected behavior. To stop Geo-rep, please use the Geo-rep stop command.

A possible enhancement would be for the status command to check the monitor pid status before showing the status output. Status should be shown as "Stopped" if the respective monitor process is not running.
REVIEW: http://review.gluster.org/12448 (geo-rep: Update geo-rep status, if monitor process is killed) posted (#1) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)
REVIEW: http://review.gluster.org/12448 (geo-rep: Update geo-rep status, if monitor process is killed) posted (#2) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)
REVIEW: http://review.gluster.org/12448 (geo-rep: Update geo-rep status, if monitor process is killed) posted (#3) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)
REVIEW: http://review.gluster.org/12448 (geo-rep: Update geo-rep status, if monitor process is killed) posted (#4) for review on release-3.7 by Saravanakumar Arumugam (sarumuga)
COMMIT: http://review.gluster.org/12448 committed in release-3.7 by Vijay Bellur (vbellur)
------
commit e3b48d847492b831487a8539e3e726706959ac2f
Author: Saravanakumar Arumugam <sarumuga>
Date:   Mon Aug 10 18:42:05 2015 +0530

    geo-rep: Update geo-rep status, if monitor process is killed

    Problem:
    When the monitor process itself is getting killed, geo-rep session
    still shows as active.

    Status command will just pick up the content from the status file
    to show the output. Monitor process is the one which updates the
    Status file. When the monitor process itself gets killed, there is
    no way to update the status file. So, geo-rep session status command
    ends up showing last updated Status present in the status file.

    Solution:
    While getting the status output, check whether monitor process is
    running. If it is NOT running, update the status as STOPPED.

    Change-Id: I86a7ac1746dd8f27eef93658e992ef16f6068d9d
    BUG: 1276060
    Signed-off-by: Saravanakumar Arumugam <sarumuga>
    Reviewed-on: http://review.gluster.org/11873
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Milind Changire <mchangir>
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Jeff Darcy <jdarcy>
    (cherry picked from commit 4d4c7d5dc54850dcf916083b2b1398d9bfe2bfe6)
    Reviewed-on: http://review.gluster.org/12448
    Reviewed-by: Aravinda VK <avishwan>
    Reviewed-by: Vijay Bellur <vbellur>
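The approach described in the commit message (check whether the monitor is alive before trusting the status file) can be illustrated with a minimal Python sketch. This is not the code from the patch. The JSON status file layout, the `worker_status` key, the pid file format, and the helper names `pid_is_running`, `read_monitor_pid` and `effective_status` are all assumptions made for illustration, and the liveness check relies on POSIX signal 0 semantics.

```python
import json
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        # pid 0 and negative pids address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: only checks existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def read_monitor_pid(pid_file):
    """Return the pid stored in pid_file, or None if it cannot be read.

    Hypothetical format: the file holds a single decimal pid.
    """
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def effective_status(status_file, pid_file):
    """Load the stored status and override it if the monitor is gone.

    Hypothetical layout: status_file is JSON with a "worker_status" key.
    """
    with open(status_file) as f:
        status = json.load(f)
    pid = read_monitor_pid(pid_file)
    if pid is None or not pid_is_running(pid):
        # Nobody is left to refresh the file, so its content is stale.
        status["worker_status"] = "Stopped"
    return status
```

With this shape, the stored Active/Passive value is only reported while the monitor that maintains it is alive. A pid file that is missing, unreadable, or points at a dead process yields "Stopped". A bare pid check can be fooled by pid reuse, so a real implementation may prefer a stronger liveness test.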
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user