Bug 1005478

Summary: Dist-geo-rep: 'geo-rep stop' should fail when there is a node down
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Status: CLOSED ERRATA
Keywords: ZStream
Reporter: M S Vishwanath Bhat <vbhat>
Assignee: Avra Sengupta <asengupt>
QA Contact: M S Vishwanath Bhat <vbhat>
CC: aavati, amarts, csaba, grajaiya, kparthas, mzywusko, rhs-bugs, shaines, vagarwal
Fixed In Version: glusterfs-3.4.0.34rhs
Doc Type: Bug Fix
Type: Bug
Clones: 1006177 (view as bug list)
Bug Blocks: 1006177
Last Closed: 2013-11-27 15:37:06 UTC

Description M S Vishwanath Bhat 2013-09-07 12:18:38 UTC
Description of problem:
Currently, geo-rep stop succeeds even when a node (or glusterd on a node) is down. Running geo-rep stop stops the gsyncd processes on the nodes that are up, but when the node that was down comes back online, gsyncd on that node is still running. geo-rep stop force has to be run again to stop it.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
Always


Steps to Reproduce:
1. Create and start a geo-rep session between two clusters (a command sketch follows below).
2. Bring down one of the master nodes (or kill glusterd on that node).
3. Run geo-rep stop on the master node.
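
A minimal command sketch of these steps, assuming the master volume is named master and the slave is falcon::slave as in the transcripts below; the create push-pem syntax and the service glusterd stop invocation are illustrative assumptions, not taken from this report:

[root@spitfire ]# gluster volume geo-replication master falcon::slave create push-pem
[root@spitfire ]# gluster volume geo-replication master falcon::slave start
[root@harrier ]# service glusterd stop
[root@spitfire ]# gluster volume geo-replication master falcon::slave stop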

Actual results:
[root@spitfire ]# gluster v geo master falcon::slave stop
Stopping geo-replication session between master & falcon::slave has been successful

But when the node that was down comes back online, the status shows gsyncd still running on it:

[root@spitfire ]# gluster v geo master falcon::slave status
NODE                       MASTER    SLAVE            HEALTH     UPTIME         
----------------------------------------------------------------------------
spitfire.blr.redhat.com    master    falcon::slave    Stopped    N/A            
mustang.blr.redhat.com     master    falcon::slave    Stopped    N/A            
harrier.blr.redhat.com     master    falcon::slave    Stable     01:52:26       
typhoon.blr.redhat.com     master    falcon::slave    Stopped    N/A            



Expected results:
geo-rep stop should fail, or at least warn, when a node is down.

Additional info:

Comment 1 M S Vishwanath Bhat 2013-09-07 12:22:39 UTC
*** Bug 1005477 has been marked as a duplicate of this bug. ***

Comment 3 M S Vishwanath Bhat 2013-09-07 14:00:23 UTC
The workaround is to run geo-rep stop force once the node comes back online.
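
A sketch of the workaround, using the volume and slave names from the transcripts above; the hostname in the prompt is only illustrative:

[root@spitfire ]# gluster volume geo-replication master falcon::slave stop force
[root@spitfire ]# gluster volume geo-replication master falcon::slave status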

Comment 5 Gowrishankar Rajaiyan 2013-10-08 08:42:03 UTC
Please provide the Fixed In Version.

Comment 6 M S Vishwanath Bhat 2013-10-18 06:31:30 UTC
Fixed now.

Tested in version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, 

[root@spitfire ]# gluster v geo master falcon::slave stop
Peer harrier, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed


And when the peer is back online,

[root@spitfire ]# gluster v geo master falcon::slave stop
Stopping geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
NODE                       MASTER    SLAVE            HEALTH     UPTIME       
--------------------------------------------------------------------------
spitfire.blr.redhat.com    master    falcon::slave    Stopped    N/A          
typhoon.blr.redhat.com     master    falcon::slave    Stopped    N/A          
mustang.blr.redhat.com     master    falcon::slave    Stopped    N/A          
harrier.blr.redhat.com     master    falcon::slave    Stopped    N/A

Moving it to Verified.

Comment 8 errata-xmlrpc 2013-11-27 15:37:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html