Bug 1005483

Summary: Dist-geo-rep: 'geo-rep' delete succeeds when there is a node down and when the node comes back up it has stale data
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: geo-replication
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: medium
Priority: medium
Version: 2.1
CC: aavati, amarts, csaba, mzywusko, rhs-bugs
Keywords: ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0.34rhs-1
Doc Type: Bug Fix
Last Closed: 2013-11-27 15:37:15 UTC
Type: Bug

Description M S Vishwanath Bhat 2013-09-07 13:57:38 UTC
Description of problem:
geo-rep delete succeeds even when a node/glusterd is down in the master cluster. It removes the geo-rep session data on every node except the one that is down. When that node comes back up it still holds the stale session data, and there is no way to clean it up gracefully since there is no 'delete force' command.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Create and start a geo-rep session between two clusters.
2. Bring down a node/glusterd in the master cluster.
3. Run geo-rep delete to delete the session (see the command sketch below).
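
A minimal command sketch of the steps above, assuming a master volume named 'master', a slave volume 'slave' on host 'falcon', and 'typhoon' as the node taken down (names taken from the outputs below; push-pem assumes passwordless root SSH to the slave is already set up):

# 1. On a master node: create and start the session
gluster volume geo-replication master falcon::slave create push-pem
gluster volume geo-replication master falcon::slave start

# 2. On one master node (e.g. typhoon): bring glusterd down
service glusterd stop

# 3. Back on a running master node: stop the session (force, since a node is down) and delete it
gluster volume geo-replication master falcon::slave stop force
gluster volume geo-replication master falcon::slave delete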

Actual results:
# gluster v geo master falcon::slave status detail

                                       MASTER: master  SLAVE: falcon::slave

NODE                         HEALTH     UPTIME    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
-------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stopped    N/A       N/A            N/A              N/A              N/A
harrier.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
typhoon.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
mustang.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A


# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful


# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


After the node which was down comes back up...

# gluster v geo master falcon::slave status
NODE                      MASTER    SLAVE            HEALTH     UPTIME
-------------------------------------------------------------------------
harrier.blr.redhat.com    master    falcon::slave    Stopped    N/A



# gluster v geo master falcon::slave delete
Geo-replication session between master and falcon::slave does not exist.
geo-replication command failed




Expected results:
The delete command should warn or error out with a proper message when any node in the master cluster is down.

Additional info:

Unlike geo-rep stop, there is no workaround for this, since there is no 'geo-rep delete force' command.
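
For comparison, the stop workaround mentioned above does exist as a CLI option, whereas delete had no force variant in this build (same session names as in the sketch above):

# stop can be forced past a down node:
gluster volume geo-replication master falcon::slave stop force

# delete had no equivalent in glusterfs-3.4.0.32rhs-1, so the stale session
# data on the node that was down could not be cleaned up:
gluster volume geo-replication master falcon::slave delete force    # option not available at the time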

Comment 2 Avra Sengupta 2013-09-10 07:34:29 UTC
Fixed with patch https://code.engineering.redhat.com/gerrit/12654
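
Judging by the verified behaviour in comment 3, the fix makes glusterd reject the delete while any peer of the master volume is down. A simple pre-check an admin could run before attempting the delete (a sketch, assuming the same cluster and session names as above):

# On a master node, confirm every peer shows 'Peer in Cluster (Connected)':
gluster peer status

# Then delete the (stopped) session:
gluster volume geo-replication master falcon::slave delete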

Comment 3 M S Vishwanath Bhat 2013-10-18 06:42:23 UTC
This is fixed now.

Tested in Version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Peer typhoon, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed

When the node comes back online, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


Moving it to Verified.

Comment 5 errata-xmlrpc 2013-11-27 15:37:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html