Bug 1005483 - Dist-geo-rep: 'geo-rep' delete succeeds when there is a node down and when the node comes back up it has stale data
Summary: Dist-geo-rep: 'geo-rep' delete succeeds when there is a node down and when the node comes back up it has stale data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Avra Sengupta
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-07 13:57 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0.34rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-27 15:37:15 UTC
Embargoed:


Links
System ID: Red Hat Product Errata RHBA-2013:1769 (Priority: normal)
Status: SHIPPED_LIVE
Summary: Red Hat Storage 2.1 enhancement and bug fix update #1
Last Updated: 2013-11-27 20:17:39 UTC

Description M S Vishwanath Bhat 2013-09-07 13:57:38 UTC
Description of problem:
geo-rep delete succeeds even when a node (or glusterd) is down in the master cluster. It deletes the session data on every node except the one that is down. When that node comes back up, it still holds the stale session data, and there is no way to clean it up gracefully since there is no 'delete force' command.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Create and start a geo-rep session between two clusters.
2. Bring down a node (or glusterd) in the master cluster.
3. Run geo-rep delete to delete the session (see the command sketch below).
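
A minimal command sketch of these steps, assuming a master volume named 'master', a slave volume 'slave' on host 'falcon', and a master peer 'typhoon' to bring down (all names are taken from the output below and are illustrative):

# gluster volume geo-replication master falcon::slave create push-pem
# gluster volume geo-replication master falcon::slave start
# gluster volume geo-replication master falcon::slave stop    <-- delete requires a stopped session

(on one master peer, e.g. typhoon, to simulate the node going down)
# service glusterd stop

(back on a healthy master node)
# gluster volume geo-replication master falcon::slave delete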

Actual results:
# gluster v geo master falcon::slave status detail

                                       MASTER: master  SLAVE: falcon::slave

NODE                         HEALTH     UPTIME    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
-------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stopped    N/A       N/A            N/A              N/A              N/A
harrier.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
typhoon.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
mustang.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A


# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful


# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


After the node that was down comes back up...

# gluster v geo master falcon::slave status
NODE                      MASTER    SLAVE            HEALTH     UPTIME
-------------------------------------------------------------------------
harrier.blr.redhat.com    master    falcon::slave    Stopped    N/A



# gluster v geo master falcon::slave delete
Geo-replication session between master and falcon::slave does not exist.
geo-replication command failed




Expected results:
delete should warn or error out with a proper message when any node in the master cluster is down.
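
Until glusterd itself enforces such a check, a rough admin-side pre-check could refuse to run delete while any peer is disconnected. A minimal sketch, assuming the session names used above and that 'gluster peer status' marks down peers as 'Disconnected' (the wrapper itself is hypothetical, not part of the product):

#!/bin/bash
# georep-safe-delete.sh: hypothetical wrapper around geo-rep delete.
MASTERVOL=master
SLAVE=falcon::slave

# Refuse to proceed if any peer in the trusted pool is not connected.
if gluster peer status | grep -q 'Disconnected'; then
    echo "A master peer is down; bring it up before deleting the session." >&2
    exit 1
fi

gluster volume geo-replication "$MASTERVOL" "$SLAVE" delete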

Additional info:

Unlike geo-rep stop, there is no workaround for this, since there is no 'geo-rep delete force' command.
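
The only non-graceful option is to remove the stale session metadata by hand on the node that was down. A rough, unverified and unsupported sketch, assuming the default glusterd working directory and an illustrative per-session directory name (both paths are assumptions and must be checked on the node; session entries may also live elsewhere under /var/lib/glusterd):

(on the node that came back with the stale session, after the session
has been deleted on every other node)
# service glusterd stop
# rm -rf /var/lib/glusterd/geo-replication/master_falcon_slave    <-- directory name is illustrative
# service glusterd start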

Comment 2 Avra Sengupta 2013-09-10 07:34:29 UTC
Fixed with patch https://code.engineering.redhat.com/gerrit/12654

Comment 3 M S Vishwanath Bhat 2013-10-18 06:42:23 UTC
This is fixed now.

Tested in Version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, I get the below output:

[root@spitfire ]# gluster v geo master falcon::slave delete
Peer typhoon, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed

When the node comes back online, I get the below output:

[root@spitfire ]# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


Moving it to Verified.

Comment 5 errata-xmlrpc 2013-11-27 15:37:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html

