Bug 1005483 - Dist-geo-rep: 'geo-rep' delete succeeds when there is a node down and when the node comes back up it has stale data
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: medium   Severity: medium
Assigned To: Avra Sengupta
QA Contact: M S Vishwanath Bhat
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-09-07 09:57 EDT by M S Vishwanath Bhat
Modified: 2016-05-31 21:56 EDT
CC List: 5 users

See Also:
Fixed In Version: glusterfs-3.4.0.34rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 10:37:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description M S Vishwanath Bhat 2013-09-07 09:57:38 EDT
Description of problem:
geo-rep delete succeeds when there is a node or glusterd down in the master cluster. It deletes the session data on every node except the one that is down. When that node comes back up, it still holds the stale session data, and there is no way to clean it up gracefully, because there is no delete force command.
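
For reference, a minimal pre-flight check an administrator could script before running delete; this is only a sketch, and it assumes that gluster peer status reports a peer whose glusterd is down as Disconnected:

# Sketch: refuse to delete the geo-rep session if any master-cluster peer is down.
# Assumption: 'gluster peer status' shows "(Disconnected)" for a peer whose
# glusterd is unreachable.
if gluster peer status | grep -q "Disconnected"; then
    echo "A peer is down; not deleting the geo-rep session" >&2
    exit 1
fi
gluster volume geo-replication master falcon::slave delete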

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Create and start a geo-rep session between two clusters.
2. Bring a node (or glusterd) down in the master cluster.
3. Run geo-rep delete to delete the session (see the command sketch below).
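
A command-level sketch of the steps above; the volume name master and the slave falcon::slave are taken from the output below, and the create syntax assumes the usual push-pem based setup:

# Step 1: create and start the session (run on a master-cluster node)
gluster volume geo-replication master falcon::slave create push-pem
gluster volume geo-replication master falcon::slave start

# Step 2: bring glusterd down on one node of the master cluster
service glusterd stop        # run on the node being taken down

# Step 3: delete the session from a node that is still up
# (the session was already in the Stopped state in the run shown below)
gluster volume geo-replication master falcon::slave delete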

Actual results:
# gluster v geo master falcon::slave status detail

                                       MASTER: master  SLAVE: falcon::slave

NODE                         HEALTH     UPTIME    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
-------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stopped    N/A       N/A            N/A              N/A              N/A
harrier.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
typhoon.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
mustang.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A


# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful


# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


After the node which was down comes back up...

# gluster v geo master falcon::slave status
NODE                      MASTER    SLAVE            HEALTH     UPTIME
-------------------------------------------------------------------------
harrier.blr.redhat.com    master    falcon::slave    Stopped    N/A



# gluster v geo master falcon::slave delete
Geo-replication session between master and falcon::slave does not exist.
geo-replication command failed




Expected results:
The delete command should warn or error out with a proper message when any node in the master cluster is down.

Additional info:

Unlike geo-rep stop, there is no workaround for this, since there is no geo-rep delete force command.
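
For illustration only, a hypothetical manual (non-graceful) cleanup on the node that came back up with stale session data; the paths are based on glusterd's default working directory and the session directory name is an assumption, so verify everything before removing anything:

# HYPOTHETICAL cleanup sketch -- not a supported procedure.
# Assumption: the stale session files live under glusterd's default working
# directory, /var/lib/glusterd/geo-replication/.
service glusterd stop
ls /var/lib/glusterd/geo-replication/            # locate the stale session directory
# rm -rf /var/lib/glusterd/geo-replication/<stale-session-dir>   # remove it once identified
service glusterd start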
Comment 2 Avra Sengupta 2013-09-10 03:34:29 EDT
Fixed with patch https://code.engineering.redhat.com/gerrit/12654
Comment 3 M S Vishwanath Bhat 2013-10-18 02:42:23 EDT
This is fixed now.

Tested in Version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Peer typhoon, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed

When the node comes back online, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


Moving it to Verified.
Comment 5 errata-xmlrpc 2013-11-27 10:37:15 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
