Description of problem:
geo-rep delete succeeds when a node/glusterd is down in the master cluster. It deletes the session data on every node except the one that is down. When that node comes back up, it is left with stale session data, and there is no way to clean it up gracefully since there is no "delete force" command.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Create and start a geo-rep session between two clusters.
2. Bring a node/glusterd down in the master cluster.
3. Run geo-rep delete to delete the session.

Actual results:

# gluster v geo master falcon::slave status detail

MASTER: master  SLAVE: falcon::slave

NODE                       HEALTH    UPTIME    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
---------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com    Stopped   N/A       N/A            N/A              N/A              N/A
harrier.blr.redhat.com     Stopped   N/A       N/A            N/A              N/A              N/A
typhoon.blr.redhat.com     Stopped   N/A       N/A            N/A              N/A              N/A
mustang.blr.redhat.com     Stopped   N/A       N/A            N/A              N/A              N/A

# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave

After the node which was down comes back up:

# gluster v geo master falcon::slave status

NODE                       MASTER    SLAVE            HEALTH    UPTIME
-----------------------------------------------------------------------
harrier.blr.redhat.com     master    falcon::slave    Stopped   N/A

# gluster v geo master falcon::slave delete
Geo-replication session between master and falcon::slave does not exist.
geo-replication command failed

Expected results:
delete should warn or error out with a proper message when any node in the master cluster is down.

Additional info:
Unlike geo-rep stop, there is no workaround for this, since there is no geo-rep delete force.
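The expected behavior amounts to a pre-flight check before acting on the delete: refuse the operation if any peer in the master volume is down. A minimal sketch of that check (hypothetical helper and data shapes, not glusterd code; the error text mirrors the message the eventual fix produces):

```python
# Hypothetical pre-flight check: before deleting a geo-rep session,
# verify every peer in the master volume is connected.

def check_peers_before_delete(peer_states):
    """peer_states: dict mapping peer hostname -> True if connected.

    Returns (ok, message). The delete should proceed only when ok is True;
    otherwise the message names the first peer found to be down.
    """
    down = [host for host, up in peer_states.items() if not up]
    if down:
        return (False,
                "Peer %s, which is a part of master volume, is down. "
                "Please bring up the peer and retry." % down[0])
    return (True, "")

if __name__ == "__main__":
    # One peer (typhoon) is down, so the delete must be refused.
    peers = {"spitfire": True, "harrier": True,
             "typhoon": False, "mustang": True}
    ok, msg = check_peers_before_delete(peers)
    print(ok, msg)
```

With this gate in place, the stale-data scenario cannot arise, because the session is only deleted when every node is up to receive the cleanup.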
Fixed with patch https://code.engineering.redhat.com/gerrit/12654
This is fixed now.

Tested in Version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, I get the below output:

[root@spitfire ]# gluster v geo master falcon::slave delete
Peer typhoon, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed

When the node comes back online, I get the below output:

[root@spitfire ]# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave

Moving it to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html