Bug 1005483

Summary: Dist-geo-rep: 'geo-rep' delete succeeds when there is a node down and when the node comes back up it has stale data
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: geo-replication
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: medium
Priority: medium
Version: 2.1
CC: aavati, amarts, csaba, mzywusko, rhs-bugs
Keywords: ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0.34rhs-1
Doc Type: Bug Fix
Last Closed: 2013-11-27 15:37:15 UTC
Type: Bug

Description M S Vishwanath Bhat 2013-09-07 13:57:38 UTC
Description of problem:
geo-rep delete succeeds even when a node/glusterd is down in the master cluster. It removes the geo-rep session data on every node except the one that is down. When that node comes back up it still holds the stale session data, and there is no way to clean it up gracefully since there is no 'delete force' command.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Create and start a geo-rep session between two clusters.
2. Bring down a node/glusterd in the master cluster.
3. Run geo-rep delete to delete the session (see the command sketch below).
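
A minimal command sketch of the steps above, assuming a master volume named 'master', a slave volume 'slave' on host 'falcon', and 'typhoon' as the node taken down (names taken from the outputs below; push-pem assumes passwordless root SSH to the slave is already set up):

# 1. On a master node: create and start the session
gluster volume geo-replication master falcon::slave create push-pem
gluster volume geo-replication master falcon::slave start

# 2. On one master node (e.g. typhoon): bring glusterd down
service glusterd stop

# 3. Back on a running master node: stop the session (force, since a node is down) and delete it
gluster volume geo-replication master falcon::slave stop force
gluster volume geo-replication master falcon::slave delete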

Actual results:
# gluster v geo master falcon::slave status detail

                                       MASTER: master  SLAVE: falcon::slave

NODE                         HEALTH     UPTIME    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
-------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      Stopped    N/A       N/A            N/A              N/A              N/A
harrier.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
typhoon.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A
mustang.blr.redhat.com       Stopped    N/A       N/A            N/A              N/A              N/A


# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful


# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


After the node which was down comes back up...

# gluster v geo master falcon::slave status
NODE                      MASTER    SLAVE            HEALTH     UPTIME
-------------------------------------------------------------------------
harrier.blr.redhat.com    master    falcon::slave    Stopped    N/A



# gluster v geo master falcon::slave delete
Geo-replication session between master and falcon::slave does not exist.
geo-replication command failed




Expected results:
The delete command should warn or error out with a proper message when any node in the master cluster is down.

Additional info:

Unlike geo-rep stop, there is no workaround for this, since there is no 'geo-rep delete force' command.
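
For comparison, the stop workaround mentioned above does exist as a CLI option, whereas delete had no force variant in this build (same session names as in the sketch above):

# stop can be forced past a down node:
gluster volume geo-replication master falcon::slave stop force

# delete had no equivalent in glusterfs-3.4.0.32rhs-1, so the stale session
# data on the node that was down could not be cleaned up:
gluster volume geo-replication master falcon::slave delete force    # option not available at the time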

Comment 2 Avra Sengupta 2013-09-10 07:34:29 UTC
Fixed with patch https://code.engineering.redhat.com/gerrit/12654
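
Judging by the verified behaviour in comment 3, the fix makes glusterd reject the delete while any peer of the master volume is down. A simple pre-check an admin could run before attempting the delete (a sketch, assuming the same cluster and session names as above):

# On a master node, confirm every peer shows 'Peer in Cluster (Connected)':
gluster peer status

# Then delete the (stopped) session:
gluster volume geo-replication master falcon::slave delete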

Comment 3 M S Vishwanath Bhat 2013-10-18 06:42:23 UTC
This is fixed now.

Tested in Version: glusterfs-3.4.0.35rhs-1.el6rhs.x86_64

When the node is down, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Peer typhoon, which is a part of master volume, is down. Please bring up the peer and retry.
geo-replication command failed

When the node comes back online, I get the output below:

[root@spitfire ]# gluster v geo master falcon::slave delete
Deleting geo-replication session between master & falcon::slave has been successful

[root@spitfire ]# gluster v geo master falcon::slave status
No active geo-replication sessions between master and falcon::slave


Moving it to Verified.

Comment 5 errata-xmlrpc 2013-11-27 15:37:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html