Bug 1205162
| Summary: | [georep]: If a geo-rep session is recreated, existing files that were deleted from the slave do not get synced again from the master | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
| Component: | geo-replication | Assignee: | Kotresh HR <khiremat> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.0 | CC: | amukherj, avishwan, chrisw, csaba, khiremat, mchangir, nlevinki, rcyriac, sankarshan, sarumuga |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-15 | Doc Type: | Bug Fix |
| Doc Text: | When a geo-replication session was deleted, the sync time attribute on the root directory of the brick was not reset to zero. This meant that when a new geo-replication session was created, the stale sync time attribute caused the sync process to ignore all files created up until the stale sync time and to start syncing from that time. A new reset-sync-time option has been added to the session delete command so that administrators can reset the sync time attribute to zero if required. | Story Points: | --- |
| Clone Of: | | | |
| : | 1311926 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-23 05:21:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1422760 | | |
| Bug Blocks: | 1311926, 1351522, 1351530, 1357772, 1357773 | | |
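For illustration, a minimal usage sketch of the reset-sync-time option described in the Doc Text above. The volume name mastervol, slave host slavehost, and slave volume slavevol are hypothetical, and a session has to be stopped before it can be deleted:

# Stop the session, then delete it and clear the per-brick sync-time markers
gluster volume geo-replication mastervol slavehost::slavevol stop
gluster volume geo-replication mastervol slavehost::slavevol delete reset-sync-time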
Description
Rahul Hinduja
2015-03-24 11:07:12 UTC
As part of the geo-rep delete command, we should remove the stime xattrs from the master brick roots, so that on re-creation the session starts syncing from the beginning. Milind, please consider this scenario while working on the stime reset patch: https://bugzilla.redhat.com/show_bug.cgi?id=1329675#c2

Patch http://review.gluster.org/14051 has been posted upstream (mainline) for review. It adds a new option to the delete command to reset the sync time (reset-sync-time).

Upstream mainline: http://review.gluster.org/14051
Upstream 3.8: http://review.gluster.org/14953

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Verified with build: glusterfs-geo-replication-3.8.4-13.el7rhgs.x86_64
It worked for the data that was initially synced via changelog, but failed for the data that was synced via xsync.
Steps Tested:
=============
1. Create Master and Slave cluster/volume
2. Create geo-rep session between master and slave
3. Create some data on master:
crefi -T 10 -n 10 --multi -d 5 -b 5 --random --max=5K --min=1K --f=create /mnt/master/
AND,
mkdir data; cd data ; for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; done
4. Let the data be synced to slave.
5. Stop and delete the geo-rep session using reset-sync-time (see the command sketch after these steps)
6. Remove the data created by crefi from the slave mount
7. Append data on the master to the files under the data directory
8. Recreate geo-rep session using force
9. Start the geo-rep session
Files get synced properly to the slave and the arequal checksums match.
10. Stop and delete the geo-rep session again using reset-sync-time
11. Remove all of the data from the slave (rm -rf *)
12. Recreate geo-rep session using force
13. Start the geo-rep session
Only the root-level directories are synced; no subdirectories or files get synced.
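The session teardown and recreation in steps 5 and 10–13 correspond roughly to the following commands (a sketch; the volume name mastervol, slave host slavehost, and slave volume slavevol are hypothetical):

# Stop the running session, then delete it and clear the per-brick stime markers
gluster volume geo-replication mastervol slavehost::slavevol stop
gluster volume geo-replication mastervol slavehost::slavevol delete reset-sync-time

# Recreate the session (force, since the slave already contains data) and start it again
gluster volume geo-replication mastervol slavehost::slavevol create push-pem force
gluster volume geo-replication mastervol slavehost::slavevol start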
Master:
=======
[root@dj ~]# ./scripts/arequal-checksum -p /mnt/master/
Entry counts
Regular files : 3821
Directories : 264
Symbolic links : 0
Other : 0
Total : 4085
Metadata checksums
Regular files : 489009
Directories : 3e9
Symbolic links : 3e9
Other : 3e9
Checksums
Regular files : 8960ba9adedccfccf73a8f5024a4d980
Directories : 4a40163964221b39
Symbolic links : 0
Other : 0
Total : 341a23f39e5a0d75
[root@dj ~]#
Slave:
======
[root@dj ~]# ls -lR /mnt/slave/
/mnt/slave/:
total 44
drwxr-xr-x. 2 root root 4096 Feb 13 22:25 data
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread0
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread1
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread2
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread3
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread4
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread5
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread6
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread7
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread8
drwxr-xr-x. 2 root root 4096 Feb 13 22:19 thread9
/mnt/slave/data:
total 0
/mnt/slave/thread0:
total 0
/mnt/slave/thread1:
total 0
/mnt/slave/thread2:
total 0
/mnt/slave/thread3:
total 0
/mnt/slave/thread4:
total 0
/mnt/slave/thread5:
total 0
/mnt/slave/thread6:
total 0
/mnt/slave/thread7:
total 0
/mnt/slave/thread8:
total 0
/mnt/slave/thread9:
total 0
[root@dj ~]#
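For reference, the comparison above can be reproduced by running the same arequal script on both mounts (mount points as used in this report):

./scripts/arequal-checksum -p /mnt/master/ > /tmp/master.arequal
./scripts/arequal-checksum -p /mnt/slave/  > /tmp/slave.arequal
diff /tmp/master.arequal /tmp/slave.arequal    # no output means master and slave match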
Since it is not syncing, moving the bug back to the assigned state.
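For context, the sync position that geo-replication resumes from is recorded as an stime extended attribute on each master brick root, and the reset-sync-time delete is expected to clear it. A sketch of how to inspect it (the brick path is hypothetical, and the exact xattr name encodes the master and slave volume UUIDs, so it differs per setup):

# Dump all trusted.* xattrs of a master brick root in hex (run as root on the brick host)
getfattr -d -m . -e hex /rhgs/brick1/mastervol
# Look for an entry of the form:
#   trusted.glusterfs.<master-vol-uuid>.<slave-vol-uuid>.stime=0x...
# After "delete reset-sync-time", this marker should no longer point at a stale
# timestamp, so a recreated session crawls the volume from the beginning.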
Upstream Patch:
https://review.gluster.org/#/c/16629/ (master)
https://review.gluster.org/#/c/16641/ (3.8)
https://review.gluster.org/#/c/16642/ (3.9)
https://review.gluster.org/#/c/16644/ (3.10)

Downstream Patch:
https://code.engineering.redhat.com/gerrit/#/c/97943/

Verified with build: glusterfs-geo-replication-3.8.4-15.el7rhgs.x86_64

The scenario mentioned in comment 10 works; moving this bug to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html