Bug 1000462

Summary: Dist-geo-rep : geo-rep xsync crawl fails to sync few symlinks to slave after changelog was disabled.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: aavati, bbandari, csaba, grajaiya, khiremat, kparthas, mzywusko, rhs-bugs, sdharane, vagarwal, vbhat, vkoppad, vraman, vshankar
Target Milestone: ---
Keywords: Reopened, ZStream
Target Release: RHGS 2.1.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0.33rhs
Doc Type: Bug Fix
Doc Text:
Cause: There were two causes. First, the symlink handling code in the xsync crawl generated an incorrect journal entry in the changelog. Second, syncing appeared slow: when the hybrid crawl is the main crawl, a file system crawl is done only every 60 seconds.
Consequence: Symlinks were not synced to the slave.
Fix: 1. The symlink handling code was fixed. 2. During the main crawl a sync happens every 60 seconds; this was observed while reproducing the bug.
Result: Symlinks are synced to the slave.
Story Points: ---
Clone Of:
: 1018228
Environment:
Last Closed: 2014-02-25 07:35:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1018228

Description Vijaykumar Koppad 2013-08-23 13:38:34 UTC
Description of problem: The xsync crawl failed to sync a few of the symlinks to the slave after the geo-rep change_detector fell back to xsync because the changelog was disabled.

The missing files have no entries in the XSYNC-CHANGELOGs, which means xsync itself failed to pick up the changes.


Version-Release number of selected component (if applicable):
glusterfs-3.4.0.22rhs-2.el6rhs.x86_64

How reproducible: Didn't try to reproduce it.


Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Create some files on the master and let them sync to the slave.
3. Start creating symlinks to the created files and, in parallel, disable the changelog.
4. The geo-rep change_detector falls back to xsync; wait for it to sync the files to the slave.

Actual results: change_detector fails to sync a few symlinks to the slave.


Expected results: It should sync all kinds of files to the slave.
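
To see exactly which symlinks are missing, the two mounts can be compared directly. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes both volumes are mounted locally (for example at /mnt/master and /mnt/slave, the paths used later in this report).

```python
import os


def symlinks_under(root):
    """Return {relative_path: link_target} for every symlink below root."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow symlinks by default; symlinks that point
        # at directories show up in dirnames, so check both name lists.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                found[os.path.relpath(full, root)] = os.readlink(full)
    return found


def missing_symlinks(master_root, slave_root):
    """Return sorted relative paths of symlinks on master that are absent on slave."""
    master = symlinks_under(master_root)
    slave = symlinks_under(slave_root)
    return sorted(set(master) - set(slave))
```

Calling missing_symlinks("/mnt/master", "/mnt/slave") would list the symlink paths that exist only on the master, which could then be looked up in the XSYNC-CHANGELOGs.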


Additional info:

Comment 4 Gowrishankar Rajaiyan 2013-10-08 08:41:17 UTC
Fixed in version please.

Comment 5 Vivek Agarwal 2013-10-11 14:01:41 UTC
closing this as it was part of the errata and cloning this for U1.

Comment 7 Vivek Agarwal 2013-10-15 10:13:39 UTC
*** Bug 1018228 has been marked as a duplicate of this bug. ***

Comment 9 Vijaykumar Koppad 2013-11-15 09:56:41 UTC
Tried it on the build glusterfs-3.4.0.44rhs. It still fails to sync a few symlinks to the slave.

Steps,

1. Create and start a geo-rep relationship between master and slave.
2. Create some files on the master and let them sync to the slave.
3. Start creating symlinks to the created files and, in parallel, disable the changelog.
4. The geo-rep change_detector falls back to xsync; wait for it to sync the files to the slave.

Comment 10 Venky Shankar 2013-12-03 06:09:25 UTC
With the new implementation of xsync crawl and bug fixes in it, this needs to be retested. Please run the relevant test case and verify.

Comment 11 Vijaykumar Koppad 2013-12-24 07:31:22 UTC
Please provide the correct Fixed In Version. Since the issue was also reproduced in glusterfs-3.4.0.44rhs, it can't have been fixed in glusterfs-3.4.0.33rhs.

Comment 12 Vijaykumar Koppad 2013-12-24 07:58:34 UTC
Tried on the build glusterfs-3.4.0.52rhs. It still fails to sync a few symlinks to the slave.

Steps,

1. Create and start a geo-rep relationship between master and slave.
2. Create some files on the master and let them sync to the slave.
3. Start creating symlinks to the created files and, in parallel, disable the changelog.
4. The geo-rep change_detector falls back to xsync; wait for it to sync the files to the slave.

Comment 13 Vijaykumar Koppad 2013-12-24 08:08:27 UTC
But later enabling the changelog and changing change-detector to changelog would sync the rest of the files. Not sure if this bug should be considered.

Comment 14 Kotresh HR 2013-12-24 09:28:32 UTC
Vijay,

As far as I know, we always use changelog as the change detector by default,
use xsync only for the initial crawl if there is existing data, and do not
recommend switching off changelog.

If the above is true, I don't see a problem here.
Correct me if I am wrong.

-Kotresh H R

Comment 15 Vijaykumar Koppad 2013-12-24 09:33:24 UTC
Kotresh, 

Disabling changelog is to simulate the changelog crash. The whole point of introducing xsync was to have a backup if changelog crashes for some reason.

Comment 17 Venky Shankar 2013-12-26 11:02:52 UTC
(In reply to Vijaykumar Koppad from comment #13)
> But later enabling the changelog and changing change-detector to changelog
> would sync the rest of the files. Not sure if this bug should be considered.

Are you sure the entry got synced in changelog mode? When the hybrid crawl is used as the main crawling mechanism (i.e. change_detector is 'xsync'), a FS crawl is done every 60 seconds.

Comment 18 Venky Shankar 2013-12-27 05:22:14 UTC
(In reply to Vijaykumar Koppad from comment #13)
> But later enabling the changelog and changing change-detector to changelog
> would sync the rest of the files. Not sure if this bug should be considered.

I'm sure the entries (symlinks) did not get synced in changelog mode. When change-logging was disabled, there would be _no_ journal that would capture the symlink creation fop. Enabling change-logging would start journaling for fops thereafter.

Comment 19 Venky Shankar 2013-12-27 05:27:20 UTC
(In reply to Vijaykumar Koppad from comment #15)
> Kotresh, 
> 
> Disabling changelog is to simulate the changelog crash. The whole point of
> introducing xsync was to have a backup if changelog crashes for some reason.

That's incorrect. Being a backup sync mechanism is not the sole job of xsync; it's always needed for the first crawl.

Comment 20 Kotresh HR 2013-12-27 10:01:41 UTC
Vijay,

I tried with 10k files. It is syncing properly with the latest
rhs-2.1 downstream. For how many files did you hit this issue?

-Kotresh H R

Comment 21 Venky Shankar 2013-12-30 07:06:15 UTC
Vijaykumar,

Could you check comment #17?

Comment 22 M S Vishwanath Bhat 2013-12-30 13:25:21 UTC
I just tried with 52rhs with Venky. The issue is not being reproduced.

Since the change_detector was xsync, syncing takes place every 60 seconds. So one has to wait for 60 seconds before seeing the effect.

I will try once more, since vkoppad hit this in 52rhs itself. Will change the status after trying once more. But for now issue is not being reproduced.
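
Because of the 60-second crawl interval described above, a test that checks the slave immediately can report a false failure. A minimal sketch of a polling helper follows; the helper name, its parameters, and the suggestion to wait for more than one crawl interval are assumptions for illustration and not part of geo-replication itself.

```python
import time

# Crawl interval stated in this report for the case where change_detector is xsync.
CRAWL_INTERVAL_SECS = 60


def wait_until(check, timeout, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout seconds have elapsed.

    Returns True if check() succeeded, False if the deadline passed first.
    sleep and clock are injectable so the helper can be exercised without
    real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test could pass a check that compares master and slave and use a timeout of a few multiples of CRAWL_INTERVAL_SECS, so that at least one full crawl has a chance to run before the result is judged.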

Comment 23 M S Vishwanath Bhat 2014-01-02 12:58:09 UTC
I tried once more and all the files, including symlinks, get synced to the slave. However, after the sync, the metadata checksum for regular files differs between master and slave.

Tested in version: glusterfs-3.4.0.53rhs-1.el6rhs.x86_64

arequal-checksums on master and slaves after the sync.



[root@gauss ~]# /opt/qa/tools/arequal-checksum /mnt/master/

Entry counts
Regular files   : 44002
Directories     : 2837
Symbolic links  : 44005
Other           : 0
Total           : 90844

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 5a815a
Other           : 3e9

Checksums
Regular files   : 2f948f83699a8d85afd4b03190c381e6
Directories     : 3b1f6d3f7e6c3b4e
Symbolic links  : 2663f7e20564158
Other           : 0
Total           : b9396df3a7637675


[root@gauss ~]# /opt/qa/tools/arequal-checksum /mnt/slave/

Entry counts
Regular files   : 44002
Directories     : 2837
Symbolic links  : 44005
Other           : 0
Total           : 90844

Metadata checksums
Regular files   : 20f5
Directories     : 24d74c
Symbolic links  : 5a815a
Other           : 3e9

Checksums
Regular files   : 2f948f83699a8d85afd4b03190c381e6
Directories     : 3b1f6d3f7e6c3b4e
Symbolic links  : 2663f7e20564158
Other           : 0
Total           : b9396df3a7637675


There is a different bug for the metadata checksum difference between master and slave after sync. Should I verify this bug, or should it be closed only after that bug is fixed?
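
The two arequal-checksum outputs above can also be compared mechanically instead of by eye. The sketch below is illustrative only: the parser assumes the plain "section header, then `name : value` lines" layout shown above, the function names are hypothetical, and the sample strings are trimmed copies of the output in this comment (without the shell prompt lines).

```python
def parse_arequal(text):
    """Parse arequal-checksum output into {section: {field: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            # A "name : value" line belongs to the most recent section header.
            key, _, value = line.partition(":")
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            # A line without a colon is treated as a section header.
            current = line
            sections[current] = {}
    return sections


def diff_arequal(master_text, slave_text):
    """Return (section, field, master_value, slave_value) for each mismatch."""
    master = parse_arequal(master_text)
    slave = parse_arequal(slave_text)
    diffs = []
    for section, fields in master.items():
        for field, value in fields.items():
            other = slave.get(section, {}).get(field)
            if other != value:
                diffs.append((section, field, value, other))
    return diffs


# Trimmed samples taken from the output in this comment.
MASTER_OUTPUT = """
Entry counts
Regular files   : 44002
Symbolic links  : 44005

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 5a815a
"""

SLAVE_OUTPUT = """
Entry counts
Regular files   : 44002
Symbolic links  : 44005

Metadata checksums
Regular files   : 20f5
Directories     : 24d74c
Symbolic links  : 5a815a
"""

print(diff_arequal(MASTER_OUTPUT, SLAVE_OUTPUT))
# prints [('Metadata checksums', 'Regular files', '3e9', '20f5')]
```

On the samples, the only mismatch is the regular-file metadata checksum; the entry counts, including symbolic links, agree.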

Comment 24 Venky Shankar 2014-01-02 13:12:48 UTC
MS,

This looks OK. The metadata checksum mismatch is another bug that you've already reported.

We can move this to verified.

Comment 25 M S Vishwanath Bhat 2014-01-02 13:15:54 UTC
Moving this bug to verified. The metadata checksum mismatch will be dealt with in a different bug.

Comment 27 errata-xmlrpc 2014-02-25 07:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html