Bug 989906 - Dist-geo-rep : imaster in cascaded geo-rep fails to do first xsync crawl and consequently fails to sync files to level2 slave
Summary: Dist-geo-rep : imaster in cascaded geo-rep fails to do first xsync crawl and ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Venky Shankar
QA Contact: Vijaykumar Koppad
URL:
Whiteboard:
Depends On:
Blocks: 990900 996371
 
Reported: 2013-07-30 07:08 UTC by Vijaykumar Koppad
Modified: 2014-08-25 00:50 UTC
6 users

Fixed In Version: glusterfs-3.4.0.19rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 990900 996371
Environment:
Last Closed: 2013-09-23 22:38:48 UTC
Embargoed:



Description Vijaykumar Koppad 2013-07-30 07:08:57 UTC
Description of problem: The intermediate master (imaster) geo-rep in a cascaded geo-rep setup fails to do the first xsync crawl, and consequently fails to sync files that were created before geo-rep was started on the intermediate master to the level-2 slave.


Version-Release number of selected component (if applicable): 3.4.0.13rhs-1.el6rhs.x86_64



How reproducible: Didn't try to reproduce 


Steps to Reproduce:
1. Create a geo-rep relationship between master (DIST_REP) and imaster (DIST_REP), and between imaster and slave (DIST).
2. Create some data on master.
3. After the data is created, start the geo-rep session between master and imaster, and wait for the data to sync to imaster.
4. After the sync to imaster completes, start the geo-rep session between imaster and slave.
5. Check whether the data syncs to the slave. (A CLI sketch of this setup is given below.)
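
For reference, a minimal CLI sketch of the cascaded setup described in the steps above. The master volume name (mastervol) and the host names (imaster-node, slave-node) are placeholders; imastervol and slavevol follow the names seen in the logs below, and the exact create/start syntax may differ between glusterfs builds, so treat this as an assumption rather than the verified reproduction commands.

# On the master cluster: create and start the master -> imaster (level-1) session
gluster volume geo-replication mastervol imaster-node::imastervol create push-pem
gluster volume geo-replication mastervol imaster-node::imastervol start
gluster volume geo-replication mastervol imaster-node::imastervol status

# Once master -> imaster has finished syncing, on the intermediate master:
# create and start the imaster -> slave (level-2) session
gluster volume geo-replication imastervol slave-node::slavevol create push-pem
gluster volume geo-replication imastervol slave-node::slavevol start
gluster volume geo-replication imastervol slave-node::slavevol status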

Actual results: imaster fails to sync data to the slave through the first xsync crawl.


Expected results: imaster should sync the files through the first xsync crawl.


Additional info:
Logs of imaster geo-rep session
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-07-30 11:55:04.250494] I [monitor(monitor):81:set_state] Monitor: new state: Initializing...
[2013-07-30 11:55:04.255507] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 11:55:04.256043] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-07-30 11:55:04.501827] I [gsyncd(/bricks/imastervol3):501:main_i] <top>: syncing: gluster://localhost:imastervol -> ssh://root.37.210:gluster://localhost:slavevol
[2013-07-30 11:55:07.178581] I [master(/bricks/imastervol3):60:gmaster_builder] <top>: setting up xsync change detection mode
[2013-07-30 11:55:07.182302] I [master(/bricks/imastervol3):60:gmaster_builder] <top>: setting up changelog change detection mode
[2013-07-30 11:55:07.185750] I [master(/bricks/imastervol3):977:register] _GMaster: xsync temp directory: /var/run/gluster/imastervol/ssh%3A%2F%2Froot%4010.70.37.40%3Agluster%3A%2F%2F127.0.0.1%3Aslavevol/48692526fa955f225b86348c2f162c1c/xsync
[2013-07-30 11:55:07.246157] I [master(/bricks/imastervol3):468:crawlwrap] _GMaster: crawl interval: 60 seconds
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Observations: 
1. From the above logs, it tried to set the xsync change detection mode and then changed to the changelog mode.
2. Looking at some of the backend changelogs after the sync from master to imaster completed, many of them contain the entry "M00000000-0000-0000-0000-000000000001", i.e. the root is being modified for some reason, which might be preventing the imaster geo-rep from doing the first xsync crawl (a sketch for spotting these entries follows after this list).
3. Hope this info helps.
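
A quick way to look for the root-GFID MODIFY entries mentioned in observation 2. The brick path is taken from the logs above; the changelog location (<brick>/.glusterfs/changelogs) is assumed from the changelog translator's usual layout, so adjust both to the actual bricks.

# On an imaster brick, list rolled-over changelog files that record a
# MODIFY ("M") entry against the root GFID
grep -l "M00000000-0000-0000-0000-000000000001" /bricks/imastervol3/.glusterfs/changelogs/CHANGELOG.*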

Comment 3 Vijaykumar Koppad 2013-08-05 10:36:21 UTC
After starting the geo-rep session between imaster and slave, it doesn't start the xsync crawl; instead it tries to process some changelogs from .processing and logs too many "Rsync [errcode: 23]" errors.

Tried on glusterfs-3.4.0.15rhs

Comment 4 Amar Tumballi 2013-08-05 19:22:28 UTC
Could it be because of the performance degradation we saw with this build? Does it need a round of testing after that is fixed?

Comment 5 Venky Shankar 2013-08-08 04:52:45 UTC
(In reply to Amar Tumballi from comment #4)
> Could it be because of the performance degradation we saw with this build?
> Does it need a round of testing after that is fixed?

Nope, it's not. With the current logic there are possibilities of missing updates. This is a bug which needs to be fixed.

Comment 7 Vijaykumar Koppad 2013-08-19 08:46:43 UTC
Verified on glusterfs-3.4.0.20rhs-2.el6rhs.x86_64.

Comment 8 Scott Haines 2013-09-23 22:38:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html


