Bug 989906 - Dist-geo-rep : imaster in cascaded geo-rep fails to do first xsync crawl and consequently fails to sync files to level2 slave
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: high Severity: high
Assigned To: Venky Shankar
QA Contact: Vijaykumar Koppad
Depends On:
Blocks: 990900 996371
Reported: 2013-07-30 03:08 EDT by Vijaykumar Koppad
Modified: 2014-08-24 20:50 EDT (History)
6 users

See Also:
Fixed In Version: glusterfs-3.4.0.19rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 990900 996371
Environment:
Last Closed: 2013-09-23 18:38:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vijaykumar Koppad 2013-07-30 03:08:57 EDT
Description of problem: Intermediate master geo-rep in a cascaded geo-rep setup fails to do the first xsync crawl, and consequently fails to sync files which were created before starting geo-rep on the intermediate master to the level2 slave.


Version-Release number of selected component (if applicable): 3.4.0.13rhs-1.el6rhs.x86_64



How reproducible: Didn't try to reproduce 


Steps to Reproduce:
1. Create geo-rep relationships between master (DIST_REP) and imaster (DIST_REP), and between imaster and slave (DIST).
2. Create some data on the master.
3. After the data is created, start the geo-rep session between master and imaster, and wait for the data to sync to the imaster.
4. After the sync to the imaster completes, start the geo-rep session between imaster and slave.
5. Check whether it syncs the data to the slave.
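The cascaded setup above can be sketched with the gluster CLI (host and volume names below are placeholders, not the ones from this report; commands follow the distributed geo-rep syntax of this release):

```shell
# Level 1: master -> imaster session (hostnames/volumes are illustrative)
gluster volume geo-replication mastervol imasterhost::imastervol create push-pem
gluster volume geo-replication mastervol imasterhost::imastervol start

# ... create data on the master and wait for it to sync to the imaster ...

# Level 2: imaster -> slave session, started only after level 1 has synced
gluster volume geo-replication imastervol slavehost::slavevol create push-pem
gluster volume geo-replication imastervol slavehost::slavevol start

# Verify that the pre-existing files reach the level-2 slave
gluster volume geo-replication imastervol slavehost::slavevol status
```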

Actual results: imaster fails to sync data to the slave through the first xsync crawl.


Expected results: imaster should sync files through the first xsync crawl.


Additional info:
Logs of imaster geo-rep session
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-07-30 11:55:04.250494] I [monitor(monitor):81:set_state] Monitor: new state: Initializing...
[2013-07-30 11:55:04.255507] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 11:55:04.256043] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-07-30 11:55:04.501827] I [gsyncd(/bricks/imastervol3):501:main_i] <top>: syncing: gluster://localhost:imastervol -> ssh://root@10.70.37.210:gluster://localhost:slavevol
[2013-07-30 11:55:07.178581] I [master(/bricks/imastervol3):60:gmaster_builder] <top>: setting up xsync change detection mode
[2013-07-30 11:55:07.182302] I [master(/bricks/imastervol3):60:gmaster_builder] <top>: setting up changelog change detection mode
[2013-07-30 11:55:07.185750] I [master(/bricks/imastervol3):977:register] _GMaster: xsync temp directory: /var/run/gluster/imastervol/ssh%3A%2F%2Froot%4010.70.37.40%3Agluster%3A%2F%2F127.0.0.1%3Aslavevol/48692526fa955f225b86348c2f162c1c/xsync
[2013-07-30 11:55:07.246157] I [master(/bricks/imastervol3):468:crawlwrap] _GMaster: crawl interval: 60 seconds
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Observations:
1. From the above logs, it tried to set up xsync change detection mode and then changed to changelog mode.
2. If you observe some of the backend changelogs after the sync from master to imaster completed, there are many changelogs with the entry "M00000000-0000-0000-0000-000000000001", which means we are modifying root for some reason; this might be preventing the imaster geo-rep from doing the first xsync crawl.
3. Hope this info helps.
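The root-gfid entries described in observation 2 can be spotted mechanically. A minimal sketch (the function name and the one-token-per-record layout are assumptions; real changelogs are NUL-delimited binary files that gsyncd decodes before processing):

```python
# Count metadata ("M") records against the root gfid in decoded changelog
# records. Assumes records have already been split into tokens of the form
# "<op><gfid>", matching the entries quoted in the observation above.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"

def count_root_modifications(records):
    """Return how many records mark a metadata change (op 'M') on root."""
    return sum(1 for rec in records if rec == "M" + ROOT_GFID)

# Synthetic records resembling the quoted entry (gfids are made up):
sample = [
    "E9f6c0f3a-1111-2222-3333-444444444444",  # entry op on a normal gfid
    "M" + ROOT_GFID,                          # metadata change on root
    "M" + ROOT_GFID,
    "D5a6b7c8d-aaaa-bbbb-cccc-dddddddddddd",
]
print(count_root_modifications(sample))  # -> 2
```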
Comment 3 Vijaykumar Koppad 2013-08-05 06:36:21 EDT
After starting the geo-rep session between imaster and slave, it doesn't start the xsync crawl; instead it tries to process some changelogs from .processing and logs too many Rsync [errcode: 23] errors.

Tried on glusterfs-3.4.0.15rhs
Comment 4 Amar Tumballi 2013-08-05 15:22:28 EDT
Can it be because of the performance degradation we saw with this build? Does it need a round of testing after that is fixed?
Comment 5 Venky Shankar 2013-08-08 00:52:45 EDT
(In reply to Amar Tumballi from comment #4)
> Can it be because of performance degradation we saw with this build? need a
> round of testing after that is fixed?

Nope, it's not. With the current logic it is possible to miss updates. This is a bug which needs to be fixed.
Comment 7 Vijaykumar Koppad 2013-08-19 04:46:43 EDT
Verified on glusterfs-3.4.0.20rhs-2.el6rhs.x86_64.
Comment 8 Scott Haines 2013-09-23 18:38:48 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html