Bug 1127102

Summary: [Dist-geo-rep] changelog disable during hardlink creation resulted in missing file on slave.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replicationAssignee: Aravinda VK <avishwan>
Status: CLOSED DUPLICATE QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0CC: asriram, avishwan, chrisw, csaba, david.macdonald, khiremat, nlevinki, rallan, sankarshan, smohan
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-15 11:40:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1129222    
Bug Blocks:    
Attachments:
Description Flags
geo-rep automation log none

Description Vijaykumar Koppad 2014-08-06 07:29:10 UTC
Description of problem: changelog disable during hardlink creation results in failure to sync few hardlinks to slave.
the files which were failed to sync didn't have entry in XSYNC-CHANGELOG.


Version-Release number of selected component (if applicable): glusterfs-3.6.0.27-1.el6rhs


How reproducible: didn't try to reproduce 


Steps to Reproduce:
1. create and start a geo-rep relationship between master(6x2) and slave(6x2) 
2. create data on master using the command "crefi -T 5 -n 5 --multi -b 10 -d 10 --random --max=10K --min=1K /mnt/master"
3. let it sync to slave.
4. start creating hardlink to the created data using the command "crefi -T 5 -n 5 --multi -b 10 -d 10 --random --max=10K --min=1K --fop=hardlink /mnt/master"
5. parallelly disable changelog.
4. check the logs and also files on master and slave after the syncing has completed


Actual results:  Few hardlink fail to sync to slave and those files don't have entry in the xsync changelog


Expected results: none of the files should fail to sync to slave.


Additional info:
log snippet from geo-rep log when changelog was disabled 
===============================================================================
[2014-08-05 16:11:09.57487] I [syncdutils(/bricks/brick1/b1):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.58455] I [syncdutils(/bricks/brick3/b9):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.58387] E [syncdutils(/bricks/brick2/b5):246:log_raise_exception] <top>: connection to peer is broken
[2014-08-05 16:11:09.64537] I [syncdutils(/bricks/brick2/b5):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.66345] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-05 16:11:09.66819] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.66884] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-05 16:11:09.67362] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.81029] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-05 16:11:09.81542] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-05 16:11:09.97275] I [monitor(monitor):109:set_state] Monitor: new state: faulty
[2014-08-05 16:11:16.479928] I [monitor(monitor):296:distribute] <top>: slave bricks: [{'host': '10.70.43.137', 'dir': '/bricks/brick1/b1'}, {'host': '10.70.43.167', 'dir': '/bricks/brick1/b2'}, {'host': '10.70.42.169', 'dir': '/bricks/br
ick1/b3'}, {'host': '10.70.42.250', 'dir': '/bricks/brick1/b4'}, {'host': '10.70.43.137', 'dir': '/bricks/brick2/b5'}, {'host': '10.70.43.167', 'dir': '/bricks/brick2/b6'}, {'host': '10.70.42.169', 'dir': '/bricks/brick2/b7'}, {'host': '1
0.70.42.250', 'dir': '/bricks/brick2/b8'}, {'host': '10.70.43.137', 'dir': '/bricks/brick3/b9'}, {'host': '10.70.43.167', 'dir': '/bricks/brick3/b10'}, {'host': '10.70.42.169', 'dir': '/bricks/brick3/b11'}, {'host': '10.70.42.250', 'dir':
 '/bricks/brick3/b12'}]
[2014-08-05 16:11:16.483553] I [monitor(monitor):316:distribute] <top>: worker specs: [('/bricks/brick1/b1', 'ssh://root.42.169:gluster://localhost:slave'), ('/bricks/brick2/b5', 'ssh://root.42.169:gluster://localhost:slave'),
 ('/bricks/brick3/b9', 'ssh://root.42.169:gluster://localhost:slave')]
[2014-08-05 16:11:16.486087] I [monitor(monitor):109:set_state] Monitor: new state: Initializing...
[2014-08-05 16:11:16.488512] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-05 16:11:16.489475] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-05 16:11:16.496471] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-05 16:11:16.497401] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-05 16:11:16.506869] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-05 16:11:16.508217] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-05 16:11:17.143851] I [gsyncd(/bricks/brick3/b9):633:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.42.169:gluster://localhost:slave
[2014-08-05 16:11:17.150867] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2014-08-05 16:11:17.151503] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2014-08-05 16:11:17.156764] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2014-08-05 16:11:17.157335] I [gsyncd(/bricks/brick1/b1):633:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.42.169:gluster://localhost:slave
[2014-08-05 16:11:17.162847] I [gsyncd(/bricks/brick2/b5):633:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.42.169:gluster://localhost:slave
[2014-08-05 16:11:20.84049] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up xsync change detection mode
[2014-08-05 16:11:20.84460] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.85959] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up changelog change detection mode
[2014-08-05 16:11:20.86169] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.87346] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2014-08-05 16:11:20.87590] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.122971] I [master(/bricks/brick1/b1):1270:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/
xsync
[2014-08-05 16:11:20.123234] I [resource(/bricks/brick1/b1):1300:service_loop] GLUSTER: Register time: 1407235280
[2014-08-05 16:11:20.210582] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.220850] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 1 seconds
[2014-08-05 16:11:20.261904] I [master(/bricks/brick1/b1):1179:crawl] _GMaster: starting history crawl... turns: 1
[2014-08-05 16:11:20.266037] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 51, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 94, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2014-08-05 16:11:20.267716] E [repce(/bricks/brick1/b1):207:__call__] RepceClient: call 15712:139954255877888:1407235280.26 (history) failed on peer with ChangelogException
[2014-08-05 16:11:20.268917] I [resource(/bricks/brick1/b1):1314:service_loop] GLUSTER: Changelog history crawl failed, fallback to xsync: 2 - No such file or directory
[2014-08-05 16:11:20.281660] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.326155] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:20.356704] I [master(/bricks/brick1/b1):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:20.363696] I [master(/bricks/brick1/b1):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/xsync/XSYNC-CHANGELOG.1407235280
[2014-08-05 16:11:20.372348] I [master(/bricks/brick1/b1):1310:crawl] _GMaster: finished hybrid crawl syncing, endtime: 1407235280
[2014-08-05 16:11:20.375163] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up xsync change detection mode
[2014-08-05 16:11:20.375906] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.383761] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up changelog change detection mode
[2014-08-05 16:11:20.384201] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.389640] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2014-08-05 16:11:20.390093] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.394239] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.409082] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-08-05 16:11:20.478893] I [master(/bricks/brick2/b5):1270:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/60e13929786dd36f9bf7670d2ec67396/xsync
[2014-08-05 16:11:20.479387] I [resource(/bricks/brick2/b5):1300:service_loop] GLUSTER: Register time: 1407235280
[2014-08-05 16:11:20.593775] I [master(/bricks/brick3/b9):75:gmaster_builder] <top>: setting up xsync change detection mode
[2014-08-05 16:11:20.594602] I [master(/bricks/brick3/b9):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.599872] I [master(/bricks/brick3/b9):75:gmaster_builder] <top>: setting up changelog change detection mode
[2014-08-05 16:11:20.600290] I [master(/bricks/brick3/b9):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.602926] I [master(/bricks/brick3/b9):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2014-08-05 16:11:20.603346] I [master(/bricks/brick3/b9):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:20.688090] I [master(/bricks/brick2/b5):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.707953] I [master(/bricks/brick3/b9):1270:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/1f8a8e6b046b00c682675ebf692f5968/xsync
[2014-08-05 16:11:20.708560] I [resource(/bricks/brick3/b9):1300:service_loop] GLUSTER: Register time: 1407235280
[2014-08-05 16:11:20.740453] I [master(/bricks/brick2/b5):475:crawlwrap] _GMaster: crawl interval: 1 seconds
[2014-08-05 16:11:20.793405] I [master(/bricks/brick2/b5):1179:crawl] _GMaster: starting history crawl... turns: 1
[2014-08-05 16:11:20.797069] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 51, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 94, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2014-08-05 16:11:20.800934] E [repce(/bricks/brick2/b5):207:__call__] RepceClient: call 15708:140029043300096:1407235280.79 (history) failed on peer with ChangelogException
[2014-08-05 16:11:20.801637] I [resource(/bricks/brick2/b5):1314:service_loop] GLUSTER: Changelog history crawl failed, fallback to xsync: 2 - No such file or directory
[2014-08-05 16:11:20.827630] I [master(/bricks/brick2/b5):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.855560] I [master(/bricks/brick2/b5):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:20.870126] I [master(/bricks/brick2/b5):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:20.874620] I [master(/bricks/brick2/b5):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/60e13929786dd36f9bf7670d2ec67396/xsync/XSYNC-CHANGELOG.1407235280
[2014-08-05 16:11:20.902592] I [master(/bricks/brick2/b5):1310:crawl] _GMaster: finished hybrid crawl syncing, endtime: 1407235280
[2014-08-05 16:11:20.922286] I [master(/bricks/brick2/b5):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.921961] I [master(/bricks/brick3/b9):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:20.935843] I [master(/bricks/brick2/b5):475:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-08-05 16:11:20.967562] I [master(/bricks/brick3/b9):475:crawlwrap] _GMaster: crawl interval: 1 seconds
[2014-08-05 16:11:20.992078] I [master(/bricks/brick3/b9):1179:crawl] _GMaster: starting history crawl... turns: 1
[2014-08-05 16:11:20.996417] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 51, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 94, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2014-08-05 16:11:21.6376] E [repce(/bricks/brick3/b9):207:__call__] RepceClient: call 15710:140109737068288:1407235280.99 (history) failed on peer with ChangelogException
[2014-08-05 16:11:21.6899] I [resource(/bricks/brick3/b9):1314:service_loop] GLUSTER: Changelog history crawl failed, fallback to xsync: 2 - No such file or directory
[2014-08-05 16:11:21.27549] I [master(/bricks/brick3/b9):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:21.104799] I [master(/bricks/brick3/b9):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:21.273367] I [master(/bricks/brick3/b9):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:21.364648] I [master(/bricks/brick3/b9):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/1f8a8e6b046b00c682675ebf692f5968/xsync/XSYNC-CHANGELOG.1407235281
[2014-08-05 16:11:21.576427] I [master(/bricks/brick3/b9):1310:crawl] _GMaster: finished hybrid crawl syncing, endtime: 1407235280
[2014-08-05 16:11:21.594685] I [master(/bricks/brick3/b9):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:21.671987] I [master(/bricks/brick3/b9):475:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-08-05 16:11:39.111709] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2014-08-05 16:11:39.113528] E [repce(/bricks/brick1/b1):207:__call__] RepceClient: call 15712:139954255877888:1407235299.11 (scan) failed on peer with ChangelogException
[2014-08-05 16:11:39.114054] I [resource(/bricks/brick1/b1):1339:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync
[2014-08-05 16:11:39.124456] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:39.210314] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2014-08-05 16:11:39.211803] E [repce(/bricks/brick3/b9):207:__call__] RepceClient: call 15710:140109737068288:1407235299.21 (scan) failed on peer with ChangelogException
[2014-08-05 16:11:39.212310] I [resource(/bricks/brick3/b9):1339:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync
[2014-08-05 16:11:39.222659] I [master(/bricks/brick3/b9):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:39.270149] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:39.322894] I [master(/bricks/brick3/b9):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:39.454282] I [master(/bricks/brick1/b1):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:39.525655] I [master(/bricks/brick3/b9):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:40.621776] I [master(/bricks/brick1/b1):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/xsync/XSYNC-CHANGELOG.1407235299
[2014-08-05 16:11:40.668409] I [master(/bricks/brick3/b9):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/1f8a8e6b046b00c682675ebf692f5968/xsync/XSYNC-CHANGELOG.1407235299
[2014-08-05 16:11:41.282250] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2014-08-05 16:11:41.286304] E [repce(/bricks/brick2/b5):207:__call__] RepceClient: call 15708:140029043300096:1407235301.28 (scan) failed on peer with ChangelogException
[2014-08-05 16:11:41.286777] I [resource(/bricks/brick2/b5):1339:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync
[2014-08-05 16:11:41.302299] I [master(/bricks/brick2/b5):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:11:41.324421] I [master(/bricks/brick2/b5):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:11:41.370280] I [master(/bricks/brick2/b5):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:11:42.415246] I [master(/bricks/brick2/b5):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/60e13929786dd36f9bf7670d2ec67396/xsync/XSYNC-CHANGELOG.1407235301
[2014-08-05 16:11:43.64141] E [repce(/bricks/brick1/b1):207:__call__] RepceClient: call 15712:139954255877888:1407235302.2 (entry_ops) failed on peer with OSError
[2014-08-05 16:11:43.64735] E [syncdutils(/bricks/brick1/b1):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1340, in service_loop
    g1.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1317, in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 22] Invalid argument
[2014-08-05 16:11:43.69366] I [syncdutils(/bricks/brick1/b1):214:finalize] <top>: exiting.
[2014-08-05 16:11:43.76734] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-05 16:11:43.77171] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-05 16:11:43.112714] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick1/b1) died in startup phase
[2014-08-05 16:11:43.114411] I [monitor(monitor):109:set_state] Monitor: new state: faulty
[2014-08-05 16:11:48.110544] E [repce(/bricks/brick2/b5):207:__call__] RepceClient: call 15708:140029043300096:1407235303.35 (entry_ops) failed on peer with OSError
[2014-08-05 16:11:48.111195] E [syncdutils(/bricks/brick2/b5):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1340, in service_loop
    g1.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1317, in crawl
    self.process([item[1]], 0)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 22] Invalid argument
[2014-08-05 16:11:48.114728] I [syncdutils(/bricks/brick2/b5):214:finalize] <top>: exiting.
[2014-08-05 16:11:48.119008] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-05 16:11:48.119585] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-05 16:11:48.407221] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick2/b5) died in startup phase
[2014-08-05 16:11:53.127758] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-05 16:11:53.128494] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-05 16:11:53.464647] I [gsyncd(/bricks/brick1/b1):633:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.42.169:gluster://localhost:slave
[2014-08-05 16:11:53.694895] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2014-08-05 16:11:56.779788] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up xsync change detection mode
[2014-08-05 16:11:56.780538] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:56.783061] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up changelog change detection mode
[2014-08-05 16:11:56.783626] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:56.789870] I [master(/bricks/brick1/b1):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2014-08-05 16:11:56.790400] I [master(/bricks/brick1/b1):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:11:58.418648] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-05 16:11:58.419657] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-05 16:11:58.894506] I [gsyncd(/bricks/brick2/b5):633:main_i] <top>: syncing: gluster://localhost:master -> ssh://root.42.169:gluster://localhost:slave
[2014-08-05 16:11:59.143348] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2014-08-05 16:12:02.421999] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up xsync change detection mode
[2014-08-05 16:12:02.422900] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:12:02.429439] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up changelog change detection mode
[2014-08-05 16:12:02.430078] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:12:02.433237] I [master(/bricks/brick2/b5):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2014-08-05 16:12:02.433785] I [master(/bricks/brick2/b5):399:__init__] _GMaster: using 'rsync' as the sync engine
[2014-08-05 16:12:06.932548] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 38, in register
    return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 39, in cl_register
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2014-08-05 16:12:06.935751] E [repce(/bricks/brick1/b1):207:__call__] RepceClient: call 15944:139721398052608:1407235316.8 (register) failed on peer with ChangelogException
[2014-08-05 16:12:06.936290] I [resource(/bricks/brick1/b1):1297:service_loop] GLUSTER: Changelog register failed, fallback to xsync
[2014-08-05 16:12:06.937048] I [master(/bricks/brick1/b1):1270:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/xsync
[2014-08-05 16:12:06.938503] I [resource(/bricks/brick1/b1):1300:service_loop] GLUSTER: Register time: 1407235326
[2014-08-05 16:12:07.80060] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:12:07.291779] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:12:07.604331] I [master(/bricks/brick1/b1):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:12:07.651225] I [master(/bricks/brick1/b1):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/xsync/XSYNC-CHANGELOG.1407235327
[2014-08-05 16:12:07.780066] I [master(/bricks/brick1/b1):1310:crawl] _GMaster: finished hybrid crawl syncing, endtime: 1407235326
[2014-08-05 16:12:07.793001] I [master(/bricks/brick1/b1):464:crawlwrap] _GMaster: primary master with volume id fef0b49e-4b82-46be-8744-cc9e576a99d0 ...
[2014-08-05 16:12:07.895845] I [master(/bricks/brick1/b1):475:crawlwrap] _GMaster: crawl interval: 60 seconds
[2014-08-05 16:12:08.173492] I [master(/bricks/brick1/b1):1302:crawl] _GMaster: starting hybrid crawl...
[2014-08-05 16:12:08.237772] I [master(/bricks/brick1/b1):1316:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/master/ssh%3A%2F%2Froot%4010.70.43.137%3Agluster%3A%2F%2F127.0.0.1%3Aslave/2b534a876cb5decdb9fec2c9c5b4b998/xsync/XSYNC-CHANGELOG.1407235328
[2014-08-05 16:12:10.807106] I [master(/bricks/brick1/b1):1312:crawl] _GMaster: finished hybrid crawl syncing
[2014-08-05 16:12:11.689755] I [master(/bricks/brick3/b9):1312:crawl] _GMaster: finished hybrid crawl syncing
[2014-08-05 16:12:12.498573] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 38, in register
    return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 39, in cl_register
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2014-08-05 16:12:12.501410] E [repce(/bricks/brick2/b5):207:__call__] RepceClient: call 15987:140008777406208:1407235322.44 (register) failed on peer with ChangelogException

===============================================================================

Comment 2 Vijaykumar Koppad 2014-08-06 07:44:58 UTC
sosreport of master and slave nodes @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1127102/

Comment 3 Vijaykumar Koppad 2014-08-06 07:47:20 UTC
Created attachment 924379 [details]
geo-rep automation log

Comment 5 Shalaka 2014-08-19 06:41:38 UTC
Please add doc text for this known issue.

Comment 6 Aravinda VK 2014-08-20 07:42:37 UTC
Upstream patch: http://review.gluster.org/#/c/8439/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/31104/

Comment 8 Aravinda VK 2014-08-22 08:45:12 UTC
Dependent bug(1129222) is MODIFIED, and no other changes required to fix this bug.

Comment 13 Mike McCune 2016-03-28 22:17:27 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions