Description of problem: Brand new 2 brick replica volume, geo-replication set up before any data went into the volume. rsync'd about 40GB of data from a range of small (1k) to large (800MB) files. I waited for geo-replication to finish and then ran an 'rsync --update' to catch some files that had changed on the source, geo-replication went into Faulty mode. Error log from gluster brick server: [2015-01-14 01:05:41.227730] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------ [2015-01-14 01:05:41.227941] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker [2015-01-14 01:05:41.329799] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining... [2015-01-14 01:05:41.330574] I [gsyncd(/export/sitedata):633:main_i] <top>: syncing: gluster://localhost:ds0 -> ssh://root@gaige:gluster://localhost:geo-repl [2015-01-14 01:05:45.691020] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up xsync change detection mode [2015-01-14 01:05:45.691421] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine [2015-01-14 01:05:45.692125] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up changelog change detection mode [2015-01-14 01:05:45.692292] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine [2015-01-14 01:05:45.692955] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up changeloghistory change detection mode [2015-01-14 01:05:45.693108] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine [2015-01-14 01:05:45.703636] I [master(/export/sitedata):1301:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/ds0/ssh%3A%2F%2Froot%4010.42.0.8%3Agluster%3A%2F%2F127.0.0.1%3Ageo-repl/669c18441eb6dec3d3bb2994485b6206/xsync [2015-01-14 01:05:45.703836] I [resource(/export/sitedata):1318:service_loop] GLUSTER: Register time: 1421157945 [2015-01-14 01:05:45.721772] I [master(/export/sitedata):480:crawlwrap] _GMaster: primary master with volume id 88bb6b5d-cc89-47cf-a74f-1a9c17f194d7 ... [2015-01-14 01:05:45.731245] I [master(/export/sitedata):491:crawlwrap] _GMaster: crawl interval: 1 seconds [2015-01-14 01:05:45.747462] I [master(/export/sitedata):1216:crawl] _GMaster: starting history crawl... turns: 1, stime: (1421156284, 0) [2015-01-14 01:05:45.749725] I [master(/export/sitedata):1245:crawl] _GMaster: slave's time: (1421156284, 0) [2015-01-14 01:05:48.813282] E [repce(/export/sitedata):207:__call__] RepceClient: call 20573:139902371477312:1421157947.51 (entry_ops) failed on peer with OSError [2015-01-14 01:05:48.813823] E [syncdutils(/export/sitedata):270:log_raise_exception] <top>: FAIL: Traceback (most recent call last): File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main main_i() File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i local.service_loop(*[r for r in [remote] if r]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1325, in service_loop g3.crawlwrap(oneshot=True) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 529, in crawlwrap self.crawl(no_stime_update=no_stime_update) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1256, in crawl self.process(changes) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 946, in process self.process_change(change, done, retry) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 910, in process_change self.slave.server.entry_ops(entries) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 226, in __call__ return self.ins(self.meth, *a) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 208, in __call__ raise res OSError: [Errno 16] Device or resource busy [2015-01-14 01:05:48.815470] I [syncdutils(/export/sitedata):214:finalize] <top>: exiting. [2015-01-14 01:05:48.818737] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF. [2015-01-14 01:05:48.819194] I [syncdutils(agent):214:finalize] <top>: exiting. [2015-01-14 01:05:49.695061] I [monitor(monitor):222:monitor] Monitor: worker(/export/sitedata) died in startup phase Error log from geo-replication slave: [2015-01-14 01:07:07.787431] I [gsyncd(slave):633:main_i] <top>: syncing: gluster://localhost:geo-repl [2015-01-14 01:07:08.919622] I [resource(slave):761:service_loop] GLUSTER: slave listening [2015-01-14 01:07:11.168715] E [repce(slave):117:worker] <top>: call failed: Traceback (most recent call last): File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 113, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 658, in entry_ops errno_wrap(os.rename, [entry, en], [ENOENT, EEXIST]) File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap return call(*arg) OSError: [Errno 16] Device or resource busy [2015-01-14 01:07:11.176639] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF. [2015-01-14 01:07:11.177026] I [syncdutils(slave):214:finalize] <top>: exiting. Version-Release number of selected component (if applicable): glusterfs-server 3.6.1-ubuntu1~trusty1 from the Ubuntu launchpad PPA How reproducible: Rsync some data, change some files from the source and rsync --update Steps to Reproduce: 1. rsync files from an external location into gluster 2. change some files in the external location, rsync --update 3. geo-replication failed Actual results: geo-replication fails Expected results: geo-replication works Additional info: This was a fresh gluster 3.6.1 installation, no upgrade was performed and the volume was empty when geo-replication was setup.
This error comes if Rebalance is happening in Slave nodes. DHT layer holds lock during Rebalance, hence rename fails with this error. Since Geo-rep monitor restarts the worker dies, this error logged again. Once the Rebalance complete, Geo-rep become normal and continue syncing data from Master to Slave. We will enhance Geo-rep to handle this situation smoothly.
GlusterFS-3.6 is nearing its End-Of-Life, only important security bugs still make a chance on getting fixed. mainline bug is available BZ 1258379. If this needs to get fixed in 3.7 or 3.8 this bug should get cloned.