Previously, when setting the remote xtime failed, the resulting Python exception was not handled and produced a backtrace. This caused the Geo-replication worker process to restart with 'faulty' status.
With this fix, a failure to set the remote xtime no longer raises an unhandled Python exception, and the Geo-replication worker process continues to work as expected.
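To illustrate the idea behind the fix, here is a minimal Python sketch (Python being the language of the syncdaemon in the traceback below). It is not the actual patch. `set_slave_xtime_tolerant` is a hypothetical wrapper, and the assumption that only ENOENT ("No such file or directory", the error seen in this report) should be tolerated is ours. The remote setter is passed in as a plain callable so the sketch runs on its own.

```python
import errno
import logging

log = logging.getLogger("gsyncd-sketch")  # hypothetical logger name


def set_slave_xtime_tolerant(set_xtime_remote, path, uuid, mark):
    """Hypothetical wrapper around a remote xtime setter.

    Returns True if the xtime was set, or False if the path no longer exists
    on the slave. Any other OSError is re-raised unchanged.
    """
    try:
        set_xtime_remote(path, uuid, mark)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            # The entry vanished on the slave before its xtime could be
            # updated: log it and carry on instead of killing the worker.
            log.warning("set_xtime_remote(%s) failed: %s", path, e)
            return False
        raise  # unexpected errors still propagate


if __name__ == "__main__":
    def _vanished(path, uuid, mark):
        # Simulates the OSError seen in this bug report.
        raise OSError(errno.ENOENT, "No such file or directory")

    print(set_slave_xtime_tolerant(_vanished, "/some/path", "uuid", (0, 0)))  # prints False
```

Re-raising every errno other than ENOENT keeps genuine failures (for example permission errors) visible. Only the "file already gone" race is treated as non-fatal.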
Description of problem:
I installed release 51, started the geo-rep process, and am seeing the following:
[2013-12-20 15:22:52.211688] W [master(/data/master/dp-vol):253:regjob] _GMaster: Rsync: .gfid/90c5b4fc-b7a3-4d73-a2ac-1421878a8e3f [errcode: 23]
[2013-12-20 15:22:52.212217] W [master(/data/master/dp-vol):883:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1387578655
[2013-12-20 15:29:34.180546] I [master(/data/master/dp-vol):1138:crawl] _GMaster: processing xsync changelog /var/run/gluster/dp-vol/ssh%3A%2F%2Froot%4010.145.74.241%3Agluster%3A%2F%2F127.0.0.1%3Adp-vol1/c0e8e929978b0e1a0fa2511da0bdc805/xsync/XSYNC-CHANGELOG.1387578659
[2013-12-20 15:36:25.599109] E [repce(/data/master/dp-vol):188:__call__] RepceClient: call 8647:140093031302912:1387582585.55 (set_xtime_remote) failed on peer with OSError
[2013-12-20 15:36:25.599637] E [syncdutils(/data/master/dp-vol):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
main_i()
File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
local.service_loop(*[r for r in [remote] if r])
File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1156, in service_loop
g1.crawlwrap(oneshot=True)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 473, in crawlwrap
self.crawl()
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1142, in crawl
self.upd_stime(item[1][1], item[1][0])
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 890, in upd_stime
self.sendmark(path, stime)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 664, in sendmark
self.set_slave_xtime(path, mark)
File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 149, in set_slave_xtime
self.slave.server.set_xtime_remote(path, self.uuid, mark)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
return self.ins(self.meth, *a)
File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
raise res
OSError: [Errno 2] No such file or directory
[2013-12-20 15:36:25.630029] I [syncdutils(/data/master/dp-vol):159:finalize] <top>: exiting.
[2013-12-20 15:36:25.648134] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-12-20 15:36:35.659291] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-12-20 15:36:35.659557] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-12-20 15:36:35.835516] I [gsyncd(/data/master/dp-vol):530:main_i] <top>: syncing: gluster://localhost:dp-vol ->ssh://root.intuit.net:gluster://localhost:dp-vol1
[2013-12-20 15:36:39.495806] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up xsync change detection mode
[2013-12-20 15:36:39.496217] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.497653] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up changelog change detection mode
[2013-12-20 15:36:39.497888] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.499078] I [master(/data/master/dp-vol):1108:register] _GMaster: xsync temp directory: /var/run/gluster/dp-vol/ssh
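The traceback ends in repce.py with `raise res`. The OSError happened on the slave (peer) side, was returned to the master as the call's result, and was re-raised inside the worker. Nothing caught it there, so the worker exited and the monitor set the state to faulty and started a new worker about 10 seconds later. The following Python sketch models that chain in-process. `FakePeer`, `FakeRpcClient` and `monitor_once` are hypothetical stand-ins, not the syncdaemon's classes. The real RPC layer sends calls and results between processes, which this sketch omits.

```python
import errno


class FakePeer:
    """Hypothetical slave-side server: runs a method and returns either
    (True, result) or (False, exception) instead of raising."""

    def __init__(self, methods):
        self.methods = methods

    def handle(self, meth, *args):
        try:
            return True, self.methods[meth](*args)
        except Exception as e:
            return False, e  # the exception travels back as data


class FakeRpcClient:
    """Hypothetical master-side client mirroring the 'raise res' step."""

    def __init__(self, peer):
        self.peer = peer

    def call(self, meth, *args):
        ok, res = self.peer.handle(meth, *args)
        if not ok:
            raise res  # a remote failure becomes a local exception
        return res


def monitor_once(worker):
    """Hypothetical monitor step: any uncaught exception marks the worker faulty."""
    try:
        worker()
    except Exception:
        return "faulty"
    return "OK"


if __name__ == "__main__":
    def set_xtime_remote(path, uuid, mark):
        raise OSError(errno.ENOENT, "No such file or directory")

    client = FakeRpcClient(FakePeer({"set_xtime_remote": set_xtime_remote}))
    state = monitor_once(
        lambda: client.call("set_xtime_remote", "/some/path", "uuid", (0, 0)))
    print(state)  # prints faulty
```

In this model, an unhandled error from a single remote call is enough to take down the whole worker. That is why handling the error at the call site, rather than letting it escape, stops the restart loop.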
Comment 2 Nagaprasad Sathyanarayana
2013-12-26 10:28:30 UTC
MZ,
This was observed once in Neependra's setup and at Intuit; it was not reproducible after that. It occurs in the midst of a normal sync operation.
Comment 6 M S Vishwanath Bhat
2014-01-13 06:33:49 UTC
I was not able to reproduce the issue even after 2-3 tries. The behaviour was the same with or without the patch. Since I'm not hitting the issue, I will move this to VERIFIED. Please re-open if it is seen again.
Tested in Version: glusterfs-3.4.0.53rhs-1.el6rhs.x86_64.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-0208.html