Bug 1046604

Summary: geo-replication fails with OSError when setting remote xtime
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Aravinda VK <avishwan>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: medium
Docs Contact:
Priority: high
Version: unspecified
CC: aavati, avishwan, csaba, dblack, grajaiya, mzywusko, nsathyan, psriniva, vagarwal, vshankar
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.4.0.53rhs
Doc Type: Bug Fix
Doc Text:
Previously, setting the remote xtime could fail with an OSError, producing a Python backtrace. This caused the Geo-replication worker process to restart with 'faulty' status. With this fix, Python exceptions are not raised when setting the remote xtime fails, and the Geo-replication worker process works as expected.
Story Points: ---
Clone Of:
: 1073844
Environment:
Last Closed: 2014-02-25 08:13:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1073844

Description Aravinda VK 2013-12-26 09:52:27 UTC
Description of problem:
I installed release 51 and started the geo-rep process and I am seeing:

[2013-12-20 15:22:52.211688] W [master(/data/master/dp-vol):253:regjob] _GMaster: Rsync: .gfid/90c5b4fc-b7a3-4d73-a2ac-1421878a8e3f [errcode: 23]
[2013-12-20 15:22:52.212217] W [master(/data/master/dp-vol):883:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1387578655
[2013-12-20 15:29:34.180546] I [master(/data/master/dp-vol):1138:crawl] _GMaster: processing xsync changelog /var/run/gluster/dp-vol/ssh%3A%2F%2Froot%4010.145.74.241%3Agluster%3A%2F%2F127.0.0.1%3Adp-vol1/c0e8e929978b0e1a0fa2511da0bdc805/xsync/XSYNC-CHANGELOG.1387578659
[2013-12-20 15:36:25.599109] E [repce(/data/master/dp-vol):188:__call__] RepceClient: call 8647:140093031302912:1387582585.55 (set_xtime_remote) failed on peer with OSError
[2013-12-20 15:36:25.599637] E [syncdutils(/data/master/dp-vol):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1156, in service_loop
    g1.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 473, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1142, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 890, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 664, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 149, in set_slave_xtime
    self.slave.server.set_xtime_remote(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 2] No such file or directory
[2013-12-20 15:36:25.630029] I [syncdutils(/data/master/dp-vol):159:finalize] <top>: exiting.
[2013-12-20 15:36:25.648134] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-12-20 15:36:35.659291] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-12-20 15:36:35.659557] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-12-20 15:36:35.835516] I [gsyncd(/data/master/dp-vol):530:main_i] <top>: syncing: gluster://localhost:dp-vol ->ssh://root.intuit.net:gluster://localhost:dp-vol1
[2013-12-20 15:36:39.495806] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up xsync change detection mode
[2013-12-20 15:36:39.496217] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.497653] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up changelog change detection mode
[2013-12-20 15:36:39.497888] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.499078] I [master(/data/master/dp-vol):1108:register] _GMaster: xsync temp directory: /var/run/gluster/dp-vol/ssh
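
The traceback above shows the OSError from the peer's set_xtime_remote call propagating up through sendmark and crawl until the worker exits and the monitor marks it faulty. A minimal sketch of the kind of handling the fix describes (not the actual patch) is shown below. The helper name set_xtime_tolerant and the choice to swallow only ENOENT are assumptions made for illustration; set_xtime_remote is passed in as a plain callable so the sketch does not depend on gsyncd internals.

```python
import errno
import logging


def set_xtime_tolerant(set_xtime_remote, path, uuid, mark):
    """Hypothetical wrapper: call set_xtime_remote(path, uuid, mark).

    Returns True on success and False when the peer reports ENOENT
    ("No such file or directory"), instead of letting the OSError
    terminate the worker. Any other OSError is re-raised.
    """
    try:
        set_xtime_remote(path, uuid, mark)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Assumption: a missing path on the slave is safe to skip,
            # because a later crawl can set the xtime again.
            logging.warning("set_xtime_remote skipped for %s: %s", path, e)
            return False
        raise
```

With a wrapper like this, a caller such as set_slave_xtime would log the failure and continue the crawl rather than exit.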

Comment 4 M S Vishwanath Bhat 2014-01-02 15:02:23 UTC
Aravinda: Do you have the steps to reproduce this? I tried with 51geo version and I couldn't reproduce it.

Comment 5 Venky Shankar 2014-01-07 12:29:42 UTC
MZ,

This was observed once in Neependra's setup and once at Intuit. It was not reproducible after that. It was observed in the midst of a normal sync operation.

Comment 6 M S Vishwanath Bhat 2014-01-13 06:33:49 UTC
I was not able to reproduce this even after 2-3 tries. The behaviour was the same with or without the patch. Since I'm not hitting the issue, I will move this to verified. Please re-open if seen again.

Tested in Version: glusterfs-3.4.0.53rhs-1.el6rhs.x86_64.rpm
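
Because the failure is intermittent, one way to check a test run for it is to scan the gsyncd log for the two signatures seen in the description: the failed set_xtime_remote call and the monitor's transition to faulty. The following is a rough sketch that assumes the log line layout shown in this report; the function name scan_gsyncd_log is hypothetical and not part of glusterfs.

```python
import re

# Layout assumed from the log excerpt in this report:
# [timestamp] LEVEL [source:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>.*?)\] (?P<msg>.*)$"
)


def scan_gsyncd_log(lines):
    """Return timestamps of set_xtime_remote failures and faulty transitions."""
    hits = {"rpc_failures": [], "faulty": []}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # traceback lines and blank lines carry no log prefix
        msg = m.group("msg")
        if m.group("level") == "E" and "(set_xtime_remote) failed on peer" in msg:
            hits["rpc_failures"].append(m.group("ts"))
        elif "new state: faulty" in msg:
            hits["faulty"].append(m.group("ts"))
    return hits
```

Passing an open log file to scan_gsyncd_log and checking that both lists are empty gives a quick indication that the run did not hit this failure.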

Comment 7 Pavithra 2014-01-15 05:31:55 UTC
Aravinda,

Can you please verify if the edited doc text is technically correct?

Comment 8 Aravinda VK 2014-01-15 05:52:02 UTC
doc text looks good to me.

Comment 10 errata-xmlrpc 2014-02-25 08:13:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html