Bug 1073844 - geo-replication fails with OSError when setting remote xtime
Summary: geo-replication fails with OSError when setting remote xtime
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On: 1046604
Blocks:
Reported: 2014-03-07 10:34 UTC by Kotresh HR
Modified: 2014-11-07 10:47 UTC

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1046604
Environment:
Last Closed: 2014-04-17 12:29:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kotresh HR 2014-03-07 10:34:15 UTC
+++ This bug was initially created as a clone of Bug #1046604 +++

Description of problem:
I installed release 51 and started the geo-rep process and I am seeing:

[2013-12-20 15:22:52.211688] W [master(/data/master/dp-vol):253:regjob] _GMaster: Rsync: .gfid/90c5b4fc-b7a3-4d73-a2ac-1421878a8e3f [errcode: 23]
[2013-12-20 15:22:52.212217] W [master(/data/master/dp-vol):883:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1387578655
[2013-12-20 15:29:34.180546] I [master(/data/master/dp-vol):1138:crawl] _GMaster: processing xsync changelog /var/run/gluster/dp-vol/ssh%3A%2F%2Froot%4010.145.74.241%3Agluster%3A%2F%2F127.0.0.1%3Adp-vol1/c0e8e929978b0e1a0fa2511da0bdc805/xsync/XSYNC-CHANGELOG.1387578659
[2013-12-20 15:36:25.599109] E [repce(/data/master/dp-vol):188:__call__] RepceClient: call 8647:140093031302912:1387582585.55 (set_xtime_remote) failed on peer with OSError
[2013-12-20 15:36:25.599637] E [syncdutils(/data/master/dp-vol):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 540, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1156, in service_loop
    g1.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 473, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1142, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 890, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 664, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 149, in set_slave_xtime
    self.slave.server.set_xtime_remote(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 2] No such file or directory
[2013-12-20 15:36:25.630029] I [syncdutils(/data/master/dp-vol):159:finalize] <top>: exiting.
[2013-12-20 15:36:25.648134] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-12-20 15:36:35.659291] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-12-20 15:36:35.659557] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-12-20 15:36:35.835516] I [gsyncd(/data/master/dp-vol):530:main_i] <top>: syncing: gluster://localhost:dp-vol ->ssh://root.intuit.net:gluster://localhost:dp-vol1
[2013-12-20 15:36:39.495806] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up xsync change detection mode
[2013-12-20 15:36:39.496217] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.497653] I [master(/data/master/dp-vol):58:gmaster_builder] <top>: setting up changelog change detection mode
[2013-12-20 15:36:39.497888] I [master(/data/master/dp-vol):363:__init__] _GMaster: using 'rsync' as the sync engine
[2013-12-20 15:36:39.499078] I [master(/data/master/dp-vol):1108:register] _GMaster: xsync temp directory: /var/run/gluster/dp-vol/ssh
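
When triaging a log like the one above, it can help to pull out only the error-level lines and see which function reported them. The following is a minimal sketch, assuming the line layout visible in this excerpt (`[timestamp] LEVEL [module(brick):line:function] Logger: message`); the regular expression and the helper name `parse_errors` are illustrative and not part of gsyncd.

```python
import re

# Layout inferred from the excerpt in this report (an assumption, not a spec):
# [timestamp] LEVEL [module(brick):lineno:function] Logger: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+)\((?P<brick>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\]\s+"
    r"(?P<logger>[^:]+):\s?(?P<msg>.*)$"
)


def parse_errors(lines):
    """Return one dict per error-level ('E') log line; skip everything else.

    Traceback lines and other text that does not match the layout are ignored.
    """
    records = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "E":
            records.append(m.groupdict())
    return records


if __name__ == "__main__":
    sample = [
        "[2013-12-20 15:22:52.212217] W [master(/data/master/dp-vol):883:process] "
        "_GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1387578655",
        "[2013-12-20 15:36:25.599109] E [repce(/data/master/dp-vol):188:__call__] "
        "RepceClient: call 8647:140093031302912:1387582585.55 (set_xtime_remote) "
        "failed on peer with OSError",
        "OSError: [Errno 2] No such file or directory",
    ]
    for rec in parse_errors(sample):
        print(rec["ts"], rec["module"], rec["func"], rec["msg"])
```

Run against the excerpt, this reports the `repce` `__call__` line for `set_xtime_remote` and the `syncdutils` `FAIL:` line, which are the two entries that precede the worker going faulty.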

--- Additional comment from RHEL Product and Program Management on 2013-12-26 04:54:37 EST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Nagaprasad Sathyanarayana on 2013-12-26 05:28:30 EST ---

https://code.engineering.redhat.com/gerrit/#/c/17885/ -> U1
https://code.engineering.redhat.com/gerrit/#/c/17887/ -> U2

--- Additional comment from Venky Shankar on 2013-12-26 05:51:28 EST ---

This is a quick fix for the customer for now. With the introduction of per-directory sync times, the failover/failback mechanism will need to change, so this patch simply comments out the remote xtime set to avoid the error.

Remote xtime (on root) would anyway be of no use with per-directory sync times.
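
For comparison with the quick fix described above, which comments out the call entirely, here is a minimal sketch of a narrower alternative: ignore only ENOENT from the remote xtime set and let every other error propagate. This is not the committed patch. `set_slave_xtime_tolerant` and `FakeSlave` are hypothetical names used only for illustration; the only detail taken from this report is the call shape `set_xtime_remote(path, uuid, mark)` seen in the traceback.

```python
import errno


def set_slave_xtime_tolerant(server, path, uuid, mark):
    """Set the remote xtime, treating a missing path (ENOENT) as non-fatal.

    Returns True if the xtime was set and False if the path did not exist
    on the peer. Any other OSError is re-raised unchanged.
    """
    try:
        server.set_xtime_remote(path, uuid, mark)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False
        raise


class FakeSlave(object):
    """Hypothetical stand-in for the remote peer, for demonstration only."""

    def __init__(self, known_paths):
        self.known_paths = set(known_paths)
        self.xtimes = {}

    def set_xtime_remote(self, path, uuid, mark):
        # Mimic the failure in the traceback: unknown path -> Errno 2.
        if path not in self.known_paths:
            raise OSError(errno.ENOENT, "No such file or directory")
        self.xtimes[(path, uuid)] = mark


if __name__ == "__main__":
    slave = FakeSlave(["."])
    print(set_slave_xtime_tolerant(slave, ".", "uuid-1", (1387582585, 0)))
    print(set_slave_xtime_tolerant(slave, "./gone", "uuid-1", (1387582585, 0)))
```

The trade-off is that a tolerant wrapper keeps the remote xtime for paths that do exist, whereas the quick fix drops it everywhere on the grounds that it is of no use with per-directory sync times.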

--- Additional comment from M S Vishwanath Bhat on 2014-01-02 10:02:23 EST ---

Aravinda: Do you have the steps to reproduce this? I tried with the 51geo version and couldn't reproduce it.

--- Additional comment from Venky Shankar on 2014-01-07 07:29:42 EST ---

MZ,

This was observed once in Neependra's setup and at Intuit. It was not reproducible after that. It occurred in the midst of a normal sync operation.

--- Additional comment from M S Vishwanath Bhat on 2014-01-13 01:33:49 EST ---

I was not able to reproduce this even after 2-3 tries. The behaviour was the same with or without the patch. Since I'm not hitting the issue, I will move this to verified. Please re-open if it is seen again.

Tested in Version: glusterfs-3.4.0.53rhs-1.el6rhs.x86_64.rpm

--- Additional comment from Pavithra on 2014-01-15 00:31:55 EST ---

Aravinda,

Can you please verify if the edited doc text is technically correct?

--- Additional comment from Aravinda VK on 2014-01-15 00:52:02 EST ---

doc text looks good to me.

--- Additional comment from errata-xmlrpc on 2014-02-25 00:43:27 EST ---

Bug report changed to RELEASE_PENDING status by Errata System.
Advisory RHEA-2013:15734-03 has been changed to PUSH_READY status.
https://errata.devel.redhat.com/advisory/15734

--- Additional comment from errata-xmlrpc on 2014-02-25 03:13:01 EST ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

Comment 2 Anand Avati 2014-03-07 11:07:18 UTC
REVIEW: http://review.gluster.org/7206 (geo-rep: quick-fix for remote xtime set failed) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 3 Anand Avati 2014-03-07 11:11:43 UTC
REVIEW: http://review.gluster.org/7207 (geo-rep: quick-fix for remote xtime set failed) posted (#1) for review on release-3.5 by Kotresh HR (khiremat)

Comment 4 Anand Avati 2014-03-08 04:14:15 UTC
COMMIT: http://review.gluster.org/7206 committed in master by Vijay Bellur (vbellur) 
------
commit 82f20483f753f2da6c1449d739fafa506a424eda
Author: Kotresh H R <khiremat>
Date:   Fri Mar 7 16:35:01 2014 +0530

    geo-rep: quick-fix for remote xtime set failed
    
    Remote xtime is required for failover/failback,
    this patch is quick fix to avoid the OSError.
    
    Code is masked out, this need to be resolved when
    failover/failback is worked on.
    
    Change-Id: If339d88a2ccd8ef18a3b3c015df765c93dcb020c
    BUG: 1073844
    Signed-off-by: Kotresh H R <khiremat>
    Reviewed-on: http://review.gluster.org/7206
    Reviewed-by: Aravinda VK <avishwan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Anand Avati 2014-03-08 04:14:47 UTC
COMMIT: http://review.gluster.org/7207 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit e779cc8c32692112d751571e65a1fb898f500d5b
Author: Kotresh H R <khiremat>
Date:   Fri Mar 7 16:35:01 2014 +0530

    geo-rep: quick-fix for remote xtime set failed
    
    Remote xtime is required for failover/failback,
    this patch is quick fix to avoid the OSError.
    
    Code is masked out, this need to be resolved when
    failover/failback is worked on.
    
    Change-Id: If339d88a2ccd8ef18a3b3c015df765c93dcb020c
    BUG: 1073844
    Signed-off-by: Kotresh H R <khiremat>
    Reviewed-on: http://review.gluster.org/7207
    Reviewed-by: Aravinda VK <avishwan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 6 Niels de Vos 2014-04-17 12:29:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

