Bug 1500841

Summary: [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
Product: [Community] GlusterFS
Reporter: Kotresh HR <khiremat>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.12
CC: amukherj, avishwan, bugs, csaba, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-glusterfs-3.12.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1499391
Environment:
Last Closed: 2017-10-13 12:47:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1499391
Bug Blocks:

Description Kotresh HR 2017-10-11 15:07:27 UTC
+++ This bug was initially created as a clone of Bug #1499391 +++

+++ This bug was initially created as a clone of Bug #1375094 +++

Description of problem:
=======================

While running the automation sanity check, which performs "create, chmod, chown, chgrp, symlink, hardlink, rename, truncate, rm" during changelog, xsync and history crawl, the following worker crash was observed:

[2016-09-11 13:52:43.422640] E [syncdutils(/bricks/brick1/master_brick5):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1267, in Xsyncer
    self.Xcrawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1406, in Xcrawl
    gfid = self.master.server.gfid(e)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1414, in gfid
    return super(brickserver, cls).gfid(e)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 327, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 369, in gfid
    buf = Xattr.lgetxattr(path, cls.GFID_XATTR, 16)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 61] No data available
[2016-09-11 13:52:43.428107] I [syncdutils(/bricks/brick1/master_brick5):220:finalize] <top>: exiting.
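
For reference, errno 61 on Linux is ENODATA, which lgetxattr reports when the named attribute does not exist on the path (or is not accessible to the caller). The following is only a minimal sketch of how the last traceback frame turns an errno into the OSError seen above. The raise_oserr here is a simplified stand-in that takes the errno as an argument, and describe is a hypothetical helper; neither is the syncdaemon code.

```python
import errno
import os


def raise_oserr(errn):
    """Simplified stand-in (assumption): raise OSError from a raw errno,
    mirroring the `raise OSError(errn, os.strerror(errn))` line in the traceback."""
    raise OSError(errn, os.strerror(errn))


def describe(errn):
    """Hypothetical helper: return (symbolic name, message) for an errno."""
    try:
        raise_oserr(errn)
    except OSError as e:
        return errno.errorcode.get(e.errno, "UNKNOWN"), e.strerror


if __name__ == "__main__":
    # Prints the symbolic name and message for ENODATA on this platform.
    print(describe(errno.ENODATA))
```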
  


Version-Release number of selected component (if applicable):
=============================================================

mainline


How reproducible:
=================

Seen only once, although the same test suite was executed multiple times.

Steps:
Can't be very certain, but the crash happened somewhere in the following sequence:
1. Perform rm -rf on master. Let it complete on master.
2. Check for files between master and slave.
3. Files match on master and slave, and arequal checksums match.
4. Set the change_detector to xsync.
The crash occurred between steps 2 and 4.
This was caught by the automation health check, which runs the fops in changelog, xsync and history mode one after another.

Slave Log at the same time:

[2016-09-11 13:52:43.433715] I [fuse-bridge.c:5007:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-MvkqZP
[2016-09-11 13:52:43.436595] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fa62ba77dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fa62d0ef915] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fa62d0ef78b] ) 0-: received signum (15), shutting down
[2016-09-11 13:52:43.436617] I [fuse-bridge.c:5714:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-MvkqZP'.

--- Additional comment from Worker Ant on 2017-10-06 23:17:56 EDT ---

REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on gfid getxattr) posted (#1) for review on master by Kotresh HR (khiremat)

--- Additional comment from Worker Ant on 2017-10-10 01:53:45 EDT ---

REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on gfid getxattr) posted (#2) for review on master by Kotresh HR (khiremat)

--- Additional comment from Worker Ant on 2017-10-11 06:16:13 EDT ---

COMMIT: https://review.gluster.org/18445 committed in master by Aravinda VK (avishwan) 
------
commit b56bdb34dafd1a87c5bbb2c9a75d1a088d82b1f4
Author: Kotresh HR <khiremat>
Date:   Fri Oct 6 22:42:43 2017 -0400

    geo-rep: Add ENODATA to retry list on gfid getxattr
    
    During xsync crawl, worker occasionally crashed
    with ENODATA on getting gfid from backend. This
    is not persistent and is transient. Worker restart
    involves re-processing of few entries in changelogs.
    So adding ENODATA to retry list to avoid worker
    restart.
    
    Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
    BUG: 1499391
    Signed-off-by: Kotresh HR <khiremat>
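
The idea in the commit message above (retry the gfid getxattr when it fails with a transient ENODATA instead of letting the worker crash) can be sketched roughly as follows. This is an illustration only, not the patch itself: call_with_retry, max_tries and delay are hypothetical names and values chosen for the example.

```python
import errno
import time


def call_with_retry(func, args=(), retry_errnos=(errno.ENODATA,),
                    max_tries=5, delay=0.01):
    """Hypothetical helper: call func(*args), retrying on selected errnos.

    Errors whose errno is not in retry_errnos are re-raised immediately.
    A retryable error is re-raised once max_tries attempts are used up.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func(*args)
        except OSError as e:
            if e.errno not in retry_errnos or attempt == max_tries:
                raise
            # Assumption: a short fixed pause is enough for a transient error.
            time.sleep(delay)
```

With a wrapper of this kind around the xattr lookup, a transient ENODATA costs a few extra attempts rather than a worker restart, while a persistent failure still surfaces as an OSError.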

Comment 1 Worker Ant 2017-10-11 15:10:38 UTC
REVIEW: https://review.gluster.org/18492 (geo-rep: Add ENODATA to retry list on gfid getxattr) posted (#1) for review on release-3.12 by Kotresh HR (khiremat)

Comment 2 Worker Ant 2017-10-12 18:38:10 UTC
COMMIT: https://review.gluster.org/18492 committed in release-3.12 by jiffin tony Thottan (jthottan) 
------
commit e59c078f5ad8b92966033f9c008193938ba6f3ca
Author: Kotresh HR <khiremat>
Date:   Fri Oct 6 22:42:43 2017 -0400

    geo-rep: Add ENODATA to retry list on gfid getxattr
    
    During xsync crawl, worker occasionally crashed
    with ENODATA on getting gfid from backend. This
    is not persistent and is transient. Worker restart
    involves re-processing of few entries in changelogs.
    So adding ENODATA to retry list to avoid worker
    restart.
    
    > Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
    > BUG: 1499391
    > Signed-off-by: Kotresh HR <khiremat>
    (cherry picked from commit b56bdb34dafd1a87c5bbb2c9a75d1a088d82b1f4)
    
    Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
    BUG: 1500841
    Signed-off-by: Kotresh HR <khiremat>

Comment 3 Jiffin 2017-10-13 12:47:53 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-glusterfs-3.12.2, please open a new bug report.

glusterfs-glusterfs-3.12.2 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-October/032684.html
[2] https://www.gluster.org/pipermail/gluster-users/