Bug 1499391 - [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
Summary: [geo-rep]: Worker crashes with OSError: [Errno 61] No data available
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1500841
 
Reported: 2017-10-07 03:15 UTC by Kotresh HR
Modified: 2017-12-08 17:42 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.13.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1375094
Clones: 1500841
Environment:
Last Closed: 2017-12-08 17:42:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kotresh HR 2017-10-07 03:15:31 UTC
+++ This bug was initially created as a clone of Bug #1375094 +++

Description of problem:
=======================

While running the automation sanity check, which performs "create, chmod, chown, chgrp, symlink, hardlink, rename, truncate, rm" during the changelog, xsync and history crawls, the following worker crash was observed:

[2016-09-11 13:52:43.422640] E [syncdutils(/bricks/brick1/master_brick5):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1267, in Xsyncer
    self.Xcrawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1424, in Xcrawl
    self.Xcrawl(e, xtr_root)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1406, in Xcrawl
    gfid = self.master.server.gfid(e)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1414, in gfid
    return super(brickserver, cls).gfid(e)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 327, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 369, in gfid
    buf = Xattr.lgetxattr(path, cls.GFID_XATTR, 16)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 61] No data available
[2016-09-11 13:52:43.428107] I [syncdutils(/bricks/brick1/master_brick5):220:finalize] <top>: exiting.
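
The failing call is the lgetxattr on the gfid xattr (trusted.gfid) in libcxattr.py: if the brick has no value for that xattr at the moment the crawl reads it, the call raises OSError with errno 61 (ENODATA) and the worker exits. A minimal sketch of the same call using Python 3's os.getxattr instead of gsyncd's Xattr wrapper (the path and helper name below are illustrative only, not gsyncd code):

# Minimal sketch: how a gfid lookup via lgetxattr can surface ENODATA.
import errno
import os

GFID_XATTR = 'trusted.gfid'

def get_gfid(path):
    try:
        # follow_symlinks=False corresponds to lgetxattr(2), which is what
        # the worker uses in libcxattr.py
        return os.getxattr(path, GFID_XATTR, follow_symlinks=False)
    except OSError as e:
        if e.errno == errno.ENODATA:
            # No trusted.gfid value on the backend entry at this instant;
            # this is the transient condition hit by the crawl
            return None
        raise

# Example usage (path from the log above, illustrative):
# get_gfid('/bricks/brick1/master_brick5')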
  


Version-Release number of selected component (if applicable):
=============================================================

mainline


How reproducible:
=================

Seen once, while the same test suite was executed multiple times.

Steps:
The exact point can't be pinned down, but it happened somewhere between the following:
1. Perform rm -rf on the master and let it complete.
2. Check the files between master and slave.
3. Files match on master and slave, and arequal matches.
4. Set the change_detector to xsync.
The crash occurred between steps 2 and 4.
This was caught via the automation health check, which performs the fops during the changelog, xsync and history crawls one after another.

Slave Log at the same time:

[2016-09-11 13:52:43.433715] I [fuse-bridge.c:5007:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-MvkqZP
[2016-09-11 13:52:43.436595] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fa62ba77dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fa62d0ef915] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fa62d0ef78b] ) 0-: received signum (15), shutting down
[2016-09-11 13:52:43.436617] I [fuse-bridge.c:5714:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-MvkqZP'.

Comment 1 Worker Ant 2017-10-07 03:17:56 UTC
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on gfid getxattr) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Worker Ant 2017-10-10 05:53:45 UTC
REVIEW: https://review.gluster.org/18445 (geo-rep: Add ENODATA to retry list on gfid getxattr) posted (#2) for review on master by Kotresh HR (khiremat)

Comment 3 Worker Ant 2017-10-11 10:16:13 UTC
COMMIT: https://review.gluster.org/18445 committed in master by Aravinda VK (avishwan) 
------
commit b56bdb34dafd1a87c5bbb2c9a75d1a088d82b1f4
Author: Kotresh HR <khiremat>
Date:   Fri Oct 6 22:42:43 2017 -0400

    geo-rep: Add ENODATA to retry list on gfid getxattr
    
    During the xsync crawl, the worker occasionally crashed
    with ENODATA while getting the gfid from the backend. The
    error is transient, not persistent. A worker restart
    involves re-processing a few entries in the changelogs,
    so ENODATA is added to the retry list to avoid the
    worker restart.
    
    Change-Id: Ib78d1e925c0a83c78746f28f7c79792a327dfd3e
    BUG: 1499391
    Signed-off-by: Kotresh HR <khiremat>
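
For context, the fix treats ENODATA from the gfid getxattr as a transient error to be retried, the same way the sync daemon already retries other transient errnos, rather than letting the exception kill the worker. A rough sketch of the idea, with an illustrative errno set and function name (not the actual gsyncd patch):

import errno

# Illustrative retry set; the actual patch adds ENODATA to the existing
# retry list used when fetching the gfid xattr from the backend.
TRANSIENT_ERRNOS = (errno.ENOENT, errno.ENODATA)

def gfid_with_retry(get_gfid, path, attempts=3):
    # Retry a few times on transient errors instead of raising, which
    # would restart the worker and re-process changelog entries.
    for _ in range(attempts):
        try:
            return get_gfid(path)
        except OSError as e:
            if e.errno not in TRANSIENT_ERRNOS:
                raise
    return None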

Comment 4 Shyamsundar 2017-12-08 17:42:08 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/

