Bug 1069191 - geo-rep: gsyncd worker process crash
Summary: geo-rep: gsyncd worker process crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1129392 1147448
Reported: 2014-02-24 12:49 UTC by Kotresh HR
Modified: 2014-09-29 09:32 UTC (History)

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-17 11:52:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Kotresh HR 2014-02-24 12:49:22 UTC
Description of problem:
The gsyncd worker process crashes with the following traceback:

[2014-02-24 18:10:05.907075] E [syncdutils(/bricks/brick1/b1):240:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/local/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/local/libexec/glusterfs/python/syncdaemon/master.py", line 467, in crawlwrap
    self.crawl()
  File "/usr/local/libexec/glusterfs/python/syncdaemon/master.py", line 1067, in crawl
    self.process(changes)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/master.py", line 825, in process
    self.process_change(change, done, retry)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/master.py", line 758, in process_change
    st = lstat(go)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 432, in lstat
    return os.lstat(e)
OSError: [Errno 116] Stale file handle: '.gfid/74837191-2b70-4348-a72c-2199efaa8928'



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set up a geo-rep session between master and slave
2. On master:
    A) touch file
    B) link file link
    C) rm file link
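
The master-side steps above can be sketched in Python as follows. This is only an illustration: the function name `reproduce` and its `master_mount` argument are hypothetical, and the mount path of the master volume has to be supplied by the caller. The crash itself is only expected while a geo-rep session is active on that volume; run against an ordinary directory, the script simply creates and removes the two names.

```python
import os
import tempfile


def reproduce(master_mount):
    """Run steps A-C from the report inside master_mount (hypothetical helper)."""
    src = os.path.join(master_mount, "file")
    lnk = os.path.join(master_mount, "link")

    # A) touch file
    with open(src, "a"):
        os.utime(src, None)

    # B) link file link (creates a hard link to the same inode)
    os.link(src, lnk)

    # C) rm file link
    os.unlink(src)
    os.unlink(lnk)
    return src, lnk


if __name__ == "__main__":
    # Demo against a scratch directory; replace with the master mount point.
    scratch = tempfile.mkdtemp()
    print(reproduce(scratch))
```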
              

Actual results:
The gsyncd worker process crashed, and the changelog file corresponding
to the above files remained stuck in .processing forever.

Expected results:
The gsyncd worker process should not crash, and the changelogs should
be processed.

Additional info:

Comment 1 Anand Avati 2014-02-25 10:55:21 UTC
REVIEW: http://review.gluster.org/7154 (geo-rep/gfid-access: Fix errno for non-existent GFID.) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Anand Avati 2014-02-28 05:45:37 UTC
COMMIT: http://review.gluster.org/7154 committed in master by Anand Avati (avati) 
------
commit 6535bafe588ea901ac15d31ddb6550a2ba9cd915
Author: Kotresh H R <khiremat>
Date:   Tue Feb 25 16:20:46 2014 +0530

    geo-rep/gfid-access: Fix errno for non-existent GFID.
    
    Because of http://review.gluster.org/#/c/6318/ patch,
    ESTALE is returned for a lookup on non-existent GFID.
    But ENOENT is more appropriate when lookup happens
    through virtual .gfid directory on aux-gfid-mount
    point. This avoids confusion for the consumers
    of gfid-access-translator like geo-rep which expects
    ENOENT.
    
    Change-Id: I4add2edf5958bb59ce55d02726e6b3e801b101bb
    BUG: 1069191
    Signed-off-by: Kotresh H R <khiremat>
    Reviewed-on: http://review.gluster.org/7154
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Venky Shankar <vshankar>
    Reviewed-by: Anand Avati <avati>

Comment 3 Anand Avati 2014-02-28 07:36:40 UTC
REVIEW: http://review.gluster.org/7163 (geo-rep/gfid-access: Fix errno for non-existent GFID.) posted (#1) for review on release-3.5 by Kotresh HR (khiremat)

Comment 4 Anand Avati 2014-03-03 05:05:38 UTC
COMMIT: http://review.gluster.org/7163 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 753185d56492b4f9044df186ce664f206388ef46
Author: Kotresh H R <khiremat>
Date:   Tue Feb 25 16:20:46 2014 +0530

    geo-rep/gfid-access: Fix errno for non-existent GFID.
    
    Because of http://review.gluster.org/#/c/6318/ patch,
    ESTALE is returned for a lookup on non-existent GFID.
    But ENOENT is more appropriate when lookup happens
    through virtual .gfid directory on aux-gfid-mount
    point. This avoids confusion for the consumers
    of gfid-access-translator like geo-rep which expects
    ENOENT.
    
    Change-Id: I4add2edf5958bb59ce55d02726e6b3e801b101bb
    BUG: 1069191
    Signed-off-by: Kotresh H R <khiremat>
    Reviewed-on: http://review.gluster.org/7154
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Venky Shankar <vshankar>
    Reviewed-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/7163
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Niels de Vos 2014-04-17 11:52:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

