Bug 1223286 - [geo-rep]: worker died with "ESTALE" when performed rm -rf on a directory from mount of master volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1223280
Blocks: glusterfs-3.7.1 1222856 1232912 1236093
Reported: 2015-05-20 09:14 UTC by Aravinda VK
Modified: 2015-06-26 14:07 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.7.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1223280
Environment:
Last Closed: 2015-06-02 06:21:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Aravinda VK 2015-05-20 09:14:08 UTC
+++ This bug was initially created as a clone of Bug #1223280 +++

+++ This bug was initially created as a clone of Bug #1222856 +++

Description of problem:
=======================

Whenever rm -rf was performed on the master volume, the worker died with the following traceback:


[2015-05-19 15:33:13.868683] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1440, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1150, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1059, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 902, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 116] Stale file handle
[2015-05-19 15:33:13.870326] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-19 15:33:13.874784] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.

Every time the monitor tries to respawn the worker process, it dies again during startup.
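For reference, the "[Errno 116] Stale file handle" in the traceback is ESTALE as numbered on Linux. The error surfaces from self.slave.server.entry_ops(entries) and is re-raised in repce.py, which suggests it originated on the slave side of the RPC. Because errno values are platform-specific, code that handles this case should compare against errno.ESTALE rather than the literal 116. A small sketch for decoding an errno number; the helper name describe_errno is hypothetical:

```python
import errno
import os


def describe_errno(code):
    """Hypothetical helper: map a numeric errno (the N in '[Errno N]')
    to its symbolic name and the C library's message for it."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Compare against the symbolic constant, not a hard-coded number:
# errno.ESTALE is 116 on Linux but has other values on other platforms.
print(describe_errno(errno.ESTALE))
```

On Linux with glibc this prints ('ESTALE', 'Stale file handle'), matching the log line.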


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.0-2.el6rhs.x86_64


How reproducible:
================

Tried a couple of times and reproduced it every time.


Steps Carried Out:
==================
1. Created the master cluster.
2. Created and started the master volume.
3. Created the shared volume (gluster_shared_storage).
4. Mounted the shared volume on /var/run/gluster/shared_storage.
5. Created the slave cluster.
6. Created and started the slave volume.
7. Created a geo-rep session between master and slave.
8. Set the geo-rep config option use_meta_volume to true.
9. Started geo-rep.
10. Mounted the master volume on the client over FUSE and NFS.
11. Copied files /etc{1..10} from the FUSE mount.
12. Copied files /etc{11..20} from the NFS mount.
13. Sync completed successfully.
14. Removed etc.2 from the FUSE mount and etc.12 from the NFS mount (rm -rf).
15. Checked the geo-rep session status: it was Faulty.
16. Checked the logs: they showed the traceback repeatedly.

Actual results:
===============

The worker crashed, and when respawned it came back with crawl type History.


Expected results:
=================

The worker should not crash; it should handle ESTALE gracefully.
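The patch posted for review (http://review.gluster.org/10837) is titled "geo-rep: Ignore ESTALE during unlink/rmdir". Its code is not reproduced in this report, so the following is only a minimal Python sketch of that idea, assuming that an ESTALE from unlink/rmdir on the slave, like ENOENT, means the entry is already gone and the delete can be treated as done. The names safe_remove and IGNORABLE_REMOVE_ERRORS are hypothetical and are not part of gsyncd:

```python
import errno
import os

# Assumption (not taken from the actual patch): on the slave, both ENOENT
# and ESTALE from unlink/rmdir mean "the entry is already gone".
IGNORABLE_REMOVE_ERRORS = (errno.ENOENT, errno.ESTALE)


def safe_remove(path, is_dir=False):
    """Hypothetical helper: rmdir/unlink `path`, tolerating "already gone".

    Returns True if this call removed the entry and False if it was
    already absent (ENOENT/ESTALE). Any other OSError is re-raised so
    that genuine failures still surface to the caller.
    """
    try:
        if is_dir:
            os.rmdir(path)
        else:
            os.unlink(path)
        return True
    except OSError as e:
        if e.errno in IGNORABLE_REMOVE_ERRORS:
            return False
        raise


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "etc.2")
    os.mkdir(target)
    print(safe_remove(target, is_dir=True))  # True: removed by this call
    print(safe_remove(target, is_dir=True))  # False: already gone, no crash
    os.rmdir(workdir)
```

Other errors, such as ENOTEMPTY from rmdir on a non-empty directory, are still raised, so only the "already removed" case is tolerated rather than every failure.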

--- Additional comment from Anand Avati on 2015-05-20 05:10:44 EDT ---

REVIEW: http://review.gluster.org/10837 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#1) for review on master by Aravinda VK (avishwan@redhat.com)

Comment 1 Anand Avati 2015-05-26 09:00:55 UTC
REVIEW: http://review.gluster.org/10913 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#1) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 2 Anand Avati 2015-05-29 07:37:32 UTC
REVIEW: http://review.gluster.org/10913 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#2) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 3 Anand Avati 2015-05-31 13:35:32 UTC
REVIEW: http://review.gluster.org/10913 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#3) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

Comment 4 Anand Avati 2015-06-01 03:53:16 UTC
REVIEW: http://review.gluster.org/10913 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#5) for review on release-3.7 by Aravinda VK (avishwan@redhat.com)

