+++ This bug was initially created as a clone of Bug #1222856 +++

Description of problem:
=======================

Whenever rm -rf was performed on the master volume, the worker died with the following backtrace:

[2015-05-19 15:33:13.868683] E [syncdutils(/rhs/brick2/b2):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1440, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1150, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1059, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 902, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 116] Stale file handle
[2015-05-19 15:33:13.870326] I [syncdutils(/rhs/brick2/b2):220:finalize] <top>: exiting.
[2015-05-19 15:33:13.874784] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.

Every time the monitor tries to spawn the worker process again, it dies in the startup phase.

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
=================

Tried a couple of times and reproduced it every time.

Steps Carried:
==============

1. Created master cluster
2. Created and started master volume
3. Created shared volume (gluster_shared_storage)
4. Mounted the shared volume on /var/run/gluster/shared_storage
5. Created slave cluster
6. Created and started slave volume
7. Created geo-rep session between master and slave
8. Configured use_meta_volume true
9. Started geo-rep
10. Mounted master volume over Fuse and NFS to client
11. Copied files /etc{1..10} from fuse mount
12. Copied files /etc{11..20} from NFS mount
13. Sync completed successfully
14. Removed the files etc.2 from fuse and etc.12 from NFS
15. Checked the geo-rep session; it was faulty
16. Checked the logs; they showed a continuous traceback

Actual results:
===============

The worker crashes and comes back with the crawl type set to history.

Expected results:
=================

The worker should not crash; it should handle ESTALE gracefully.
REVIEW: http://review.gluster.org/10837 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#1) for review on master by Aravinda VK (avishwan)
REVIEW: http://review.gluster.org/10837 (geo-rep: Ignore ESTALE during unlink/rmdir) posted (#2) for review on master by Aravinda VK (avishwan)
COMMIT: http://review.gluster.org/10837 committed in master by Venky Shankar (vshankar)
------
commit f999a8634850db0627c768b12dba0aa84b4ff7b7
Author: Aravinda VK <avishwan>
Date:   Wed May 20 14:34:11 2015 +0530

    geo-rep: Ignore ESTALE during unlink/rmdir

    during unlink/rmdir of Parent_GFID/Basename, if parent directory does not exists. Parent GFID will not get resolved and DHT raises ESTALE instead of ENOENT. Now ESTALE errors ignored during unlink/rmdir

    BUG: 1223280
    Change-Id: If275c89fb9fc7d16004550805a4cd65be818540d
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/10837
    Reviewed-by: Kotresh HR <khiremat>
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>
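
For readers who want to see the idea behind the commit in isolation: the sketch below treats ESTALE the same way as ENOENT when removing an entry, so that an already-missing parent does not abort the caller. This is only an illustration, not the code from the patch; the names `remove_entry`, `IGNORABLE_ERRNOS` and the `remover` parameter are hypothetical and do not come from gsyncd.

```python
import errno
import os

# Hypothetical set of errors that mean "the entry is already gone".
# Assumption for this sketch: ESTALE is handled exactly like ENOENT.
IGNORABLE_ERRNOS = (errno.ENOENT, errno.ESTALE)


def remove_entry(path, is_dir=False, remover=None):
    """Remove path, ignoring errors that mean it no longer exists.

    Returns True if the entry was removed, False if the error was
    ignored. Any other OSError is re-raised to the caller.
    """
    if remover is None:
        # os.rmdir for directories, os.unlink for everything else.
        remover = os.rmdir if is_dir else os.unlink
    try:
        remover(path)
        return True
    except OSError as e:
        if e.errno in IGNORABLE_ERRNOS:
            return False
        raise
```

The `remover` argument exists only so the behaviour can be exercised without a real stale handle, for example by passing a function that raises `OSError(errno.ESTALE, ...)`. Errors outside the ignorable set, such as EACCES, still propagate, so genuine failures are not hidden.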
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user