+++ This bug was initially created as a clone of Bug #1247529 +++
+++ This bug was initially created as a clone of Bug #1239075 +++

Description of problem:
=======================
Ran the tests, which perform the following FOPs in order: create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, remove. All of the FOPs succeed and are synced to the slave, but the master and slave logs show the worker crashing as follows:

Master:
=======
[2015-07-03 13:36:43.154763] E [syncdutils(/bricks/brick0/master_brick0):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1438, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1161, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1070, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 948, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 903, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 116] Stale file handle: '.gfid/fece7967-616b-4d13-add7-96f6a4022e11/55958721%%BO54CXD7RN'
[2015-07-03 13:36:43.156742] I [syncdutils(/bricks/brick0/master_brick0):220:finalize] <top>: exiting.
[2015-07-03 13:36:43.159702] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.

Slave:
======
[2015-07-03 13:36:38.359909] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-07-03 13:36:43.149735] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 717, in entry_ops
    st = lstat(entry)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 493, in lstat
    return os.lstat(e)
OSError: [Errno 116] Stale file handle: '.gfid/fece7967-616b-4d13-add7-96f6a4022e11/55958721%%BO54CXD7RN'
[2015-07-03 13:36:43.158221] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-07-03 13:36:43.158576] I [syncdutils(slave):220:finalize] <top>: exiting.

Version-Release number of selected component (if applicable):
=============================================================

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create a geo-rep session between Master (3x2) and Slave (3x2).
2. Run the following FOPs in sequential order and check the arequal checksum after each FOP:

2015-07-03 13:03:31,870 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=create /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:05:53,581 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chmod /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:08:17,690 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chown /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:10:41,876 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=chgrp /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:13:06,050 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=symlink /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:15:37,194 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=hardlink /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:18:16,751 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=truncate /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
2015-07-03 13:21:06,530 INFO run Executing crefi --multi -n 5 -b 5 -d 5 --max=10k --min=5k --random -T 5 -t text --fop=rename /mnt/glusterfs 1>/dev/null 2>&1 on wingo.lab.eng.blr.redhat.com
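The sequence above can be driven by a small script; below is a minimal sketch that reuses the exact crefi invocations from the test log. The slave mount point and the fixed wait for geo-rep to catch up are assumptions, not part of the original harness, and crefi and arequal-checksum are assumed to be installed.

#!/usr/bin/env python
# Sketch of a driver for the reproduction steps above (paths/wait assumed).
import subprocess
import time

MASTER_MNT = "/mnt/glusterfs"        # master volume mount (from the log)
SLAVE_MNT = "/mnt/glusterfs-slave"   # slave volume mount (assumed path)
FOPS = ["create", "chmod", "chown", "chgrp",
        "symlink", "hardlink", "truncate", "rename"]

def arequal(mount):
    # arequal-checksum computes a checksum over file data and metadata
    return subprocess.check_output(["arequal-checksum", "-p", mount])

for fop in FOPS:
    subprocess.check_call(
        ["crefi", "--multi", "-n", "5", "-b", "5", "-d", "5",
         "--max=10k", "--min=5k", "--random", "-T", "5",
         "-t", "text", "--fop=" + fop, MASTER_MNT])
    time.sleep(60)  # crude wait for geo-rep to sync (assumption)
    if arequal(MASTER_MNT) != arequal(SLAVE_MNT):
        raise SystemExit("arequal mismatch after --fop=" + fop)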
COMMIT: http://review.gluster.org/11820 committed in release-3.7 by Venky Shankar (vshankar)
------
commit 2fd4ae628c7940ef5b505666afbf5eae0cc655b9
Author: Kotresh HR <khiremat>
Date:   Tue Jul 28 14:37:47 2015 +0530

    geo-rep: Do not crash worker on ESTALE

    Handle ESTALE returned by lstat gracefully by retrying it.
    Do not crash the worker.

    BUG: 1249547
    Change-Id: I57fb9933900153ab41c3d9b73748b1cdaa8d89ca
    Reviewed-on: http://review.gluster.org/11772
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Aravinda VK <avishwan>
    Reviewed-by: Venky Shankar <vshankar>
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/11820
    Reviewed-by: Milind Changire <mchangir>
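The change follows the approach stated in the commit message: retry lstat when it fails with ESTALE instead of letting the exception escape entry_ops and kill the worker. A minimal sketch of that retry pattern follows; the helper name, retry count, and interval are illustrative, not the exact patch.

# Illustrative retry wrapper around os.lstat: a transient ESTALE (e.g. from
# a .gfid aux-mount handle that momentarily went stale) is retried a bounded
# number of times; any other error, or a persistent ESTALE, still raises.
import errno
import os
import time

def lstat_retry(path, retries=5, interval=0.25):
    while True:
        try:
            return os.lstat(path)
        except OSError as ex:
            if ex.errno != errno.ESTALE or retries <= 0:
                raise
            retries -= 1
            time.sleep(interval)  # back off briefly before retrying

With a wrapper along these lines in place of the bare os.lstat call seen in the slave traceback, a transient stale handle no longer crashes the worker, while a genuinely stale handle still surfaces as an error once the retries are exhausted.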
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user