+++ This bug was initially created as a clone of Bug #1261300 +++

Description of problem:
=======================
"OSError: [Errno 107] Transport endpoint is not connected" and "OSError: [Errno 117] Structure needs cleaning" tracebacks appear in the slave logs when "rm -rf" is issued from the master.

[2015-09-08 19:01:09.604484] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4752e6d2-4bef-4319-a535-2fbf1fea55e7/level04/hardlink_to_files'
[2015-09-08 19:01:09.621318] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-09-08 19:01:09.621871] I [syncdutils(slave):220:finalize] <top>: exiting.
[2015-09-08 19:01:20.188294] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.213108] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.355303] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:22.369762] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:36.427643] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['b7e28dc5-f77a-45ff-b5ed-38fc596359a7', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'])...[Errno 39] Directory not empty: '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'
[2015-09-08 19:01:36.428356] W [resource(slave):692:entry_ops] <top>: Recursive remove b7e28dc5-f77a-45ff-b5ed-38fc596359a7 => .gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44failed: Directory not empty
[2015-09-08 19:01:37.557725] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 659, in recursive_rmdir
    EISDIR])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 117] Structure needs cleaning: '.gfid/b5a0188e-f6fa-4c67-81dd-66b4f68a330b/level14/symlink_to_files'
[2015-09-08 19:01:37.568291] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Eventually, after retries, all the files are removed from the slave.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-14.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create master and slave cluster
2. Create geo-rep session between master and slave volume
3. Execute the following fops on master and verify the sync on slave: create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, followed by remove {rm -rf }

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-09-09 03:25:53 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Rahul Hinduja on 2015-09-09 05:33:43 EDT ---

sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1261300/

--- Additional comment from Sakshi on 2015-09-14 00:36:17 EDT ---

The reason for the "Structure needs cleaning" error is that the lookup selfheal in DHT does not set errors properly in a few places.
Fix sent here: http://review.gluster.org/#/c/12165/

--- Additional comment from Rahul Hinduja on 2016-01-04 06:29:24 EST ---

For the record, this is hit consistently with rm -rf * on build: glusterfs-3.7.5-13.el7rhgs.x86_64

[2016-01-04 11:06:34.928106] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2016-01-04 11:06:47.999924] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/1d9fc177-fc1e-411f-8ea9-b6e1234575af/level11/symlink_to_files'

--- Additional comment from Rahul Hinduja on 2016-04-14 06:51:30 EDT ---

For the record, this is hit consistently with rm -rf * on build: glusterfs-3.7.9-1.el7rhgs.x86_64

[2016-04-14 10:34:21.478831] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:34:26.432831] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['ec3be6a9-d725-4a09-8a4c-ef98a84ee39f', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level07/level17/level27/level37'
[2016-04-14 10:34:26.433208] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:35:03.264699] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 666, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/cd8f7393-5078-4776-a24f-3e8eb69e953f/level37/symlink_to_files'
[2016-04-14 10:35:03.274899] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:03.275094] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:08.326251] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/af8c23fb-dfd2-4c76-8c74-45813518acb0/symlink_to_files'
[2016-04-14 10:35:08.335688] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:08.335922] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:14.135530] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:15.292807] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:35:19.146552] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:20.289783] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:36:16.786941] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/8ca37ba9-5d4f-40a1-94df-aed09a2bd36a/symlink_to_files'

REVIEW: http://review.gluster.org/14307 (geo-rep: Add Retry in case of ENOTCONN during rmdir) posted (#1) for review on master by Aravinda VK (avishwan)
REVIEW: http://review.gluster.org/14307 (geo-rep: Add Retry in case of ENOTCONN during rmdir) posted (#2) for review on master by Aravinda VK (avishwan)
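
For illustration only, here is a minimal sketch of what retrying rmdir on ENOTCONN could look like. It is not the syncdaemon code and not the content of the patch under review. The helper name retry_on_errno, its parameters (ignore, retry, max_tries, delay) and the demo function are hypothetical. The behaviour of returning the errno once retries are exhausted is an assumption, loosely based on the "reached maximum retries" warnings in the logs above.

```python
import errno
import os
import tempfile
import time


def retry_on_errno(call, args=(), ignore=(), retry=(), max_tries=10, delay=0.25):
    """Hypothetical helper: run call(*args) with errno-based handling.

    Errnos listed in `ignore` are swallowed and returned to the caller.
    Errnos listed in `retry` cause the call to be retried after `delay`
    seconds, up to `max_tries` attempts. Any other OSError is re-raised.
    """
    tries = 0
    while True:
        try:
            return call(*args)
        except OSError as exc:
            if exc.errno in ignore:
                return exc.errno
            if exc.errno not in retry:
                raise
            tries += 1
            if tries >= max_tries:
                # Assumption: give up quietly and report the errno.
                return exc.errno
            time.sleep(delay)


def demo():
    """Simulate an rmdir that fails twice with ENOTCONN, then succeeds."""
    path = tempfile.mkdtemp()
    attempts = {"count": 0}

    def flaky_rmdir(p):
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise OSError(errno.ENOTCONN, os.strerror(errno.ENOTCONN), p)
        os.rmdir(p)

    result = retry_on_errno(
        flaky_rmdir,
        [path],
        ignore=[errno.ENOENT, errno.ESTALE],
        retry=[errno.ENOTCONN],  # the errno seen in the tracebacks above
        delay=0,
    )
    return result, attempts["count"], os.path.exists(path)
```

With ENOTCONN in the retry list, a transient disconnect during rmdir leads to another attempt instead of an unhandled OSError that terminates the worker. ENOENT and ESTALE stay in the ignore list, as in the errno_wrap(os.rmdir, [path], [ENOENT, ESTALE]) call shown in the tracebacks.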
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.