Bug 1335538 - [geo-rep]: recursive_rmdir causing "Transport endpoint not connected" followed by "Structure needs cleaning"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1261300
Blocks:
Reported: 2016-05-12 12:56 UTC by Aravinda VK
Modified: 2018-08-29 03:36 UTC

Fixed In Version: glusterfs-4.1.3 (or later)
Clone Of: 1261300
Environment:
Last Closed: 2018-08-29 03:36:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Aravinda VK 2016-05-12 12:56:58 UTC
+++ This bug was initially created as a clone of Bug #1261300 +++

Description of problem:
=======================

The slave logs show "OSError: [Errno 107] Transport endpoint is not connected" and "OSError: [Errno 117] Structure needs cleaning" tracebacks when "rm -rf" is issued from the master.

[2015-09-08 19:01:09.604484] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4752e6d2-4bef-4319-a535-2fbf1fea55e7/level04/hardlink_to_files'
[2015-09-08 19:01:09.621318] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-09-08 19:01:09.621871] I [syncdutils(slave):220:finalize] <top>: exiting.
[2015-09-08 19:01:20.188294] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.213108] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.355303] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:22.369762] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:36.427643] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['b7e28dc5-f77a-45ff-b5ed-38fc596359a7', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'])...[Errno 39] Directory not empty: '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'
[2015-09-08 19:01:36.428356] W [resource(slave):692:entry_ops] <top>: Recursive remove b7e28dc5-f77a-45ff-b5ed-38fc596359a7 => .gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44failed: Directory not empty
[2015-09-08 19:01:37.557725] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 659, in recursive_rmdir
    EISDIR])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 117] Structure needs cleaning: '.gfid/b5a0188e-f6fa-4c67-81dd-66b4f68a330b/level14/symlink_to_files'
[2015-09-08 19:01:37.568291] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Eventually, after retries, all the files are removed from the slave.
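
For reference, the errno numbers in the logs above (39, 107, 117) can be decoded with Python's standard errno module. On Linux they are ENOTEMPTY, ENOTCONN and EUCLEAN. The helper name describe_errno below is made up for illustration and is not part of gsyncd.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno number on this platform."""
    # errno.errorcode maps numbers to names; numbering is platform specific.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The three errno values that appear in the slave logs.
    for code in (39, 107, 117):
        name, message = describe_errno(code)
        print(code, name, message)
```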

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-14.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create master and slave cluster
2. Create geo-rep session between master and slave volume
3. Execute the following fops on the master and verify that each one syncs to the slave:
create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, followed by remove (rm -rf)
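
The fop sequence in step 3 can be sketched as a small Python script. This is only an illustration, not the tool used in the original test run: the function name run_fops and the file and directory names are assumptions, and by default it runs in a temporary directory. For an actual reproduction, root would be a directory on the master volume mount, and the tree would need to be much deeper and wider.

```python
import os
import shutil
import tempfile


def run_fops(root):
    """Apply the fops from step 3 under root, then remove the tree."""
    work = os.path.join(root, "level00")
    os.makedirs(work)

    path = os.path.join(work, "file")
    with open(path, "w") as f:                  # create
        f.write("data\n")
    os.chmod(path, 0o640)                       # chmod
    os.chown(path, os.getuid(), -1)             # chown (to self, no root needed)
    os.chown(path, -1, os.getgid())             # chgrp
    os.symlink(path, os.path.join(work, "symlink_to_file"))    # symlink
    os.link(path, os.path.join(work, "hardlink_to_file"))      # hardlink
    os.truncate(path, 0)                        # truncate
    os.rename(path, os.path.join(work, "file_renamed"))        # rename

    names = sorted(os.listdir(work))
    shutil.rmtree(work)                         # remove (rm -rf)
    return names


if __name__ == "__main__":
    # Assumption: a throwaway directory stands in for the master mount point.
    root = tempfile.mkdtemp()
    print(run_fops(root))
    shutil.rmtree(root)
```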

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-09-09 03:25:53 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Rahul Hinduja on 2015-09-09 05:33:43 EDT ---

sosreports are at: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1261300/

--- Additional comment from Sakshi on 2015-09-14 00:36:17 EDT ---

The "Structure needs cleaning" error occurs because the lookup self-heal in DHT does not set errors properly in a few places. A fix has been sent here:
http://review.gluster.org/#/c/12165/

--- Additional comment from Rahul Hinduja on 2016-01-04 06:29:24 EST ---

For the record, this is hit consistently with rm -rf * on build: glusterfs-3.7.5-13.el7rhgs.x86_64

[2016-01-04 11:06:34.928106] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2016-01-04 11:06:47.999924] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/1d9fc177-fc1e-411f-8ea9-b6e1234575af/level11/symlink_to_files'

--- Additional comment from Rahul Hinduja on 2016-04-14 06:51:30 EDT ---

For the record, this is hit consistently with rm -rf * on build: glusterfs-3.7.9-1.el7rhgs.x86_64

[2016-04-14 10:34:21.478831] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:34:26.432831] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['ec3be6a9-d725-4a09-8a4c-ef98a84ee39f', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level07/level17/level27/level37'
[2016-04-14 10:34:26.433208] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:35:03.264699] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 666, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/cd8f7393-5078-4776-a24f-3e8eb69e953f/level37/symlink_to_files'
[2016-04-14 10:35:03.274899] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:03.275094] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:08.326251] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/af8c23fb-dfd2-4c76-8c74-45813518acb0/symlink_to_files'
[2016-04-14 10:35:08.335688] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:08.335922] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:14.135530] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:15.292807] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:35:19.146552] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:20.289783] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:36:16.786941] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/8ca37ba9-5d4f-40a1-94df-aed09a2bd36a/symlink_to_files'

Comment 1 Vijay Bellur 2016-05-12 13:04:31 UTC
REVIEW: http://review.gluster.org/14307 (geo-rep: Add Retry in case of ENOTCONN during rmdir) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Vijay Bellur 2016-05-13 02:55:46 UTC
REVIEW: http://review.gluster.org/14307 (geo-rep: Add Retry in case of ENOTCONN during rmdir) posted (#2) for review on master by Aravinda VK (avishwan)
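
The patch itself is not reproduced in this report. Going only by its title and by the errno_wrap calls visible in the tracebacks, the idea of retrying rmdir on ENOTCONN might look roughly like the sketch below. The function name errno_wrap_sketch, its parameter names, the retry count and the delay are all assumptions for illustration and are not taken from gsyncd.

```python
import errno
import os
import tempfile
import time


def errno_wrap_sketch(call, args=(), ignore_errnos=(), retry_errnos=(),
                      max_retries=10, delay=0.25):
    """Run call(*args); ignore some errnos and retry on others.

    Hypothetical helper: names and defaults are illustrative only.
    """
    attempts = 0
    while True:
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in ignore_errnos:
                # Treated as benign: hand the errno back instead of raising.
                return ex.errno
            if ex.errno not in retry_errnos:
                # Anything unlisted still propagates to the caller.
                raise
            attempts += 1
            if attempts >= max_retries:
                # Retries exhausted: give up and report the last errno.
                return ex.errno
            time.sleep(delay)


if __name__ == "__main__":
    # Listing ENOTCONN as retryable means a transient disconnect during
    # rmdir is retried instead of raising out of the worker.
    path = tempfile.mkdtemp()
    result = errno_wrap_sketch(os.rmdir, [path],
                               [errno.ENOENT, errno.ESTALE],
                               [errno.ENOTCONN])
    print(result)
```

With ENOTCONN only in the retry list, an rmdir that fails a few times while the connection recovers would eventually succeed, while ENOENT and ESTALE stay ignorable as in the calls shown in the tracebacks.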

Comment 3 Amar Tumballi 2018-08-29 03:36:01 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

