Bug 1261300

Summary: [geo-rep]: recursive_rmdir causing "Transport endpoint not connected" followed by "Structure needs cleaning"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: urgent
Priority: high
Version: rhgs-3.1
CC: avishwan, chrisw, csaba, nlevinki, ravishankar, sarumuga, smohan
Keywords: ZStream
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Clones: 1335538
Last Closed: 2018-04-16 15:56:21 UTC
Type: Bug
Bug Blocks: 1335538

Description Rahul Hinduja 2015-09-09 07:25:12 UTC
Description of problem:
=======================

The slave logs show "OSError: [Errno 107] Transport endpoint is not connected" and "OSError: [Errno 117] Structure needs cleaning" tracebacks when "rm -rf" is issued on the master:

[2015-09-08 19:01:09.604484] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4752e6d2-4bef-4319-a535-2fbf1fea55e7/level04/hardlink_to_files'
[2015-09-08 19:01:09.621318] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-09-08 19:01:09.621871] I [syncdutils(slave):220:finalize] <top>: exiting.
[2015-09-08 19:01:20.188294] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.213108] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.355303] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:22.369762] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:36.427643] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['b7e28dc5-f77a-45ff-b5ed-38fc596359a7', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'])...[Errno 39] Directory not empty: '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'
[2015-09-08 19:01:36.428356] W [resource(slave):692:entry_ops] <top>: Recursive remove b7e28dc5-f77a-45ff-b5ed-38fc596359a7 => .gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44failed: Directory not empty
[2015-09-08 19:01:37.557725] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 659, in recursive_rmdir
    EISDIR])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 117] Structure needs cleaning: '.gfid/b5a0188e-f6fa-4c67-81dd-66b4f68a330b/level14/symlink_to_files'
[2015-09-08 19:01:37.568291] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Eventually, after retries, all the files are removed from the slave.
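The retry behaviour comes from errno_wrap. The tracebacks show it called as errno_wrap(call, args, errnos, retry_errnos): the recursive remove in entry_ops appears to retry ENOTEMPTY, ESTALE and ENODATA, while the os.rmdir call inside recursive_rmdir only tolerates ENOENT and ESTALE. A minimal, hypothetical sketch of that pattern follows (errno_wrap_sketch, max_tries and delay are assumptions, not the gsyncd code). Errnos in the first list are swallowed, errnos in the second are retried and then returned with a "reached maximum retries" warning, and anything else, such as ENOTCONN (107) or EUCLEAN (117), is re-raised, which is consistent with the worker dying with a traceback and being restarted.

```python
import errno
import logging
import time

log = logging.getLogger("errno_wrap_sketch")

# Errno lists as they appear in the tracebacks above.
RMDIR_IGNORED = [errno.ENOENT, errno.ESTALE]
RECURSIVE_REMOVE_RETRIED = [errno.ENOTEMPTY, errno.ESTALE, errno.ENODATA]


def errno_wrap_sketch(call, arg=(), errnos=(), retry_errnos=(),
                      max_tries=10, delay=0.25):
    """Call call(*arg) and classify any OSError by its errno.

    - in errnos: swallowed, the errno value is returned
    - in retry_errnos: retried up to max_tries times, then returned
    - anything else (e.g. ENOTCONN, EUCLEAN): re-raised to the caller
    max_tries and delay are assumed values.
    """
    tries = 0
    while True:
        try:
            return call(*arg)
        except OSError as ex:
            if ex.errno in errnos:
                return ex.errno
            if ex.errno not in retry_errnos:
                raise
            tries += 1
            if tries >= max_tries:
                log.warning("reached maximum retries (%r)...%s", list(arg), ex)
                return ex.errno
            time.sleep(delay)
```

With this shape, a persistently non-empty directory only produces a warning, whereas a single ENOTCONN from os.rmdir escapes to the caller.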

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-14.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create master and slave clusters.
2. Create a geo-rep session between the master and slave volumes.
3. Execute the following fops on the master and verify that they sync to the slave:
create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, followed by remove (rm -rf).
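Step 3 can be scripted roughly as follows. This is a hypothetical sketch: the mount point /mnt/master, the directory and file names, and the helper names run_fops and remove_all are assumptions. chown and chgrp target the caller's own uid and gid so the script runs without root. It only generates the fops on the master mount; verifying the sync on the slave is not covered.

```python
import os
import shutil
import sys


def run_fops(root):
    """Apply the fop sequence from step 3 under `root` (a master mount dir)."""
    d = os.path.join(root, "level00")
    os.makedirs(d, exist_ok=True)
    f = os.path.join(d, "file0")
    with open(f, "w") as fh:                                  # create
        fh.write("x" * 4096)
    os.chmod(f, 0o640)                                        # chmod
    os.chown(f, os.getuid(), -1)                              # chown (to self)
    os.chown(f, -1, os.getgid())                              # chgrp (to own group)
    os.symlink("file0", os.path.join(d, "symlink_to_file0"))  # symlink
    os.link(f, os.path.join(d, "hardlink_to_file0"))          # hardlink
    os.truncate(f, 1024)                                      # truncate
    os.rename(f, os.path.join(d, "file0_renamed"))            # rename
    return d


def remove_all(root):
    """Rough equivalent of `rm -rf root/*`."""
    for name in os.listdir(root):
        p = os.path.join(root, name)
        if os.path.isdir(p) and not os.path.islink(p):
            shutil.rmtree(p)
        else:
            os.unlink(p)


if __name__ == "__main__":
    mount = sys.argv[1] if len(sys.argv) > 1 else "/mnt/master"  # hypothetical path
    run_fops(mount)
    remove_all(mount)
```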

Comment 3 Sakshi 2015-09-14 04:36:17 UTC
The "Structure needs cleaning" error occurs because the lookup self-heal in DHT does not set errors properly in a few places. Fix sent here:
http://review.gluster.org/#/c/12165/

Comment 4 Rahul Hinduja 2016-01-04 11:29:24 UTC
For the record, hitting this consistently with rm -rf * on build: glusterfs-3.7.5-13.el7rhgs.x86_64

[2016-01-04 11:06:34.928106] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2016-01-04 11:06:47.999924] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/1d9fc177-fc1e-411f-8ea9-b6e1234575af/level11/symlink_to_files'

Comment 5 Rahul Hinduja 2016-04-14 10:51:30 UTC
For the record, hitting this consistently with rm -rf * on build: glusterfs-3.7.9-1.el7rhgs.x86_64

[2016-04-14 10:34:21.478831] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:34:26.432831] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['ec3be6a9-d725-4a09-8a4c-ef98a84ee39f', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level07/level17/level27/level37'
[2016-04-14 10:34:26.433208] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:35:03.264699] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 666, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/cd8f7393-5078-4776-a24f-3e8eb69e953f/level37/symlink_to_files'
[2016-04-14 10:35:03.274899] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:03.275094] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:08.326251] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/af8c23fb-dfd2-4c76-8c74-45813518acb0/symlink_to_files'
[2016-04-14 10:35:08.335688] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:08.335922] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:14.135530] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:15.292807] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:35:19.146552] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:20.289783] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:36:16.786941] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/8ca37ba9-5d4f-40a1-94df-aed09a2bd36a/symlink_to_files'
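Logs like the one above interleave several tracebacks with worker restarts. A small, hypothetical helper for tallying the OSError lines in a slave log by errno is sketched below; count_oserrors is not part of gsyncd, and the log path is passed on the command line rather than assumed.

```python
import errno
import re
import sys
from collections import Counter

# Matches lines such as:
# OSError: [Errno 107] Transport endpoint is not connected: '.gfid/...'
OSERROR_RE = re.compile(r"OSError: \[Errno (\d+)\]")


def count_oserrors(lines):
    """Return a Counter keyed by (errno number, symbolic errno name)."""
    counts = Counter()
    for line in lines:
        m = OSERROR_RE.search(line)
        if m:
            num = int(m.group(1))
            counts[(num, errno.errorcode.get(num, "UNKNOWN"))] += 1
    return counts


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:  # path to a geo-rep slave log
        for (num, name), n in sorted(count_oserrors(fh).items()):
            print(num, name, n)
```

On Linux, 39, 107 and 117 decode to ENOTEMPTY, ENOTCONN and EUCLEAN. The "reached maximum retries ... [Errno 39]" warnings have no "OSError:" prefix, so this pattern does not count them.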