Bug 1261300 - [geo-rep]: recursive_rmdir causing "Transport endpoint not connected" followed by "Structure needs cleaning"
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1335538
Reported: 2015-09-09 07:25 UTC by Rahul Hinduja
Modified: 2018-04-16 15:56 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1335538
Environment:
Last Closed: 2018-04-16 15:56:21 UTC
Embargoed:


Description Rahul Hinduja 2015-09-09 07:25:12 UTC
Description of problem:
=======================

Observing "OSError: [Errno 107] Transport endpoint is not connected" and "OSError: [Errno 117] Structure needs cleaning" tracebacks in the slave logs when "rm -rf" is issued from the master.

[2015-09-08 19:01:09.604484] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4752e6d2-4bef-4319-a535-2fbf1fea55e7/level04/hardlink_to_files'
[2015-09-08 19:01:09.621318] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-09-08 19:01:09.621871] I [syncdutils(slave):220:finalize] <top>: exiting.
[2015-09-08 19:01:20.188294] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.213108] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:slave
[2015-09-08 19:01:21.355303] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:22.369762] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-08 19:01:36.427643] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['b7e28dc5-f77a-45ff-b5ed-38fc596359a7', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44', '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'])...[Errno 39] Directory not empty: '.gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44'
[2015-09-08 19:01:36.428356] W [resource(slave):692:entry_ops] <top>: Recursive remove b7e28dc5-f77a-45ff-b5ed-38fc596359a7 => .gfid/7fad9921-77c4-4806-8ed1-048df17c2fb7/level44failed: Directory not empty
[2015-09-08 19:01:37.557725] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 659, in recursive_rmdir
    EISDIR])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 117] Structure needs cleaning: '.gfid/b5a0188e-f6fa-4c67-81dd-66b4f68a330b/level14/symlink_to_files'
[2015-09-08 19:01:37.568291] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Eventually, after retries, all the files are removed from the slave.
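The traceback frames above outline how the slave handles errors during removal: entry_ops wraps recursive_rmdir in errno_wrap with the list [ENOTEMPTY, ESTALE, ENODATA], and recursive_rmdir calls errno_wrap(os.rmdir, [path], [ENOENT, ESTALE]). The following is a minimal sketch reconstructed from those frames only, not the actual syncdaemon source. The errno_wrap argument order (ignore list, then retry list), MAX_RETRIES, RETRY_DELAY and the single-argument recursive_rmdir/remove_entry helpers are assumptions. It shows why errno 107 (ENOTCONN) and 117 (EUCLEAN) abort the call: they are in neither list, so they are re-raised.

```python
import errno
import logging
import os
import time

# Assumed values; the real retry budget and back-off in syncdutils may differ.
MAX_RETRIES = 10
RETRY_DELAY = 0.25  # seconds between retries


def errno_wrap(call, args=(), ignore_errnos=(), retry_errnos=()):
    """Run call(*args) and classify any OSError by its errno (sketch).

    - errno in ignore_errnos: swallowed, the errno is returned.
    - errno in retry_errnos: retried; after MAX_RETRIES failures a warning
      is logged and the errno is returned.
    - any other errno (e.g. ENOTCONN, EUCLEAN): re-raised to the caller.
    """
    tries = 0
    while True:
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in ignore_errnos:
                return ex.errno
            if ex.errno not in retry_errnos:
                raise
            tries += 1
            if tries >= MAX_RETRIES:
                logging.warning("reached maximum retries (%r)...%s", args, ex)
                return ex.errno
            time.sleep(RETRY_DELAY)


def recursive_rmdir(path):
    """Remove a directory tree bottom-up (hypothetical single-argument form)."""
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        if os.path.isdir(fullname) and not os.path.islink(fullname):
            recursive_rmdir(fullname)
        else:
            # Files, symlinks and hardlinks; an entry that already vanished is fine.
            errno_wrap(os.remove, [fullname],
                       [errno.ENOENT, errno.ESTALE, errno.EISDIR])
    # ENOENT/ESTALE are tolerated; anything else (ENOTEMPTY, ENOTCONN,
    # EUCLEAN, ...) is re-raised to the caller.
    errno_wrap(os.rmdir, [path], [errno.ENOENT, errno.ESTALE])


def remove_entry(path):
    """Outer call shaped like the entry_ops frame (hypothetical helper).

    ENOTEMPTY/ESTALE/ENODATA are retried as a whole; other errnos escape.
    """
    return errno_wrap(recursive_rmdir, [path], [],
                      [errno.ENOTEMPTY, errno.ESTALE, errno.ENODATA])
```

Under these assumptions, an ENOTCONN raised by os.rmdir passes through both errno_wrap layers unchanged. That is consistent with each repce "call failed" traceback being followed by the worker exiting and restarting in the log. An inner ENOTEMPTY, by contrast, is retried by the outer layer until the "reached maximum retries" warning appears.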

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-14.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create master and slave cluster
2. Create geo-rep session between master and slave volume
3. Execute the following fops on the master and verify that they sync to the slave:
create, chmod, chown, chgrp, symlink, hardlink, truncate, rename, followed by remove (rm -rf)
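Step 3 can be scripted against a FUSE mount of the master volume. The sketch below is a hypothetical reproducer, not the workload actually used for this bug. Its directory layout (levelNN/, symlink_to_files/, hardlink_to_files/) is modelled on the paths in the slave logs. chown/chgrp target the current user and group so that no root privileges are needed.

```python
import os
import shutil


def run_fops(root, depth=3, nfiles=4):
    """Apply the step-3 fops under root and return the final file paths."""
    uid, gid = os.getuid(), os.getgid()
    renamed_files = []
    path = root
    for level in range(depth):
        path = os.path.join(path, "level%02d" % level)
        os.makedirs(os.path.join(path, "symlink_to_files"))
        os.makedirs(os.path.join(path, "hardlink_to_files"))
        for i in range(nfiles):
            name = os.path.join(path, "file%d" % i)
            with open(name, "wb") as f:                      # create
                f.write(os.urandom(1024))
            os.chmod(name, 0o640)                             # chmod
            os.chown(name, uid, -1)                           # chown (to self)
            os.chown(name, -1, gid)                           # chgrp (to own group)
            os.symlink(name, os.path.join(path, "symlink_to_files", "file%d" % i))
            os.link(name, os.path.join(path, "hardlink_to_files", "file%d" % i))
            os.truncate(name, 512)                            # truncate
            new_name = name + ".renamed"
            os.rename(name, new_name)                         # rename; the symlink now dangles
            renamed_files.append(new_name)
    return renamed_files


def rm_rf_contents(root):
    """Rough equivalent of `rm -rf root/*`: remove entries below root, keep root."""
    for entry in os.listdir(root):
        if entry.startswith("."):
            continue  # the shell glob * skips hidden entries
        full = os.path.join(root, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)
```

Usage: call run_fops on the master mount point (e.g. a hypothetical /mnt/master), wait for geo-replication to sync and compare against the slave mount, then call rm_rf_contents on the master mount to exercise the recursive removal path on the slave.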

Comment 3 Sakshi 2015-09-14 04:36:17 UTC
The "Structure needs cleaning" error occurs because the lookup self-heal in DHT does not set errors properly in a few places. Fix sent here:
http://review.gluster.org/#/c/12165/

Comment 4 Rahul Hinduja 2016-01-04 11:29:24 UTC
For the record, hitting this consistently with rm -rf * on build: glusterfs-3.7.5-13.el7rhgs.x86_64

[2016-01-04 11:06:34.928106] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2016-01-04 11:06:47.999924] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 685, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 667, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/1d9fc177-fc1e-411f-8ea9-b6e1234575af/level11/symlink_to_files'

Comment 5 Rahul Hinduja 2016-04-14 10:51:30 UTC
For the record, hitting this consistently with rm -rf * on build: glusterfs-3.7.9-1.el7rhgs.x86_64

[2016-04-14 10:34:21.478831] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:34:26.432831] W [syncdutils(slave):486:errno_wrap] <top>: reached maximum retries (['ec3be6a9-d725-4a09-8a4c-ef98a84ee39f', '.gfid/00000000-0000-0000-0000-000000000001/thread1', '.gfid/00000000-0000-0000-0000-000000000001/thread1'])...[Errno 39] Directory not empty: '.gfid/00000000-0000-0000-0000-000000000001/thread1/level07/level17/level27/level37'
[2016-04-14 10:34:26.433208] W [resource(slave):721:entry_ops] <top>: Recursive remove ec3be6a9-d725-4a09-8a4c-ef98a84ee39f => .gfid/00000000-0000-0000-0000-000000000001/thread1failed: Directory not empty
[2016-04-14 10:35:03.264699] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 666, in recursive_rmdir
    recursive_rmdir(gfid, entry, fullname)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/cd8f7393-5078-4776-a24f-3e8eb69e953f/level37/symlink_to_files'
[2016-04-14 10:35:03.274899] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:03.275094] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:08.326251] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/af8c23fb-dfd2-4c76-8c74-45813518acb0/symlink_to_files'
[2016-04-14 10:35:08.335688] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-14 10:35:08.335922] I [syncdutils(slave):220:finalize] <top>: exiting.
[2016-04-14 10:35:14.135530] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:15.292807] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:35:19.146552] I [gsyncd(slave):653:main_i] <top>: syncing: gluster://localhost:slave
[2016-04-14 10:35:20.289783] I [resource(slave):902:service_loop] GLUSTER: slave listening
[2016-04-14 10:36:16.786941] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 714, in entry_ops
    [], [ENOTEMPTY, ESTALE, ENODATA])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 671, in recursive_rmdir
    errno_wrap(os.rmdir, [path], [ENOENT, ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/8ca37ba9-5d4f-40a1-94df-aed09a2bd36a/symlink_to_files'
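Comments 4 and 5 report the same failures on later builds. To compare runs across builds, a small triage helper can tally the OSError tracebacks in a slave log by errno. This is a hypothetical helper, not part of gsyncd. Its regex assumes the `OSError: [Errno N] message: 'path'` line format shown in the logs above.

```python
import errno
import re
from collections import Counter

# Final line of a syncdaemon traceback, e.g.
# OSError: [Errno 107] Transport endpoint is not connected: '.gfid/...'
OSERROR_RE = re.compile(r"OSError: \[Errno (\d+)\]")


def tally_oserrors(lines):
    """Count traceback-terminating OSError lines, keyed by errno number."""
    counts = Counter()
    for line in lines:
        match = OSERROR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def describe(counts):
    """Format the tally using the local platform's symbolic errno names."""
    return ["%s (%d): %d" % (errno.errorcode.get(n, "UNKNOWN"), n, c)
            for n, c in sorted(counts.items())]
```

The "reached maximum retries ...[Errno 39]" warnings are not counted, because they lack the `OSError:` prefix. On Linux, errnos 107, 117 and 39 are ENOTCONN, EUCLEAN and ENOTEMPTY respectively.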

