1669936 – [geo-rep]: Errno 107 Transport endpoint is not connected (In-service upgrade)

Keywords:
Status:	CLOSED DUPLICATE of bug 1640573
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	rhgs-3.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Kotresh HR
QA Contact:	Rahul Hinduja
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-01-28 05:43 UTC by Rochelle
Modified:	2019-02-19 07:27 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-01-28 06:40:37 UTC
Embargoed:
Dependent Products:

Description Rochelle 2019-01-28 05:43:06 UTC

Description of problem:
========================
While upgrading from 3.4.1 to 3.4.3 (stage testing), I hit
'Transport endpoint is not connected' (Errno 107) on the slave.
However, there was no functional impact.


On the slave:
-------------
[2019-01-25 09:47:33.603162] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 643, in entry_ops
    st = lstat(slink)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 577, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/5013d4d8-b327-43bb-b887-6decf432f1cc'
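
The traceback shows lstat() delegating to errno_wrap() with one errno list containing ENOENT and another containing ESTALE and EBUSY. ENOTCONN is in neither list, which would explain why it surfaces as an unhandled OSError. The following is a minimal sketch of that kind of wrapper, assuming the first list holds errnos returned to the caller and the second holds errnos that are retried. The name errno_wrap_sketch, the retry count and the delay are hypothetical and are not taken from syncdutils.py.

```python
import errno
import os
import time


def errno_wrap_sketch(call, args, tolerated=(), retried=(), max_retries=3, delay=0.0):
    """Hypothetical errno-filtering wrapper (not the syncdutils code).

    Errnos in 'tolerated' are returned to the caller as a value,
    errnos in 'retried' trigger a bounded retry, anything else is re-raised.
    """
    attempts = 0
    while True:
        try:
            return call(*args)
        except OSError as exc:
            if exc.errno in tolerated:
                return exc.errno
            if exc.errno in retried and attempts < max_retries:
                attempts += 1
                time.sleep(delay)
                continue
            raise


def fake_lstat(path):
    # Simulates lstat() on a mount whose client process has gone away.
    raise OSError(errno.ENOTCONN, os.strerror(errno.ENOTCONN), path)


# A tolerated errno comes back as a return value instead of an exception.
missing = errno_wrap_sketch(os.lstat, ["/nonexistent/path/for/sketch"], [errno.ENOENT])

# ENOTCONN is neither tolerated nor retried, so it propagates to the caller.
propagated = None
try:
    errno_wrap_sketch(fake_lstat, [".gfid/example"], [errno.ENOENT],
                      [errno.ESTALE, errno.EBUSY])
except OSError as exc:
    propagated = exc.errno
```

Under these assumptions, a lost mount during the upgrade reaches the RPC worker as an exception rather than being absorbed by the wrapper.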


On the master:
--------------
[2019-01-25 09:47:01.108076] I [master(/rhs/brick2/b4):1956:syncjob] Syncer: Sync Time Taken    duration=0.4826 num_files=1     job=1   return_code=0
[2019-01-25 09:47:33.722486] E [repce(/rhs/brick1/b1):209:__call__] RepceClient: call failed    call=18805:140138876090176:1548409647.62        method=entry_ops        error=OSError
[2019-01-25 09:47:33.724375] E [syncdutils(/rhs/brick1/b1):349:log_raise_exception] <top>: Gluster Mount process exited error=ENOTCONN
[2019-01-25 09:47:33.761385] I [syncdutils(/rhs/brick1/b1):295:finalize] <top>: exiting.
[2019-01-25 09:47:33.768512] I [repce(/rhs/brick1/b1):92:service_loop] RepceServer: terminating on reaching EOF.
[2019-01-25 09:47:33.769392] I [syncdutils(/rhs/brick1/b1):295:finalize] <top>: exiting.
[2019-01-25 09:47:34.633352] I [monitor(monitor):299:monitor] Monitor: worker died in startup phase     brick=/rhs/brick1/b1


Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-157 master]# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-api-3.12.2-39.el7rhgs.x86_64
glusterfs-fuse-3.12.2-39.el7rhgs.x86_64
glusterfs-rdma-3.12.2-39.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-39.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-39.el7rhgs.x86_64
glusterfs-cli-3.12.2-39.el7rhgs.x86_64
glusterfs-events-3.12.2-39.el7rhgs.x86_64
python2-gluster-3.12.2-39.el7rhgs.x86_64
glusterfs-3.12.2-39.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-10.el7rhgs.noarch
glusterfs-server-3.12.2-39.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.4.x86_64
glusterfs-libs-3.12.2-39.el7rhgs.x86_64



How reproducible:
================
1/1

Steps to Reproduce:
===================
In-service upgrade of a geo-replication session

Actual results:
===============
'Transport endpoint is not connected' (Errno 107, ENOTCONN) tracebacks were seen in the geo-rep logs.
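
To check other nodes for the same symptom, the log excerpts in this report can be scanned mechanically. The sketch below assumes the line layout visible in those excerpts ([timestamp] LEVEL [module(context):line:function] message), which is inferred from this report only and is not a documented format. The names LOG_HEADER, group_entries and enotconn_errors are hypothetical.

```python
import re

# Header layout inferred from the excerpts in this report (an assumption).
LOG_HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+)\((?P<context>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\]\s*"
    r"(?P<message>.*)$"
)


def group_entries(lines):
    """Group header lines with their continuation lines (e.g. tracebacks)."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_HEADER.match(line)
        if match:
            entry = match.groupdict()
            entry["text"] = line
            entries.append(entry)
        elif entries and line.strip():
            # Non-header, non-blank line: attach it to the previous entry.
            entries[-1]["text"] += "\n" + line
    return entries


def enotconn_errors(lines):
    """Return error-level entries that mention ENOTCONN or Errno 107."""
    return [
        e for e in group_entries(lines)
        if e["level"] == "E"
        and ("ENOTCONN" in e["text"] or "Errno 107" in e["text"])
    ]


# Sample lines copied from the logs quoted in this report.
sample = [
    "[2019-01-25 09:47:33.603162] E [repce(slave):117:worker] <top>: call failed: ",
    "Traceback (most recent call last):",
    "OSError: [Errno 107] Transport endpoint is not connected: '.gfid/5013d4d8-b327-43bb-b887-6decf432f1cc'",
    "[2019-01-25 09:47:33.722486] E [repce(/rhs/brick1/b1):209:__call__] RepceClient: call failed    call=18805:140138876090176:1548409647.62        method=entry_ops        error=OSError",
    "[2019-01-25 09:47:33.724375] E [syncdutils(/rhs/brick1/b1):349:log_raise_exception] <top>: Gluster Mount process exited error=ENOTCONN",
    "[2019-01-25 09:47:33.761385] I [syncdutils(/rhs/brick1/b1):295:finalize] <top>: exiting.",
]

hits = enotconn_errors(sample)
for hit in hits:
    print(hit["ts"], hit["module"], hit["context"], hit["func"])
```

Grouping continuation lines with their header matters here because the errno itself appears only on the last line of the slave traceback, not on the header line.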

Expected results:
=================
There should be no such tracebacks in the geo-rep logs during an in-service upgrade.


Additional info:
================
This was hit while stage testing (before there was a need for build 40)
