Bug 1640573 - [geo-rep]: Transport endpoint not connected with arbiter volumes
Summary: [geo-rep]: Transport endpoint not connected with arbiter volumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 3
Assignee: Shwetha K Acharya
QA Contact: Leela Venkaiah Gangavarapu
URL:
Whiteboard:
Duplicates: 1666974 1669936 (view as bug list)
Depends On:
Blocks: 1664335
 
Reported: 2018-10-18 10:44 UTC by Rochelle
Modified: 2020-12-17 04:50 UTC (History)
CC List: 8 users

Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1664335 (view as bug list)
Environment:
Last Closed: 2020-12-17 04:50:16 UTC
Embargoed:


Attachments


Links
System: Red Hat Product Errata    ID: RHBA-2020:5603    Private: 0    Priority: None    Status: None    Summary: None    Last Updated: 2020-12-17 04:50:42 UTC

Description Rochelle 2018-10-18 10:44:52 UTC
Description of problem:
======================
Brick goes down in the middle of this particular test case

On the Master:
--------------
[2018-10-18 09:30:58.93183] I [master(/rhs/brick2/b5):1450:crawl] _GMaster: slave's time        stime=(1539855011, 0)
[2018-10-18 09:31:00.624412] E [repce(/rhs/brick3/b8):209:__call__] RepceClient: call failed    call=17069:139754185205568:1539855057.49        method=entry_ops        error=OSError
[2018-10-18 09:31:00.625464] E [syncdutils(/rhs/brick3/b8):349:log_raise_exception] <top>: Gluster Mount process exited error=ENOTCONN
[2018-10-18 09:31:00.699959] I [syncdutils(/rhs/brick3/b8):295:finalize] <top>: exiting.


brick3/b8 logs report:
----------------------
[2018-10-18 09:31:00.725836] W [socket.c:593:__socket_rwv] 0-master-changelog: readv on /var/run/gluster/.f8271615d91fca5417068.sock failed (No data available)
[2018-10-18 09:31:00.739737] I [MSGID: 115036] [server.c:571:server_rpc_notify] 0-master-server: disconnecting connection from dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0
[2018-10-18 09:31:00.740127] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-master-server: Shutting down connection dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0


On the slave:
-------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 639, in entry_ops
    st = lstat(slink)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 577, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4b67b1d8-b53b-4962-a4c4-294b3d5e750c'
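
For context, the OSError above comes out of the errno_wrap() retry helper visible in the traceback: lstat on the aux-GFID path on the slave mount fails with ENOTCONN, which is neither in the tolerated list (ENOENT) nor in the retried list (ESTALE, EBUSY), so the exception travels back over RPC to the master worker (the "call failed ... method=entry_ops error=OSError" line in the master log). The sketch below is a hypothetical, simplified reimplementation of that retry pattern, not the shipped syncdutils.py code; the retry count, back-off and the stand-in path "/tmp" are made up for illustration.

# Hypothetical sketch of the errno_wrap() retry pattern (illustrative only;
# not the shipped syncdutils.py implementation).
import errno
import os
import time


def errno_wrap(call, args=(), tolerated=(), retried=(), attempts=5):
    """Run call(*args). Errnos in 'tolerated' are swallowed (the errno is
    returned so the caller can treat the entry as missing), errnos in
    'retried' trigger a short back-off and another attempt, and anything
    else, including ENOTCONN, is re-raised to the caller."""
    for attempt in range(attempts):
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in tolerated:
                return ex.errno
            if ex.errno in retried and attempt < attempts - 1:
                time.sleep(0.1 * (attempt + 1))
                continue
            raise  # ENOTCONN ends up here and propagates over RPC


if __name__ == "__main__":
    # Shape of the failing call in entry_ops(): ENOENT is tolerated and
    # ESTALE/EBUSY are retried, but ENOTCONN from a disconnected mount is
    # not handled. "/tmp" stands in for the real '.gfid/<gfid>' path on
    # the slave aux mount.
    print(errno_wrap(os.lstat, ("/tmp",),
                     (errno.ENOENT,), (errno.ESTALE, errno.EBUSY)))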



Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-2 bricks]# rpm -qa | grep gluster
glusterfs-server-3.12.2-22.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-22.el7rhgs.x86_64
glusterfs-rdma-3.12.2-22.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-22.el7rhgs.x86_64
glusterfs-api-3.12.2-22.el7rhgs.x86_64
glusterfs-events-3.12.2-22.el7rhgs.x86_64
glusterfs-libs-3.12.2-22.el7rhgs.x86_64
glusterfs-fuse-3.12.2-22.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-22.el7rhgs.x86_64
python2-gluster-3.12.2-22.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-cli-3.12.2-22.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7.x86_64
[root@dhcp42-2 bricks]# 


How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create and start a master and a slave arbiter volume.
2. Set up a geo-rep session between the two.
3. Mount the master volume and pump I/O:

for i in {create,chmod,symlink,create,chown,chmod,create,symlink,chgrp,symlink,truncate,symlink,chown,create,symlink}; do crefi --multi -n 5 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master/ ; sleep 10 ; done


master vol info:
-----------------
Volume Name: master
Type: Distributed-Replicate
Volume ID: 5a97408a-8cc0-4f24-a306-7f9e143e6614
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.116:/rhs/brick2/b4
Brick2: 10.70.42.2:/rhs/brick2/b5
Brick3: 10.70.42.44:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.43.116:/rhs/brick3/b7
Brick5: 10.70.42.2:/rhs/brick3/b8
Brick6: 10.70.42.44:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: off
cluster.enable-shared-storage: enable


slave vol info:
---------------
Volume Name: slave
Type: Distributed-Replicate
Volume ID: 125aece0-1800-4065-a363-f36dc0efc6f5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.226:/rhs/brick2/b4
Brick2: 10.70.43.81:/rhs/brick2/b5
Brick3: 10.70.41.204:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.42.226:/rhs/brick3/b7
Brick5: 10.70.43.81:/rhs/brick3/b8
Brick6: 10.70.41.204:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off



Actual results:
==============
A brick process went down in the middle of the run, and the geo-replication worker exited with "Transport endpoint is not connected" (ENOTCONN).

Expected results:
================
The brick should not go down, and the geo-replication session should continue without ENOTCONN errors.

Comment 9 Sunny Kumar 2019-01-28 06:40:37 UTC
*** Bug 1669936 has been marked as a duplicate of this bug. ***

Comment 13 Kotresh HR 2019-11-19 05:25:03 UTC
*** Bug 1666974 has been marked as a duplicate of this bug. ***

Comment 16 Sunny Kumar 2020-02-24 14:33:13 UTC
It's targeted for 3.5.2; once branching is done, the fix will be back-ported.

Comment 25 errata-xmlrpc 2020-12-17 04:50:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603

