Bug 1640573 - [geo-rep]: Transport endpoint not connected with arbiter volumes
Summary: [geo-rep]: Transport endpoint not connected with arbiter volumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 3
Assignee: Shwetha K Acharya
QA Contact: Leela Venkaiah Gangavarapu
URL:
Whiteboard:
Duplicates: 1666974 1669936 (view as bug list)
Depends On:
Blocks: 1664335
 
Reported: 2018-10-18 10:44 UTC by Rochelle
Modified: 2020-12-17 04:50 UTC (History)
CC List: 8 users

Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1664335 (view as bug list)
Environment:
Last Closed: 2020-12-17 04:50:16 UTC
Embargoed:


Attachments


Links
System: Red Hat Product Errata    ID: RHBA-2020:5603    Private: 0    Priority: None    Status: None    Summary: None    Last Updated: 2020-12-17 04:50:42 UTC

Description Rochelle 2018-10-18 10:44:52 UTC
Description of problem:
======================
Brick goes down in the middle of this particular test case

On the Master:
--------------
[2018-10-18 09:30:58.93183] I [master(/rhs/brick2/b5):1450:crawl] _GMaster: slave's time        stime=(1539855011, 0)
[2018-10-18 09:31:00.624412] E [repce(/rhs/brick3/b8):209:__call__] RepceClient: call failed    call=17069:139754185205568:1539855057.49        method=entry_ops        error=OSError
[2018-10-18 09:31:00.625464] E [syncdutils(/rhs/brick3/b8):349:log_raise_exception] <top>: Gluster Mount process exited error=ENOTCONN
[2018-10-18 09:31:00.699959] I [syncdutils(/rhs/brick3/b8):295:finalize] <top>: exiting.


brick3/b8 logs report:
----------------------
[2018-10-18 09:31:00.725836] W [socket.c:593:__socket_rwv] 0-master-changelog: readv on /var/run/gluster/.f8271615d91fca5417068.sock failed (No data available)
[2018-10-18 09:31:00.739737] I [MSGID: 115036] [server.c:571:server_rpc_notify] 0-master-server: disconnecting connection from dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0
[2018-10-18 09:31:00.740127] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-master-server: Shutting down connection dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0


On the slave:
-------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 639, in entry_ops
    st = lstat(slink)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 577, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4b67b1d8-b53b-4962-a4c4-294b3d5e750c'
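
For context, the OSError above comes out of the errno_wrap() retry helper visible in the traceback: lstat on the aux-GFID path on the slave mount fails with ENOTCONN, which is neither in the tolerated list (ENOENT) nor in the retried list (ESTALE, EBUSY), so the exception travels back over RPC to the master worker (the "call failed ... method=entry_ops error=OSError" line in the master log). The sketch below is a hypothetical, simplified reimplementation of that retry pattern, not the shipped syncdutils.py code; the retry count, back-off and the stand-in path "/tmp" are made up for illustration.

# Hypothetical sketch of the errno_wrap() retry pattern (illustrative only;
# not the shipped syncdutils.py implementation).
import errno
import os
import time


def errno_wrap(call, args=(), tolerated=(), retried=(), attempts=5):
    """Run call(*args). Errnos in 'tolerated' are swallowed (the errno is
    returned so the caller can treat the entry as missing), errnos in
    'retried' trigger a short back-off and another attempt, and anything
    else, including ENOTCONN, is re-raised to the caller."""
    for attempt in range(attempts):
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in tolerated:
                return ex.errno
            if ex.errno in retried and attempt < attempts - 1:
                time.sleep(0.1 * (attempt + 1))
                continue
            raise  # ENOTCONN ends up here and propagates over RPC


if __name__ == "__main__":
    # Shape of the failing call in entry_ops(): ENOENT is tolerated and
    # ESTALE/EBUSY are retried, but ENOTCONN from a disconnected mount is
    # not handled. "/tmp" stands in for the real '.gfid/<gfid>' path on
    # the slave aux mount.
    print(errno_wrap(os.lstat, ("/tmp",),
                     (errno.ENOENT,), (errno.ESTALE, errno.EBUSY)))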



Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-2 bricks]# rpm -qa | grep gluster
glusterfs-server-3.12.2-22.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-22.el7rhgs.x86_64
glusterfs-rdma-3.12.2-22.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-22.el7rhgs.x86_64
glusterfs-api-3.12.2-22.el7rhgs.x86_64
glusterfs-events-3.12.2-22.el7rhgs.x86_64
glusterfs-libs-3.12.2-22.el7rhgs.x86_64
glusterfs-fuse-3.12.2-22.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-22.el7rhgs.x86_64
python2-gluster-3.12.2-22.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-cli-3.12.2-22.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7.x86_64
[root@dhcp42-2 bricks]# 


How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create and start a master and a slave arbiter volume.
2. Set up a geo-rep session between the two.
3. Mount the master volume and pump I/O:

for i in {create,chmod,symlink,create,chown,chmod,create,symlink,chgrp,symlink,truncate,symlink,chown,create,symlink}; do crefi --multi -n 5 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master/ ; sleep 10 ; done


master vol info:
-----------------
Volume Name: master
Type: Distributed-Replicate
Volume ID: 5a97408a-8cc0-4f24-a306-7f9e143e6614
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.116:/rhs/brick2/b4
Brick2: 10.70.42.2:/rhs/brick2/b5
Brick3: 10.70.42.44:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.43.116:/rhs/brick3/b7
Brick5: 10.70.42.2:/rhs/brick3/b8
Brick6: 10.70.42.44:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: off
cluster.enable-shared-storage: enable


slave vol info:
---------------
Volume Name: slave
Type: Distributed-Replicate
Volume ID: 125aece0-1800-4065-a363-f36dc0efc6f5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.226:/rhs/brick2/b4
Brick2: 10.70.43.81:/rhs/brick2/b5
Brick3: 10.70.41.204:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.42.226:/rhs/brick3/b7
Brick5: 10.70.43.81:/rhs/brick3/b8
Brick6: 10.70.41.204:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off



Actual results:
==============
A brick process went down in the middle of the run, and the geo-replication worker exited with "Transport endpoint is not connected" (ENOTCONN).

Expected results:
================
The brick should not go down, and the geo-replication session should continue without ENOTCONN errors.

Comment 9 Sunny Kumar 2019-01-28 06:40:37 UTC
*** Bug 1669936 has been marked as a duplicate of this bug. ***

Comment 13 Kotresh HR 2019-11-19 05:25:03 UTC
*** Bug 1666974 has been marked as a duplicate of this bug. ***

Comment 16 Sunny Kumar 2020-02-24 14:33:13 UTC
It's targeted for 3.5.2; once branching is done, the fix will be back-ported.

Comment 25 errata-xmlrpc 2020-12-17 04:50:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603

