Description of problem:
=======================
Brick goes down in the middle of this particular test case.

On the Master:
--------------
[2018-10-18 09:30:58.93183] I [master(/rhs/brick2/b5):1450:crawl] _GMaster: slave's time stime=(1539855011, 0)
[2018-10-18 09:31:00.624412] E [repce(/rhs/brick3/b8):209:__call__] RepceClient: call failed call=17069:139754185205568:1539855057.49 method=entry_ops error=OSError
[2018-10-18 09:31:00.625464] E [syncdutils(/rhs/brick3/b8):349:log_raise_exception] <top>: Gluster Mount process exited error=ENOTCONN
[2018-10-18 09:31:00.699959] I [syncdutils(/rhs/brick3/b8):295:finalize] <top>: exiting.

brick3/b8 logs report:
----------------------
[2018-10-18 09:31:00.725836] W [socket.c:593:__socket_rwv] 0-master-changelog: readv on /var/run/gluster/.f8271615d91fca5417068.sock failed (No data available)
[2018-10-18 09:31:00.739737] I [MSGID: 115036] [server.c:571:server_rpc_notify] 0-master-server: disconnecting connection from dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0
[2018-10-18 09:31:00.740127] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-master-server: Shutting down connection dhcp42-2.lab.eng.blr.redhat.com-17140-2018/10/18-09:30:01:598538-master-client-4-0-0

On the slave:
-------------
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 639, in entry_ops
    st = lstat(slink)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 577, in lstat
    return errno_wrap(os.lstat, [e], [ENOENT], [ESTALE, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 559, in errno_wrap
    return call(*arg)
OSError: [Errno 107] Transport endpoint is not connected: '.gfid/4b67b1d8-b53b-4962-a4c4-294b3d5e750c'

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-2 bricks]# rpm -qa | grep gluster
glusterfs-server-3.12.2-22.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-22.el7rhgs.x86_64
glusterfs-rdma-3.12.2-22.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-22.el7rhgs.x86_64
glusterfs-api-3.12.2-22.el7rhgs.x86_64
glusterfs-events-3.12.2-22.el7rhgs.x86_64
glusterfs-libs-3.12.2-22.el7rhgs.x86_64
glusterfs-fuse-3.12.2-22.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-22.el7rhgs.x86_64
python2-gluster-3.12.2-22.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-cli-3.12.2-22.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7.x86_64
[root@dhcp42-2 bricks]#

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create and start a master and a slave arbiter volume.
2. Set up a geo-rep session between the two (a setup sketch follows these steps).
3. Mount the master and pump I/O:

for i in {create,chmod,symlink,create,chown,chmod,create,symlink,chgrp,symlink,truncate,symlink,chown,create,symlink}; do
    crefi --multi -n 5 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master/
    sleep 10
done
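The report does not spell out the volume-creation and geo-rep commands behind steps 1 and 2. Below is a minimal sketch of one way to do them with the standard gluster CLI, reusing the hosts and brick paths from the vol info further down; the slave-side volume creation, the passwordless-SSH setup towards the slave node, and any extra options used in the original run are assumptions.

# On a master node: create and start a 2 x (2 + 1) arbiter volume
# (the slave volume is assumed to be created the same way from its own bricks).
gluster volume create master replica 3 arbiter 1 \
    10.70.43.116:/rhs/brick2/b4 10.70.42.2:/rhs/brick2/b5 10.70.42.44:/rhs/brick2/b6 \
    10.70.43.116:/rhs/brick3/b7 10.70.42.2:/rhs/brick3/b8 10.70.42.44:/rhs/brick3/b9
gluster volume start master

# Geo-rep session towards the slave volume; assumes passwordless SSH from this
# node to the slave node (10.70.42.226) is already in place.
gluster system:: execute gsec_create
gluster volume geo-replication master 10.70.42.226::slave create push-pem
gluster volume geo-replication master 10.70.42.226::slave start
gluster volume geo-replication master 10.70.42.226::slave status

# Mount the master volume for the I/O in step 3.
mount -t glusterfs 10.70.43.116:/master /mnt/master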
master vol info:
----------------
Volume Name: master
Type: Distributed-Replicate
Volume ID: 5a97408a-8cc0-4f24-a306-7f9e143e6614
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.116:/rhs/brick2/b4
Brick2: 10.70.42.2:/rhs/brick2/b5
Brick3: 10.70.42.44:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.43.116:/rhs/brick3/b7
Brick5: 10.70.42.2:/rhs/brick3/b8
Brick6: 10.70.42.44:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: off
cluster.enable-shared-storage: enable

slave vol info:
---------------
Volume Name: slave
Type: Distributed-Replicate
Volume ID: 125aece0-1800-4065-a363-f36dc0efc6f5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.226:/rhs/brick2/b4
Brick2: 10.70.43.81:/rhs/brick2/b5
Brick3: 10.70.41.204:/rhs/brick2/b6 (arbiter)
Brick4: 10.70.42.226:/rhs/brick3/b7
Brick5: 10.70.43.81:/rhs/brick3/b8
Brick6: 10.70.41.204:/rhs/brick3/b9 (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Actual results:
===============
Brick went down.

Expected results:
=================
Brick should not go down.
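The report does not include the output used to confirm the brick going down. Assuming the volume names above, a check along these lines shows the brick state (the Online column drops to N for the affected brick) and points at its log; the log path shown simply follows the usual /var/log/glusterfs/bricks naming and is illustrative.

# On a master node: check whether every brick process is online.
gluster volume status master

# Inspect the log of the brick that went down, e.g. /rhs/brick3/b8 on 10.70.42.2.
less /var/log/glusterfs/bricks/rhs-brick3-b8.log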
*** Bug 1669936 has been marked as a duplicate of this bug. ***
*** Bug 1666974 has been marked as a duplicate of this bug. ***
It's targeted for 3.5.2; once branching is done, the fix will be back-ported.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603