Description of problem:
=======================
With the latest build, the geo-rep worker hits a traceback during the sync to the slave:

Master Log:
===========
[2016-04-06 13:40:04.844175] E [syncdutils(/bricks/brick1/master_brick6):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 166, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 663, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1510, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1132, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 992, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 934, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 5] Input/output error
[2016-04-06 13:40:04.846092] I [syncdutils(/bricks/brick1/master_brick6):220:finalize] <top>: exiting.
[2016-04-06 13:40:04.854729] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.

Slave Log:
==========
[2016-04-06 13:39:56.414524] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 779, in entry_ops
    [ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 79, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 5] Input/output error
[2016-04-06 13:39:56.431735] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-04-06 13:39:56.432101] I [syncdutils(slave):220:finalize] <top>: exiting.

Slave Client Logs Reports:
==========================
[root@dhcp37-123 geo-replication-slaves]# grep -i "split" 8cabd68c-37f8-4b36-87b1-70bc941d7823\:gluster%3A%2F%2F127.0.0.1%3Aslave.log
[root@dhcp37-123 geo-replication-slaves]# grep -i "split" 8cabd68c-37f8-4b36-87b1-70bc941d7823\:gluster%3A%2F%2F127.0.0.1%3Aslave.*
8cabd68c-37f8-4b36-87b1-70bc941d7823:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2016-04-06 13:42:15.735353] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-3: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
8cabd68c-37f8-4b36-87b1-70bc941d7823:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2016-04-06 13:42:15.735791] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-4: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
8cabd68c-37f8-4b36-87b1-70bc941d7823:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log:[2016-04-06 13:42:46.110050] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-slave-replicate-2: Failing SETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[root@dhcp37-123 geo-replication-slaves]# less 8cabd68c-37f8-4b36-87b1-70bc941d7823:gluster%3A%2F%2F127.0.0.1%3Aslave.gluster.log
[root@dhcp37-123 geo-replication-slaves]#

The client logs show split-brain errors, but neither the shd logs nor the "heal info split-brain" output reports any files in split-brain:

[root@dhcp37-123 geo-replication-slaves]# gluster volume heal slave info split-brain
Brick 10.70.37.122:/bricks/brick0/slave_brick0
Number of entries in split-brain: 0

Brick 10.70.37.175:/bricks/brick0/slave_brick1
Number of entries in split-brain: 0

Brick 10.70.37.144:/bricks/brick0/slave_brick2
Number of entries in split-brain: 0

Brick 10.70.37.123:/bricks/brick0/slave_brick3
Number of entries in split-brain: 0

Brick 10.70.37.217:/bricks/brick0/slave_brick4
Number of entries in split-brain: 0

Brick 10.70.37.218:/bricks/brick0/slave_brick5
Number of entries in split-brain: 0

Brick 10.70.37.122:/bricks/brick1/slave_brick6
Number of entries in split-brain: 0

Brick 10.70.37.175:/bricks/brick1/slave_brick7
Number of entries in split-brain: 0

Brick 10.70.37.144:/bricks/brick1/slave_brick8
Number of entries in split-brain: 0

Brick 10.70.37.123:/bricks/brick1/slave_brick9
Number of entries in split-brain: 0

Brick 10.70.37.217:/bricks/brick1/slave_brick10
Number of entries in split-brain: 0

Brick 10.70.37.218:/bricks/brick1/slave_brick11
Number of entries in split-brain: 0

[root@dhcp37-123 geo-replication-slaves]# grep -i "split" /var/log/glusterfs/glustershd.log
[root@dhcp37-123 geo-replication-slaves]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Set up geo-rep between the master and slave volumes
2. Mount the master volume (FUSE)
3. Use the crefi tool to generate the data set {create, chmod, chown}

Actual results:
===============
Files do eventually get synced to the slave, but many worker crashes are observed.

Expected results:
=================
The geo-rep worker shouldn't die during sync.

Additional info:
================
None of the bricks were brought down.
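Note on the slave traceback: the failing lsetxattr call goes through errno_wrap with [ESTALE, EINVAL] as its last argument, and EIO (errno 5) is not in that list, so the error is re-raised and reaches the master over RePCe. The following is only a simplified sketch of that kind of wrapper, not the gsyncd source. The names errno_wrap_sketch, ignore_errnos, retry_errnos, max_retries and failing_setxattr are assumptions made for illustration.

```python
import errno
import os
import time


def errno_wrap_sketch(call, args=(), ignore_errnos=(), retry_errnos=(),
                      max_retries=3, delay=0.0):
    """Run call(*args); swallow ignored errnos, retry transient ones,
    and re-raise everything else (hypothetical helper)."""
    attempts = 0
    while True:
        try:
            return call(*args)
        except OSError as ex:
            if ex.errno in ignore_errnos:
                # Expected failure: hand the errno back instead of raising.
                return ex.errno
            if ex.errno not in retry_errnos or attempts >= max_retries:
                # Any other errno (EIO in this report) propagates to the caller.
                raise
            attempts += 1
            time.sleep(delay)


def failing_setxattr():
    # Hypothetical stand-in for the xattr call that failed on the slave.
    raise OSError(errno.EIO, os.strerror(errno.EIO))


def demo():
    """Return the errno that escapes the wrapper, or None if nothing escapes."""
    try:
        errno_wrap_sketch(failing_setxattr,
                          retry_errnos=(errno.ESTALE, errno.EINVAL))
    except OSError as ex:
        return ex.errno
    return None
```

Under these assumptions, demo() returns errno.EIO: the wrapper neither ignores nor retries EIO, which matches the worker exiting as soon as the slave client returns Input/output error.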
Provided devel_ack+ for BZ with Regression.
Upstream patch http://review.gluster.org/14295 is in review.
Two patches are needed for this bug:

1. https://code.engineering.redhat.com/gerrit/#/c/74412/

2. Waiting for regression run completion on: http://review.gluster.org/14319
(In reply to Raghavendra G from comment #9)
> Two patches needed for this bug:
>
> 1. https://code.engineering.redhat.com/gerrit/#/c/74412/
>
> 2.Waiting for regression run completion on:
> http://review.gluster.org/14319

An identical patch passed regression at: http://review.gluster.org/14365

Downstream port of the same can be found at: https://bugzilla.redhat.com/show_bug.cgi?id=1325760
Verified with the build: glusterfs-3.7.9-5

Ran the automated suite, which creates up to 20K entries {10k files, 7k symlinks, 3k directories}.

Sync Type: rsync and tarssh
Client: glusterfs

All cases completed successfully, and the "OSError: [Errno 5] Input/output error" was not seen in the logs.

Master:
=======
[root@dhcp37-162 master]# grep -ri "Errno 5" *
[root@dhcp37-162 master]#

[root@dhcp37-40 master]# grep -ri "Errno 5" *
[root@dhcp37-40 master]#

[root@dhcp37-116 master]# grep -ri "Errno 5" *
[root@dhcp37-116 master]#

[root@dhcp37-189 master]# grep -ri "Errno 5" *
[root@dhcp37-189 master]#

[root@dhcp37-121 master]# grep -ri "Errno 5" *
[root@dhcp37-121 master]#

[root@dhcp37-190 master]# grep -ri "Errno 5" *
[root@dhcp37-190 master]#

Slave:
======
[root@dhcp37-196 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-196 geo-replication-slaves]#

[root@dhcp37-88 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-88 geo-replication-slaves]#

[root@dhcp37-200 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-200 geo-replication-slaves]#

[root@dhcp37-43 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-43 geo-replication-slaves]#

[root@dhcp37-213 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-213 geo-replication-slaves]#

[root@dhcp37-52 geo-replication-slaves]# grep -ri "Errno 5" *
[root@dhcp37-52 geo-replication-slaves]#

Moving the bug to verified state.
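For repeating this check across many nodes, the per-directory grep can be approximated with a small script. This is a minimal sketch, assuming the log directory is passed in by the caller; find_pattern is a hypothetical helper name, and it mirrors grep -ri in being recursive and case-insensitive.

```python
import os


def find_pattern(log_dir, pattern="Errno 5"):
    """Return (path, line_number, line) for every case-insensitive
    match of pattern in files under log_dir (hypothetical helper)."""
    needle = pattern.lower()
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going on undecodable bytes.
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

An empty list corresponds to the empty grep output shown above, for example when find_pattern is run against each node's master or geo-replication-slaves log directory.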
Laura,

The doc text is fine.

regards,
Raghavendra
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240