Description of problem:
=======================
Two of the three ACTIVE workers of a geo-replication session go FAULTY and remain in that state after a single directory is created on the master, as shown:

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     History Crawl      2018-07-09 04:28:01
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

The following was the traceback on the master:
----------------------------------------------
[2018-07-09 08:30:04.763963] E [repce(/rhs/brick2/b4):209:__call__] RepceClient: call failed call=28558:139802932234048:1531125004.38 method=entry_ops error=OSError
[2018-07-09 08:30:04.764759] E [syncdutils(/rhs/brick2/b4):348:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 803, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1586, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1396, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1114, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 228, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 210, in __call__
    raise res
OSError: [Errno 2] No such file or directory: '/rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57'
[2018-07-09 08:30:04.800748] I [syncdutils(/rhs/brick2/b4):288:finalize] <top>: exiting.
The following was the traceback on the slave:
---------------------------------------------
[2018-07-09 08:31:18.867358] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 644, in entry_ops
    gfid[2:4], gfid))
OSError: [Errno 2] No such file or directory: '/rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57'

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create and start a geo-replication session.
2. Mount the master and slave volumes.
3. Create a single directory on the master.

Actual results:
===============
2/3 ACTIVE workers went to FAULTY and did not come back to ACTIVE.

Expected results:
=================
None of the workers should go to FAULTY.
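For context on the failing path: the slave-side `entry_ops` frame in the traceback (resource.py line 644, ending in `gfid[2:4], gfid))`) is building the `.glusterfs` backend path for the entry's gfid, using the first two and next two hex characters of the gfid as subdirectories. The sketch below is illustrative only (it is not the actual resource.py code); the helper name `gfid_backend_path` is made up, but it reproduces exactly the path that appears in the OSError above:

```python
import os

def gfid_backend_path(brick, gfid):
    # Hypothetical helper mirroring the slicing visible in the traceback:
    # <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

path = gfid_backend_path("/rhs/brick1/b1",
                         "6a0e4415-f45c-4dcb-9862-a8925a586f57")
print(path)
# /rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57

# If that gfid has not yet been linked into .glusterfs on the slave brick,
# accessing the path raises OSError(ENOENT) -- the error repce then
# propagates back to the master worker, which goes FAULTY.
```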
*** This bug has been marked as a duplicate of bug 1598384 ***