Description of problem:
In a cascaded setup, rm -rf on the master did not fully propagate to the level-2 slave. The imaster (intermediate master) geo-rep logs contained tracebacks:
===============================================================================
[2014-08-11 17:02:42.58315] I [master(/bricks/brick2/b7):1147:crawl] _GMaster: slave's time: (1407756535, 0)
[2014-08-11 17:05:38.262269] E [repce(/bricks/brick2/b7):207:__call__] RepceClient: call 17308:140607044015872:1407756932.83 (entry_ops) failed on peer with OSError
[2014-08-11 17:05:38.262967] E [syncdutils(/bricks/brick2/b7):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1335, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1159, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 5] Input/output error
[2014-08-11 17:05:38.266292] I [syncdutils(/bricks/brick2/b7):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.269656] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-11 17:05:38.270167] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.286237] I [monitor(monitor):109:set_state] Monitor: new state: faulty
[2014-08-11 17:05:38.371052] E [repce(/bricks/brick3/b11):207:__call__] RepceClient: call 17310:139915850999552:1407756928.11 (entry_ops) failed on peer with OSError
[2014-08-11 17:05:38.371644] E [syncdutils(/bricks/brick3/b11):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1335, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1159, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 61] No data available
[2014-08-11 17:05:38.374772] I [syncdutils(/bricks/brick3/b11):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.378520] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-11 17:05:38.379053] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-11 17:05:48.300900] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-11 17:05:48.301461] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-11 17:05:48.399454] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-11 17:05:48.400115] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-11 17:05:48.720443] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
===============================================================================

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.27-1.el6rhs

How reproducible:
Did not try to reproduce the issue.

Steps to Reproduce:
1. Create and start a cascaded geo-rep setup between the master, imaster and slave.
2. Create some data on the master and let it sync to the imaster and the slave.
3. Run rm -rf on the master mount point.
4. Check whether the removal propagates to the level-2 slave (see the command sketch under Additional info).

Actual results:
rm -rf on the master fails to propagate to the level-2 slave.

Expected results:
rm -rf on the master should remove all the files from all the slaves.

Additional info:
sosreports of the master, imaster and slave nodes @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1129208/
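
A minimal sketch of the reproduction steps, for reference. The volume names (master-vol, imaster-vol, slave-vol) and hostnames (master-node, imaster-node, slave-node) are hypothetical placeholders; the actual names are in the sosreports. Passwordless SSH and pem distribution between the clusters are assumed to be already in place.

# Leg 1 (run on a master-cluster node): master-vol -> imaster-vol
gluster volume geo-replication master-vol imaster-node::imaster-vol create push-pem
gluster volume geo-replication master-vol imaster-node::imaster-vol start

# Leg 2 (run on an imaster-cluster node): imaster-vol -> slave-vol
gluster volume geo-replication imaster-vol slave-node::slave-vol create push-pem
gluster volume geo-replication imaster-vol slave-node::slave-vol start

# Create data on the master mount and wait until the sessions report
# Changelog Crawl and the data is visible on the slave mount.
mount -t glusterfs master-node:/master-vol /mnt/master
mkdir -p /mnt/master/dir{1..10}
touch /mnt/master/dir{1..10}/file{1..100}
gluster volume geo-replication master-vol imaster-node::imaster-vol status

# Remove everything from the master mount point.
rm -rf /mnt/master/*

# Verify the removal reached the level-2 slave; any surviving entries
# on the slave mount reproduce the reported behaviour.
mount -t glusterfs slave-node:/slave-vol /mnt/slave
ls -lR /mnt/slave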