Description of problem:
Shortly after file sync to the slave starts in changelog crawl mode, the master hits a traceback, geo-rep status goes to faulty, and it stays stuck in that state.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-10-31 13:47:08.476759] I [master(/bricks/brick3):370:crawlwrap] _GMaster: 20 crawls, 0 turns
[2013-10-31 13:48:08.555270] I [master(/bricks/brick3):370:crawlwrap] _GMaster: 20 crawls, 0 turns
[2013-10-31 13:48:17.601533] E [repce(/bricks/brick3):188:__call__] RepceClient: call 1458:139835433391872:1383207497.59 (entry_ops) failed on peer with KeyError
[2013-10-31 13:48:17.602133] E [syncdutils(/bricks/brick3):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 530, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1077, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 381, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 818, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 775, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 744, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
KeyError: 'stat'
[2013-10-31 13:48:17.604809] I [syncdutils(/bricks/brick3):159:finalize] <top>: exiting.
[2013-10-31 13:48:17.613370] I [monitor(monitor):81:set_state] Monitor: new state: faulty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
And this is the corresponding traceback on the slave side:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-10-31 14:52:32.894243] I [resource(slave):631:service_loop] GLUSTER: slave listening
[2013-10-31 14:52:36.283513] E [repce(slave):103:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 99, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 515, in entry_ops
    blob = entry_pack_mkdir(gfid, bname, e['stat'])
KeyError: 'stat'
[2013-10-31 14:52:36.295047] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-10-31 14:52:36.295408] I [syncdutils(slave):159:finalize] <top>: exiting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.37rhs-1.el6rhs.x86_64

How reproducible:
Doesn't happen every time.

Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on the master.
3. Check the status of geo-rep.

Actual results:
The geo-rep status goes to faulty and stays there.

Expected results:
Geo-rep should not go to faulty.

Additional info:
Vijaykumar, was the slave cluster not updated with the new build? With the new build, the stat structure is no longer passed for create/mknod/mkdir calls. I see in the backtrace that the slave gsyncd is still expecting a stat structure.
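To make the mismatch concrete, here is a minimal sketch (hypothetical function and entry names, not the actual gsyncd code) of why an old slave fails against a new master: the old slave unconditionally indexes e['stat'], as in resource.py's entry_pack_mkdir(gfid, bname, e['stat']), while a new master no longer includes that key, so a tolerant slave would have to treat it as optional.

```python
def old_slave_entry_ops(entries):
    # Older slave build: unconditionally reads e['stat'] for MKDIR entries,
    # mirroring entry_pack_mkdir(gfid, bname, e['stat']) in resource.py.
    results = []
    for e in entries:
        if e["op"] == "MKDIR":
            results.append(("mkdir", e["gfid"], e["stat"]))  # KeyError: 'stat'
    return results

def tolerant_slave_entry_ops(entries):
    # Defensive variant: treat 'stat' as optional, since a newer master
    # no longer sends it for create/mknod/mkdir entries.
    results = []
    for e in entries:
        if e["op"] == "MKDIR":
            results.append(("mkdir", e["gfid"], e.get("stat")))
    return results

# A newer master ships mkdir entries without a 'stat' key:
entries = [{"op": "MKDIR", "gfid": "1234-abcd", "entry": "dir1"}]
```

Running old_slave_entry_ops(entries) raises KeyError: 'stat' exactly as in the slave traceback above, while the tolerant variant handles the entry; the real fix here, of course, is updating both sides to matching builds.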
Fixed as part of the performance enhancement done by Venky (https://code.engineering.redhat.com/gerrit/14774).
Not able to reproduce it on build glusterfs-3.4.0.39rhs-1; marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html