+++ This bug was initially created as a clone of Bug #1146823 +++

+++ This bug was initially created as a clone of Bug #1144428 +++

Description of problem:

The session goes faulty with an "OSError: [Errno 12] Cannot allocate memory" backtrace in the logs. The operation I performed was: sync existing data -> pause session -> rename all the files -> resume the session.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Hit only once. Not sure I will be able to reproduce it again.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 dist-rep master volume and a 2x2 dist-rep slave volume.
2. Create and sync some 5k files in some directory structure.
3. Pause the session.
4. Rename all the files.
5. Resume the session.

Actual results:

The session went faulty:

MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE               STATUS     CHECKPOINT STATUS    CRAWL STATUS
-----------------------------------------------------------------------------------------------------------------------------
ccr.blr.redhat.com          master        /bricks/brick0    nirvana::slave      faulty     N/A                  N/A
metallica.blr.redhat.com    master        /bricks/brick1    acdc::slave         Passive    N/A                  N/A
beatles.blr.redhat.com      master        /bricks/brick3    rammstein::slave    Passive    N/A                  N/A
pinkfloyd.blr.redhat.com    master        /bricks/brick2    led::slave          faulty     N/A                  N/A

The backtrace in the master logs:

[2014-09-19 16:19:53.933645] I [master(/bricks/brick2):1225:crawl] _GMaster: slave's time: (1411061833, 0)
[2014-09-19 16:20:33.653033] E [repce(/bricks/brick2):207:__call__] RepceClient: call 18787:139727562630912:1411123833.64 (entry_ops) failed on peer with OSError
[2014-09-19 16:20:33.653924] E [syncdutils(/bricks/brick2):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1324, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1236, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:20:33.657620] I [syncdutils(/bricks/brick2):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.663028] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-09-19 16:20:33.663907] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.795839] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick2) died in startup phase

This is a remote backtrace propagated to the master via RPC.
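To illustrate the propagation: the repce layer runs entry_ops on the slave, catches whatever it raises, and ships the exception object back for the master to re-raise, which is why the slave's OSError surfaces in the master log. A minimal sketch of that pattern, with simplified semantics; serve_call/remote_call are hypothetical names, not gsyncd's actual API:

    def serve_call(obj, meth, args):
        # Slave side: run the requested method, capturing any exception
        # instead of letting it kill the server loop.
        try:
            return (True, getattr(obj, meth)(*args))
        except OSError as e:      # e.g. OSError(ENOMEM) from lsetxattr
            return (False, e)     # ship the exception object back

    def remote_call(transport, meth, *args):
        # Master side: unpack the reply and re-raise the slave's
        # exception (compare the "raise res" frame in repce.py above).
        ok, res = transport(meth, args)
        if not ok:
            raise res
        return res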
The actual backtrace in the slave logs is:

[2014-09-19 16:27:45.780600] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in entry_ops
    [ENOENT, ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:27:45.794786] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Expected results:

There should be no backtraces and no faulty sessions.

Additional info:

The slave volume had cluster.hash-range-gfid on.
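The slave traceback above also shows why the error is fatal: entry_ops calls lsetxattr through errno_wrap with [ENOENT, ESTALE, EINVAL] as the tolerated errnos, so ENOMEM (errno 12) falls outside the whitelist and propagates until the worker dies. A minimal sketch of that whitelisting pattern, with simplified parameter names (the real syncdutils.errno_wrap also retries on certain transient errnos):

    from errno import ENOENT, ESTALE, EINVAL

    def errno_wrap(call, args=[], tolerable=[]):
        # Swallow whitelisted errnos, treating the operation as a
        # no-op; let everything else propagate.
        try:
            return call(*args)
        except OSError as e:
            if e.errno in tolerable:
                return None
            raise  # ENOMEM is not whitelisted, so the worker crashes

    # e.g. errno_wrap(Xattr.lsetxattr, [path, name, value],
    #                 [ENOENT, ESTALE, EINVAL])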
*** This bug has been marked as a duplicate of bug 1147422 ***