+++ This bug was initially created as a clone of Bug #1146823 +++
+++ This bug was initially created as a clone of Bug #1144428 +++

Description of problem:
The session goes faulty with an "OSError: [Errno 12] Cannot allocate memory" backtrace in the logs. The operations performed were: sync existing data -> pause session -> rename all the files -> resume the session.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Hit only once; not sure it can be reproduced again.

Steps to Reproduce:
1. Create and start a geo-rep session between a 2x2 distribute-replicate master volume and a 2x2 distribute-replicate slave volume.
2. Create and sync some 5k files in some directory structure.
3. Pause the session.
4. Rename all the files.
5. Resume the session.

Actual results:
The session went faulty.

MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE               STATUS     CHECKPOINT STATUS    CRAWL STATUS
---------------------------------------------------------------------------------------------------------------------------
ccr.blr.redhat.com          master        /bricks/brick0    nirvana::slave      faulty     N/A                  N/A
metallica.blr.redhat.com    master        /bricks/brick1    acdc::slave         Passive    N/A                  N/A
beatles.blr.redhat.com      master        /bricks/brick3    rammstein::slave    Passive    N/A                  N/A
pinkfloyd.blr.redhat.com    master        /bricks/brick2    led::slave          faulty     N/A                  N/A

The backtrace in the master log:

[2014-09-19 16:19:53.933645] I [master(/bricks/brick2):1225:crawl] _GMaster: slave's time: (1411061833, 0)
[2014-09-19 16:20:33.653033] E [repce(/bricks/brick2):207:__call__] RepceClient: call 18787:139727562630912:1411123833.64 (entry_ops) failed on peer with OSError
[2014-09-19 16:20:33.653924] E [syncdutils(/bricks/brick2):270:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1324, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 524, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1236, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 927, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 891, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:20:33.657620] I [syncdutils(/bricks/brick2):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.663028] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-09-19 16:20:33.663907] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-09-19 16:20:33.795839] I [monitor(monitor):222:monitor] Monitor: worker(/bricks/brick2) died in startup phase

This is a remote backtrace propagated to the master via RPC.
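The same OSError shows up on the master because entry_ops() actually runs on the slave: the slave-side exception object is shipped back over the repce RPC channel and re-raised by RepceClient.__call__ on the master (the "raise res" frame above). The snippet below is only a minimal sketch of that propagation under a simplified in-process model; serve_one_call(), call_on_slave() and remote_entry_ops() are hypothetical names, not the actual repce.py API.

import os
import pickle

def remote_entry_ops(entries):
    """Stand-in for the slave-side handler, failing the way the slave log shows."""
    raise OSError(12, os.strerror(12))   # [Errno 12] Cannot allocate memory

def serve_one_call(payload):
    """'Slave' side: run the requested method, return either a result or the exception object."""
    method, args = pickle.loads(payload)
    try:
        return pickle.dumps(("ok", method(*args)))
    except Exception as exc:             # the exception itself is marshalled back
        return pickle.dumps(("exc", exc))

def call_on_slave(method, *args):
    """'Master' side: send the call, re-raise whatever the remote side raised."""
    response = serve_one_call(pickle.dumps((method, args)))
    kind, value = pickle.loads(response)
    if kind == "exc":
        raise value                      # why the slave's OSError appears in the master log
    return value

if __name__ == "__main__":
    try:
        call_on_slave(remote_entry_ops, [{"op": "RENAME"}])
    except OSError as e:
        print("master sees:", e)         # OSError: [Errno 12] Cannot allocate memory

The real repce streams pickled call tuples over the worker's I/O channel rather than calling in-process, but the re-raise on the caller's side is the same idea.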
The actual backtrace in the slave log is:

[2014-09-19 16:27:45.780600] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 662, in entry_ops
    [ENOENT, ESTALE, EINVAL])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
[2014-09-19 16:27:45.794786] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.

Expected results:
There should be no backtraces and no faulty sessions.

Additional info:
The slave volume had cluster.hash-range-gfid on.

--- Additional comment from Anand Avati on 2014-09-26 03:40:46 EDT ---

REVIEW: http://review.gluster.org/8865 (geo-rep: Fix rename of directory syncing.) posted (#1) for review on master by Kotresh HR (khiremat)

--- Additional comment from Anand Avati on 2014-09-29 02:32:50 EDT ---

COMMIT: http://review.gluster.org/8865 committed in master by Venky Shankar (vshankar)
------
commit 7113d873af1f129effd8c6da21b49e797de8eab0
Author: Kotresh HR <khiremat>
Date: Thu Sep 25 17:34:43 2014 +0530

    geo-rep: Fix rename of directory syncing.

    Directory renames are captured in the changelogs of all distributed
    bricks. gsyncd processes these changelogs on each brick in parallel.
    The first changelog to be processed succeeds. All subsequent ones stat
    the 'src' and, if it is not present, try to create the entry freshly on
    the slave. This should be done only for files and not for directories.
    When this code path was hit, a regular file's blob was sent as a
    directory's blob and the gfid-access translator errored out with
    'Invalid blob length' and errno ENOMEM.

    Change-Id: I50545b02b98846464876795159d2446340155c82
    BUG: 1146823
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/8865
    Reviewed-by: Aravinda VK <avishwan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>
    Tested-by: Venky Shankar <vshankar>

--- Additional comment from Anand Avati on 2014-09-29 04:30:03 EDT ---

REVIEW: http://review.gluster.org/8880 (geo-rep: Fix rename of directory syncing.) posted (#1) for review on release-3.6 by Aravinda VK (avishwan)
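The commit message above describes the root cause: workers on different bricks replay the same directory RENAME, and the losers of the race fall back to a fresh create that was written only with regular files in mind. The following is a rough, hypothetical sketch of the corrected decision; plain filesystem calls stand in for the real slave-side gfid-access setxattr path, and replay_rename() and its arguments are illustrative, not the actual resource.py entry_ops code.

import errno
import os
import stat

def replay_rename(root, src, dst, mode):
    """Replay one RENAME changelog entry under the slave mount at `root` (simplified)."""
    src_path, dst_path = os.path.join(root, src), os.path.join(root, dst)
    try:
        os.rename(src_path, dst_path)
        return
    except OSError as e:
        if e.errno not in (errno.ENOENT, errno.ESTALE):
            raise
    # Another worker, replaying the same rename from a different brick's
    # changelog, already handled it and 'src' is gone. Recreate the
    # destination according to its type: the bug was treating directories
    # like regular files and sending a file blob, which the gfid-access
    # translator rejected as 'Invalid blob length' (surfaced as ENOMEM).
    if stat.S_ISDIR(mode):
        os.makedirs(dst_path, exist_ok=True)    # directory: just ensure it exists
    else:
        open(dst_path, "a").close()             # regular file: fresh create is fine

With the fix, a directory whose source path has already disappeared is simply recreated as a directory instead of having a regular file's blob sent for its gfid.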
REVIEW: http://review.gluster.org/8880 (geo-rep: Fix rename of directory syncing.) posted (#2) for review on release-3.6 by Aravinda VK (avishwan)
COMMIT: http://review.gluster.org/8880 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 19b2923fd56f19dadf2d81a76a0008784a4f684f
Author: Kotresh HR <khiremat>
Date: Thu Sep 25 17:34:43 2014 +0530

    geo-rep: Fix rename of directory syncing.

    Directory renames are captured in the changelogs of all distributed
    bricks. gsyncd processes these changelogs on each brick in parallel.
    The first changelog to be processed succeeds. All subsequent ones stat
    the 'src' and, if it is not present, try to create the entry freshly on
    the slave. This should be done only for files and not for directories.
    When this code path was hit, a regular file's blob was sent as a
    directory's blob and the gfid-access translator errored out with
    'Invalid blob length' and errno ENOMEM.

    Change-Id: I50545b02b98846464876795159d2446340155c82
    BUG: 1147422
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/8865
    Reviewed-by: Aravinda VK <avishwan>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>
    Tested-by: Venky Shankar <vshankar>
    Reviewed-on: http://review.gluster.org/8880
    Reviewed-by: Vijay Bellur <vbellur>
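For context on how the translator-level failure turns into the Python traceback seen on the slave: libcxattr.py calls lsetxattr(2) through ctypes and converts a -1 return into OSError(errno, strerror(errno)) in raise_oserr(). The snippet below is a simplified, assumed model of that pattern, not the actual libcxattr.py code; when the gfid-access translator rejects a malformed blob with ENOMEM, this is where "Cannot allocate memory" enters the gsyncd logs.

import ctypes
import os

# Assumes glibc on Linux; use_errno lets us read the C errno after the call.
libc = ctypes.CDLL("libc.so.6", use_errno=True)

def lsetxattr(path, key, value):
    """Set an extended attribute without following symlinks (simplified wrapper)."""
    ret = libc.lsetxattr(path.encode(), key.encode(), value, len(value), 0)
    if ret == -1:
        errn = ctypes.get_errno()
        # A gfid-access rejection ('Invalid blob length') comes back as ENOMEM,
        # producing "OSError: [Errno 12] Cannot allocate memory" in the logs.
        raise OSError(errn, os.strerror(errn))

if __name__ == "__main__":
    try:
        lsetxattr("/tmp", "user.example", b"x")
    except OSError as e:
        print(e)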
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
*** Bug 1159190 has been marked as a duplicate of this bug. ***