+++ This bug was initially created as a clone of Bug #1345744 +++

+++ This bug was initially created as a clone of Bug #1344826 +++

Description of problem:
=======================

While performing rm -rf on a cascaded setup, found a worker crash on the primary master and the intermediate master volume, with the following tracebacks:

Master Volume:
==============

[2016-06-11 09:41:17.359086] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1497, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1201, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 984, in process
    self.datas_in_batch.remove(unlinked_gfid)
KeyError: '.gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988'

Intermediate Master:
====================

[2016-06-11 09:41:51.681622] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1497, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1201, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 984, in process
    self.datas_in_batch.remove(unlinked_gfid)
KeyError: '.gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988'
[2016-06-11 09:41:51.684969] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.9-10

How reproducible:
=================

Always, on a cascaded setup, upon remove (rm -rf)

Steps to Reproduce:
===================

1. Create a geo-rep cascaded setup with (vol0, vol1, vol2) such that vol0 => vol1 and vol1 => vol2.
2. Mount the vol0 volume and perform fops such as (cp, create, chmod, chown, chgrp, symlink, hardlink, truncate) on vol0.
3. Let it sync to the slaves (vol1) and (vol2).
4. Calculate the arequal checksum after every fop. It should match.
5. Perform rm -rf on vol0.

Actual results:
===============

Worker crashed on vol1 and vol0 with KeyError.

Expected results:
=================

Worker shouldn't crash.

Additional info:
================

Performed rm -rf on a non-cascaded setup and didn't see the crash. Also, the files are eventually removed from all masters and slaves.
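The KeyError in both tracebacks is consistent with remove() being called on a Python set for an element that is not in it. The following is only a minimal sketch of that failure mode, not the gsyncd code: it assumes datas_in_batch is a set (inferred from the exception type, not confirmed in this report), and remove_strict and the placeholder entry are hypothetical names used for illustration.

```python
# Stand-in for self.datas_in_batch. Assumed to be a set; this is an
# inference from the KeyError raised by .remove() in the traceback.
datas_in_batch = {".gfid/00000000-0000-0000-0000-000000000001"}  # placeholder entry

# GFID copied from the traceback; it is absent from the pending data set.
unlinked_gfid = ".gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988"


def remove_strict(pending, gfid):
    """Mimic the crashing call: set.remove() raises KeyError if gfid is absent."""
    pending.remove(gfid)


try:
    remove_strict(datas_in_batch, unlinked_gfid)
    crashed = False
except KeyError as exc:
    # The exception carries the missing element, as seen in the log line.
    crashed = True
    missing_key = exc.args[0]
```

In the worker no such handler surrounds the call, so the exception propagates up to log_raise_exception and the worker exits.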
--- Additional comment from Vijay Bellur on 2016-06-13 02:33:20 EDT ---

REVIEW: http://review.gluster.org/14706 (geo-rep: Safely handle if unliked GFID not present in data list) posted (#1) for review on master by Aravinda VK (avishwan)

--- Additional comment from Vijay Bellur on 2016-06-20 02:37:06 EDT ---

COMMIT: http://review.gluster.org/14706 committed in master by Aravinda VK (avishwan)
------
commit 4797ca3778d82a671716d4913c14f285591ae959
Author: Aravinda VK <avishwan>
Date:   Mon Jun 13 12:00:40 2016 +0530

    geo-rep: Safely handle if unliked GFID not present in data list

    If unlinked GFID is not present in data list to be synced then
    Geo-rep worker was crashing with KeyError.

    Handled KeyError with this patch.

    BUG: 1345744
    Change-Id: I5a1c9ca4473e32606df2e5c7e26c95faf55d44c0
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/14706
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
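The commit message says only that the KeyError is now handled. One way to do that, shown here as a hedged sketch rather than the actual patch, is to catch the exception around the removal. The helper name drop_unlinked is hypothetical, and the pending data is again assumed to be a Python set.

```python
def drop_unlinked(pending, gfid):
    """Remove gfid from the pending set if present.

    Returns True if the entry was removed, False if it was not there.
    A missing entry is treated as a no-op instead of crashing the caller.
    """
    try:
        pending.remove(gfid)
        return True
    except KeyError:
        # The unlinked GFID was never queued for data sync; nothing to do.
        return False


# Hypothetical usage with placeholder GFIDs.
pending = {".gfid/aaaa", ".gfid/bbbb"}
removed_present = drop_unlinked(pending, ".gfid/aaaa")
removed_absent = drop_unlinked(pending, ".gfid/757b0ad8-b6f5-44da-b71a-1b1c25a72988")
```

If the return value is not needed, set.discard() gives the same no-op behaviour for a missing element in a single call.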
REVIEW: http://review.gluster.org/14766 (geo-rep: Safely handle if unliked GFID not present in data list) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)
COMMIT: http://review.gluster.org/14766 committed in release-3.7 by Aravinda VK (avishwan)
------
commit d22305998f99bb9a5c89b5639ca95b3689881510
Author: Aravinda VK <avishwan>
Date:   Mon Jun 13 12:00:40 2016 +0530

    geo-rep: Safely handle if unliked GFID not present in data list

    If unlinked GFID is not present in data list to be synced then
    Geo-rep worker was crashing with KeyError.

    Handled KeyError with this patch.

    BUG: 1348085
    Change-Id: I5a1c9ca4473e32606df2e5c7e26c95faf55d44c0
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/14706
    (cherry picked from commit 4797ca3778d82a671716d4913c14f285591ae959)
    Reviewed-on: http://review.gluster.org/14766
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.13, please open a new bug report.

glusterfs-3.7.13 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-July/027604.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user