+++ This bug was initially created as a clone of Bug #1415053 +++
+++ This bug was initially created as a clone of Bug #1413967 +++
+++ This bug was initially created as a clone of Bug #1412883 +++

Description:
The geo-replication sessions are going faulty for most of the volumes. Most of the faulty geo-replication sessions show the following changelog exception:

[2017-01-13 01:16:27.825808] I [master(/rhs/master/prd/soa/shared01/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-01-13 01:16:27.834862] I [master(/rhs/master/prd/soa/shared01/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1484261733, 0), etime: 1484270187
[2017-01-13 01:16:27.836390] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2017-01-13 01:16:27.837673] E [repce(/rhs/master/prd/soa/shared01/brick):207:__call__] RepceClient: call 8225:140388583266112:1484270187.84 (history) failed on peer with ChangelogException
[2017-01-13 01:16:27.837953] E [resource(/rhs/master/prd/soa/shared01/brick):1506:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory

All the bricks of both the master volume and the slave volume were online.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Rarely

Steps to Reproduce:
1. Create the master and slave clusters.
2. Create a 1x2 volume on both the master and slave clusters.
3. Create a geo-rep session between the master and slave volumes.
4. Start the geo-rep session.
5. Mount the volume over FUSE.
6. Check the geo-rep status and log.

Actual results:
The workers hit a changelog exception and the nodes go faulty.

Expected results:
The geo-replication session should be in the Active/Passive state and the workers should not hit any exception.

--- Additional comment from Kotresh HR on 2017-01-17 08:17:05 EST ---

Patch posted: http://review.gluster.org/#/c/16420/

--- Additional comment from Worker Ant on 2017-01-19 04:39:46 EST ---

COMMIT: http://review.gluster.org/16420 committed in master by Aravinda VK (avishwan)
------
commit 6f4811ca9331eee8c00861446f74ebe23626bbf8
Author: Kotresh HR <khiremat>
Date: Tue Jan 17 06:39:25 2017 -0500

features/changelog: Fix htime xattr during brick crash

The htime file contains the paths of all the changelogs that have been rolled over so far. It also maintains an xattr that tracks the latest changelog file rolled over and the number of changelogs. The path append and the xattr update happen in two different system calls. If the brick crashes between them, the xattr value becomes stale and can lead to the failure of gf_history_changelog. To detect this, the total number of changelogs is calculated from the htime file size and the record length. This value is used in case of a mismatch.

Change-Id: Ia1c3efcfda7b74227805bb2eb933c9bd4305000b
BUG: 1413967
Signed-off-by: Kotresh HR <khiremat>
Reviewed-on: http://review.gluster.org/16420
NetBSD-regression: NetBSD Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Aravinda VK <avishwan>

--- Additional comment from Worker Ant on 2017-01-20 01:57:57 EST ---

REVIEW: http://review.gluster.org/16437 (features/changelog: Fix htime xattr during brick crash) posted (#1) for review on release-3.8 by Kotresh HR (khiremat)
COMMIT: https://review.gluster.org/16438 committed in release-3.9 by Aravinda VK (avishwan)
------
commit 5e86fe36a8e771ace8363e34dd9d6ea802ce0e01
Author: Kotresh HR <khiremat>
Date: Tue Jan 17 06:39:25 2017 -0500

features/changelog: Fix htime xattr during brick crash

The htime file contains the paths of all the changelogs that have been rolled over so far. It also maintains an xattr that tracks the latest changelog file rolled over and the number of changelogs. The path append and the xattr update happen in two different system calls. If the brick crashes between them, the xattr value becomes stale and can lead to the failure of gf_history_changelog. To detect this, the total number of changelogs is calculated from the htime file size and the record length. This value is used in case of a mismatch.

> Change-Id: Ia1c3efcfda7b74227805bb2eb933c9bd4305000b
> BUG: 1413967
> Signed-off-by: Kotresh HR <khiremat>
> Reviewed-on: http://review.gluster.org/16420
> NetBSD-regression: NetBSD Build System <jenkins.org>
> Smoke: Gluster Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Aravinda VK <avishwan>

Change-Id: Ia1c3efcfda7b74227805bb2eb933c9bd4305000b
BUG: 1415065
Signed-off-by: Kotresh HR <khiremat>
(cherry picked from commit 6f4811ca9331eee8c00861446f74ebe23626bbf8)
Reviewed-on: https://review.gluster.org/16438
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Aravinda VK <avishwan>
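The commit message describes the stale-xattr detection only in prose. Below is a minimal Python sketch of that idea, not the GlusterFS implementation. It rests on several assumptions. Every htime record is taken to be one changelog path, NUL-padded to a fixed length (the hypothetical `RECORD_LEN`). A plain dict stands in for the real extended attribute. The helper names `append_record`, `changelog_count` and `demo`, the file name `HTIME.1484261733`, and the brick path are invented for illustration.

```python
import os
import tempfile

# Hypothetical fixed record length. This sketch assumes every htime record
# (one changelog path) is NUL-padded to the same size; the real layout may differ.
RECORD_LEN = 64


def append_record(htime_path, xattrs, changelog_path, crash_before_xattr=False):
    """Append a changelog path to the htime file, then update the 'xattr'.

    The append and the xattr update are two separate steps, mirroring the two
    system calls described in the commit message. `xattrs` is a plain dict
    standing in for the real extended attribute (an assumption).
    """
    record = changelog_path.encode().ljust(RECORD_LEN, b"\0")
    if len(record) != RECORD_LEN:
        raise ValueError("changelog path longer than RECORD_LEN")
    with open(htime_path, "ab") as f:
        f.write(record)  # step 1: the path is now on disk
    if crash_before_xattr:
        return  # simulated brick crash: step 2 never runs, so the xattr is stale
    xattrs["last"] = changelog_path  # step 2: record latest changelog and count
    xattrs["count"] = xattrs.get("count", 0) + 1


def changelog_count(htime_path, xattrs):
    """Return the number of changelogs, trusting the file size on a mismatch."""
    from_size = os.path.getsize(htime_path) // RECORD_LEN
    from_xattr = xattrs.get("count", 0)
    if from_size != from_xattr:
        # Mismatch: the brick died between the append and the xattr update.
        return from_size
    return from_xattr


def demo(crash=True):
    """Roll over three changelogs, optionally 'crashing' on the last one.

    Returns (count stored in the xattr, count resolved by changelog_count).
    """
    with tempfile.TemporaryDirectory() as d:
        htime = os.path.join(d, "HTIME.1484261733")  # hypothetical file name
        xattrs = {}
        for i, ts in enumerate((1484261733, 1484261748, 1484261763)):
            path = f"/bricks/b1/.glusterfs/changelogs/CHANGELOG.{ts}"
            append_record(htime, xattrs, path, crash_before_xattr=crash and i == 2)
        return xattrs["count"], changelog_count(htime, xattrs)


if __name__ == "__main__":
    print(demo())             # stale xattr: (2, 3)
    print(demo(crash=False))  # consistent: (3, 3)
```

`demo()` simulates a crash between the append and the xattr update. The stale xattr still reports 2 changelogs, but the file size implies 3, so the size-derived count is used. With `demo(crash=False)`, the two values agree at 3.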
This bug is getting closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/