Geo-replication sessions are going Faulty for most of the volumes. Most of the Faulty sessions log the following changelog exception:
[2017-01-13 01:16:27.825808] I [master(/rhs/master/prd/soa/shared01/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-01-13 01:16:27.834862] I [master(/rhs/master/prd/soa/shared01/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1484261733, 0), etime: 1484270187
[2017-01-13 01:16:27.836390] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100, in cl_history_changelog
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2017-01-13 01:16:27.837673] E [repce(/rhs/master/prd/soa/shared01/brick):207:__call__] RepceClient: call 8225:140388583266112:1484270187.84 (history) failed on peer with ChangelogException
[2017-01-13 01:16:27.837953] E [resource(/rhs/master/prd/soa/shared01/brick):1506:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory
All bricks of the master volume were online, and all bricks of the slave volume were online as well.
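To see which bricks hit this failure, a worker log can be scanned for the `Changelog History Crawl failed` line. The Python sketch below is illustrative only: its regular expression is inferred from the log excerpt above (`[timestamp] LEVEL [module(context):line:function] message`), not from a documented log format, and the function name is hypothetical.

```python
import re

# Line layout inferred from the excerpt above:
# [timestamp] LEVEL [module(context):line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<module>\w+)\((?P<ctx>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\] "
    r"(?P<msg>.*)$"
)
ERRNO = re.compile(r"\[Errno (\d+)\]")


def history_crawl_failures(lines):
    """Yield (timestamp, brick, errno) for every failed history crawl line."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Traceback lines and other messages do not match and are skipped.
        if not m or "Changelog History Crawl failed" not in m["msg"]:
            continue
        e = ERRNO.search(m["msg"])
        errno = int(e.group(1)) if e else None
        # For worker lines, the context in parentheses is the brick path.
        yield m["ts"], m["ctx"], errno
```

Feeding it the lines of the excerpt above yields a single tuple for the brick `/rhs/master/prd/soa/shared01/brick` with errno 2 (ENOENT).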
Steps to Reproduce:
1. Create a master cluster and a slave cluster.
2. Create a 1x2 volume on both the master and the slave cluster.
3. Create a geo-rep session between the master and slave volumes.
4. Start the geo-rep session.
5. Mount the volume over FUSE.
6. Check the geo-rep status and logs.
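The reproduction steps above map roughly onto the gluster CLI as sketched below. This Python sketch only builds and prints the commands (a dry run) and does not execute anything. The host names, brick path, volume names and mount point are hypothetical; it also assumes passwordless SSH from the master node to the slave host is already set up, and that the volume mounted in step 5 is the master volume.

```python
import shlex

# Hypothetical topology: replace with real hosts, bricks and names.
MASTER = {"hosts": ["m1", "m2"], "volume": "mastervol"}
SLAVE = {"hosts": ["s1", "s2"], "volume": "slavevol"}
BRICK = "/bricks/b1"
MOUNT_POINT = "/mnt/mastervol"


def cluster_cmds(cluster):
    """Steps 1-2: a two-node cluster with a 1x2 (replica 2) volume.

    Run on the first host of that cluster.
    """
    first, second = cluster["hosts"]
    vol = cluster["volume"]
    return [
        ["gluster", "peer", "probe", second],
        ["gluster", "volume", "create", vol, "replica", "2",
         f"{first}:{BRICK}", f"{second}:{BRICK}"],
        ["gluster", "volume", "start", vol],
    ]


def georep_cmds(master, slave, mount_point):
    """Steps 3-6, run on the first master node."""
    session = ["gluster", "volume", "geo-replication",
               master["volume"], f"{slave['hosts'][0]}::{slave['volume']}"]
    return [
        ["gluster", "system::", "execute", "gsec_create"],  # generate pem keys
        session + ["create", "push-pem"],                    # step 3
        session + ["start"],                                 # step 4
        ["mount", "-t", "glusterfs",
         f"{master['hosts'][0]}:/{master['volume']}", mount_point],  # step 5
        session + ["status"],                                # step 6
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in cluster_cmds(MASTER) + cluster_cmds(SLAVE):
        print(shlex.join(cmd))
    for cmd in georep_cmds(MASTER, SLAVE, MOUNT_POINT):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try. The master and slave cluster commands must be run on their respective clusters.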
Actual results:
The changelog exception is logged and the nodes go Faulty.
Expected results:
The geo-rep session should be in Active/Passive state and the workers should not hit the exception.
COMMIT: http://review.gluster.org/16420 committed in master by Aravinda VK (firstname.lastname@example.org)
Author: Kotresh HR <email@example.com>
Date: Tue Jan 17 06:39:25 2017 -0500
features/changelog: Fix htime xattr during brick crash
The htime file contains the paths of all the changelogs
that have been rolled over so far. It also maintains an xattr
that tracks the latest changelog file rolled over
and the number of changelogs. The path and the xattr
are updated in two different system calls. If the
brick crashes between them, the xattr value becomes
stale, which can make gf_history_changelog fail.
To detect this, the total number of changelogs is
calculated from the htime file size and the record
length, and that value is used in case of a mismatch.
Signed-off-by: Kotresh HR <firstname.lastname@example.org>
NetBSD-regression: NetBSD Build System <email@example.com>
Smoke: Gluster Build System <firstname.lastname@example.org>
CentOS-regression: Gluster Build System <email@example.com>
Reviewed-by: Aravinda VK <firstname.lastname@example.org>
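The crash window described in the commit message, and the size-based recount that closes it, can be modeled in a few lines of Python. This is a sketch, not the patch itself. The record layout (fixed-length, NUL-terminated changelog paths), the xattr name `trusted.glusterfs.htime`, the `<last_ts>:<num_changelogs>` value format, the brick path and all function names are assumptions for illustration.

```python
from __future__ import annotations

# Assumptions (not taken from the GlusterFS sources): each htime record is a
# fixed-length, NUL-terminated changelog path, and the xattr value has the
# form "<last_ts>:<num_changelogs>".
HTIME_XATTR = "trusted.glusterfs.htime"  # illustrative key name


def changelog_path(ts: int) -> str:
    # Hypothetical layout; 10-digit epoch names keep every record the same length.
    return f"/bricks/b1/.glusterfs/changelogs/CHANGELOG.{ts}"


def rollover(htime: bytearray, xattrs: dict, ts: int, count: int,
             crash_between_calls: bool = False) -> None:
    """Record one rolled-over changelog using two separate updates."""
    htime.extend(changelog_path(ts).encode() + b"\0")  # update 1: append the path
    if crash_between_calls:
        return  # simulated brick crash: the xattr is left stale
    xattrs[HTIME_XATTR] = f"{ts}:{count}"  # update 2: refresh the xattr


def reconcile_count(htime: bytes, xattrs: dict,
                    record_len: int) -> tuple[int, bool]:
    """Return (changelog count, xattr_was_stale); the file size wins on mismatch."""
    _last_ts, xattr_count = map(int, xattrs[HTIME_XATTR].split(":"))
    size_count = len(htime) // record_len
    return size_count, size_count != xattr_count


if __name__ == "__main__":
    htime, xattrs = bytearray(), {}
    stamps = [1484261733, 1484261748, 1484261763]
    for n, ts in enumerate(stamps, start=1):
        rollover(htime, xattrs, ts, n, crash_between_calls=(n == 3))
    record_len = len(changelog_path(stamps[0])) + 1  # path plus NUL terminator
    print(xattrs[HTIME_XATTR])                         # 1484261748:2 (stale)
    print(reconcile_count(htime, xattrs, record_len))  # (3, True)
```

After the simulated crash, the xattr still reports two changelogs while the file holds three records. Dividing the file size by the fixed record length recovers the correct count, which is why the size-derived value is preferred when the two disagree.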
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed in glusterfs-3.10.0, please open a new bug report.
glusterfs-3.10.0 has been announced on the Gluster mailing lists, and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.