Description of problem:
=======================
If a brick is offline, geo-replication transitions from changelog to xsync crawl, since changelogs cannot be captured. Once the brick is brought back online, xsync continues to be active and does not transition back to changelog:

[2015-03-13 19:20:52.923316] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2015-03-13 19:20:52.924300] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 28276:140684070041344:1426254652.92 (scan) failed on peer with ChangelogException
[2015-03-13 19:20:52.924525] I [resource(/rhs/brick1/b1):1352:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.51-1.el6rhs.x86_64

Steps carried:
==============
1. Create a master volume (2x3) from 3 nodes N1, N2, N3, consisting of 2 bricks each.
2. Start the master volume.
3. Create a slave volume (2x2) from 2 nodes S1, S2.
4. Start the slave volume.
5. Mount the master volume on the client.
6. Create and start the geo-rep session between master and slave.
7. Copy a huge set of data from the client onto the master volume.
8. While the data copy is in progress, bring bricks offline and online on nodes N1 and N2. Ensure no bricks are brought offline on node N3, keeping one brick constantly up in each x3 replica.
9. After some time, when all bricks are back online, check the geo-rep status and logs.

Actual results:
==============
geo-rep status is shown as Hybrid, and the logs show that it failed to transition to changelog and falls back to xsync.
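The broken behavior can be illustrated with a small sketch. This is not the actual syncdaemon code; `crawl_mode`, `register`, and the simplified `ChangelogException` are illustrative stand-ins for the changelog registration path seen in the traceback above. The point is that changelog is attempted once, and a single failure demotes the worker to xsync with no later retry:

```python
class ChangelogException(OSError):
    """Stand-in for the syncdaemon exception (e.g. [Errno 111])."""

def crawl_mode(register_changelog):
    """Buggy selection pattern: changelog is tried once; any failure
    falls back to xsync, and the choice is never revisited, so the
    worker stays in xsync even after the brick is back online."""
    try:
        register_changelog()
        return "changelog"
    except ChangelogException:
        return "xsync"  # sticky fallback, no retry

def register(brick_online):
    # Stand-in for the changelog register/scan RPC to the brick.
    if not brick_online:
        raise ChangelogException(111, "Connection refused")

# Brick is down at registration time; the worker is stuck in xsync.
print(crawl_mode(lambda: register(brick_online=False)))  # -> xsync
```

Under this pattern, re-entering changelog mode requires the selection to be re-run after the brick recovers, which is what the fix introduces.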
Upstream patch sent for review: http://review.gluster.org/#/c/9758
Verified with the build: glusterfs-3.7.0-2.el6rhs.x86_64

With the current implementation, if the bricks go down and come back, the crawl is picked up as a history crawl. Once the sync is done, the crawl successfully changes back to changelog. This is the expected behavior. Didn't hit the issue with the mentioned steps. Moving the bug to verified state.

[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "Changelog crawl failed, fallback to xsync" | wc
      0       0       0
[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "ChangelogException: [Errno 111]" | wc
      0       0       0
[root@georep1 ~]#
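One caveat about the second grep above: in a regular expression, `[Errno 111]` is a bracket expression (matching a single character from "Erno 1"), not the literal string, so a zero match count from that command does not by itself prove the exception is absent from the log. A fixed-string match with `grep -F` checks the literal text; a small sketch using an inline sample line rather than the real log file:

```shell
# The pattern "[Errno 111]" as a regex matches one character from the
# class, never the literal brackets, so the regex grep reports 0 even
# when the exception line is present. -F matches the literal string.
log_line="ChangelogException: [Errno 111] Connection refused"

printf '%s\n' "$log_line" | grep -c  "ChangelogException: [Errno 111]" || true  # regex: prints 0
printf '%s\n' "$log_line" | grep -cF "ChangelogException: [Errno 111]"          # literal: prints 1
```

For the verification above this makes no practical difference only because the first grep (a plain literal string) also found nothing, but `grep -F` is the safer form when the needle contains regex metacharacters.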
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html