Bug 1201712

Summary: [georep]: Transition from xsync to changelog doesn't happen once the brick is brought online
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rahul Hinduja <rhinduja>
Component: geo-replicationAssignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0CC: aavati, avishwan, csaba, nlevinki, vagarwal
Target Milestone: ---   
Target Release: RHGS 3.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.0-2.el6rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1202649 (view as bug list) Environment:
Last Closed: 2015-07-29 04:39:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1202649, 1202842, 1217928, 1223636    

Description Rahul Hinduja 2015-03-13 10:24:18 UTC
Description of problem:
=======================

If a brick is offline there is a transition from changelog to xsync since changelogs can not be captured, once the brick is brough online the xsync continuous to be active and doesnt trasition to changelog:

[2015-03-13 19:20:52.923316] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2015-03-13 19:20:52.924300] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 28276:140684070041344:1426254652.92 (scan) failed on peer with ChangelogException
[2015-03-13 19:20:52.924525] I [resource(/rhs/brick1/b1):1352:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.6.0.51-1.el6rhs.x86_64

Steps carried:
==============

1. Create a master volume (2x3) from 3 nodes N1,N2,N3 consisting 2 bricks each.
2. Start the master volume
3. Create a slave volume (2x2) from 2 nodes S1,S2
4. Start a slave volume
5. Mount the master volume to the client
6. Create and start the georep session between master and slave
7. Copy the huge set of data from the client on master volume
8. While the data is in progress, bring bricks offline and online from node N1 and N2. Ensured that not to bring bricks offline from node N3 keeping one brick constant up in x3 replica.
9. After sometime when all bricks are online, check the geo-rep status and logs

Actual results:
==============

georep status is shown as hybrid and logs shows that it failed to transition to changelog and fallsback to xsync

Comment 3 Aravinda VK 2015-03-17 09:30:22 UTC
Upstream patch sent for review. http://review.gluster.org/#/c/9758

Comment 10 Rahul Hinduja 2015-05-25 09:41:10 UTC
Verified with the build: glusterfs-3.7.0-2.el6rhs.x86_64

With the current implementation, if the bricks are gone down and comes back it is picked up as history crawl. Once the sync is done, the crawl successfully changes back to changelog. This is expected behavior. 

Didn't hit the issue with the mentioned steps. Moving the bug to verified state. 

[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "Changelog crawl failed, fallback to xsync" | wc
      0       0       0
[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "ChangelogException: [Errno 111]"  | wc
      0       0       0
[root@georep1 ~]#

Comment 12 errata-xmlrpc 2015-07-29 04:39:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html