Description of problem:
=======================
If a brick is offline, geo-replication transitions from changelog to xsync crawl, since changelogs cannot be captured. Once the brick is brought back online, xsync continues to be active and does not transition back to changelog:

[2015-03-13 19:20:52.923316] E [repce(agent):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2015-03-13 19:20:52.924300] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 28276:140684070041344:1426254652.92 (scan) failed on peer with ChangelogException
[2015-03-13 19:20:52.924525] I [resource(/rhs/brick1/b1):1352:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.51-1.el6rhs.x86_64

Steps carried:
==============
1. Create a master volume (2x3) from 3 nodes N1, N2, N3, consisting of 2 bricks each.
2. Start the master volume.
3. Create a slave volume (2x2) from 2 nodes S1, S2.
4. Start the slave volume.
5. Mount the master volume on the client.
6. Create and start the geo-rep session between master and slave.
7. Copy a huge set of data from the client onto the master volume.
8. While the data copy is in progress, bring bricks offline and online on nodes N1 and N2. Ensure no bricks are brought offline on node N3, keeping one brick constantly up in each x3 replica.
9. After some time, when all bricks are back online, check the geo-rep status and logs.

Actual results:
==============
geo-rep status is shown as Hybrid, and the logs show that it failed to transition to changelog and falls back to xsync.
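The broken behavior can be illustrated with a small sketch. This is not the actual syncdaemon code; `crawl_mode`, `register`, and the simplified `ChangelogException` are illustrative stand-ins for the changelog registration path seen in the traceback above. The point is that changelog is attempted once, and a single failure demotes the worker to xsync with no later retry:

```python
class ChangelogException(OSError):
    """Stand-in for the syncdaemon exception (e.g. [Errno 111])."""

def crawl_mode(register_changelog):
    """Buggy selection pattern: changelog is tried once; any failure
    falls back to xsync, and the choice is never revisited, so the
    worker stays in xsync even after the brick is back online."""
    try:
        register_changelog()
        return "changelog"
    except ChangelogException:
        return "xsync"  # sticky fallback, no retry

def register(brick_online):
    # Stand-in for the changelog register/scan RPC to the brick.
    if not brick_online:
        raise ChangelogException(111, "Connection refused")

# Brick is down at registration time; the worker is stuck in xsync.
print(crawl_mode(lambda: register(brick_online=False)))  # -> xsync
```

Under this pattern, re-entering changelog mode requires the selection to be re-run after the brick recovers, which is what the fix introduces.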
Upstream patch sent for review: http://review.gluster.org/#/c/9758
Verified with the build: glusterfs-3.7.0-2.el6rhs.x86_64

With the current implementation, if the bricks go down and come back, the crawl is picked up as a history crawl. Once the sync is done, the crawl successfully changes back to changelog. This is the expected behavior. Didn't hit the issue with the mentioned steps. Moving the bug to verified state.

[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "Changelog crawl failed, fallback to xsync" | wc
      0       0       0
[root@georep1 ~]# cat /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.46.154%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log | grep "ChangelogException: [Errno 111]" | wc
      0       0       0
[root@georep1 ~]#
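One caveat about the second grep above: in a regular expression, `[Errno 111]` is a bracket expression (matching a single character from "Erno 1"), not the literal string, so a zero match count from that command does not by itself prove the exception is absent from the log. A fixed-string match with `grep -F` checks the literal text; a small sketch using an inline sample line rather than the real log file:

```shell
# The pattern "[Errno 111]" as a regex matches one character from the
# class, never the literal brackets, so the regex grep reports 0 even
# when the exception line is present. -F matches the literal string.
log_line="ChangelogException: [Errno 111] Connection refused"

printf '%s\n' "$log_line" | grep -c  "ChangelogException: [Errno 111]" || true  # regex: prints 0
printf '%s\n' "$log_line" | grep -cF "ChangelogException: [Errno 111]"          # literal: prints 1
```

For the verification above this makes no practical difference only because the first grep (a plain literal string) also found nothing, but `grep -F` is the safer form when the needle contains regex metacharacters.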
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html