Bug 1217928 - [georep]: Transition from xsync to changelog doesn't happen once the brick is brought online
Summary: [georep]: Transition from xsync to changelog doesn't happen once the brick is brought online
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1201712 1202649
Blocks: glusterfs-3.7.0
 
Reported: 2015-05-03 04:59 UTC by Aravinda VK
Modified: 2015-05-14 17:35 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.7.0beta2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1202649
Environment:
Last Closed: 2015-05-14 17:27:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Aravinda VK 2015-05-03 04:59:36 UTC
+++ This bug was initially created as a clone of Bug #1202649 +++

+++ This bug was initially created as a clone of Bug #1201712 +++

Description of problem:
=======================

If a brick is offline, geo-replication transitions from changelog to xsync, since changelogs cannot be captured while the brick is down. However, once the brick is brought back online, xsync remains the active crawler and does not transition back to changelog, as the log excerpt below (and the sketch that follows it) shows:

[2015-03-13 19:20:52.923316] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 41, in scan
    return Changes.cl_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 111] Connection refused
[2015-03-13 19:20:52.924300] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 28276:140684070041344:1426254652.92 (scan) failed on peer with ChangelogException
[2015-03-13 19:20:52.924525] I [resource(/rhs/brick1/b1):1352:service_loop] GLUSTER: Changelog crawl failed, fallback to xsync
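
For context, the fallback decision sits in the worker's service loop, which treats a ChangelogException from the changelog agent as a cue to switch crawlers permanently. A minimal sketch of that pre-fix pattern, using simplified, hypothetical crawler callables rather than the actual syncdaemon interfaces:

    import logging

    class ChangelogException(OSError):
        """Raised when the changelog agent cannot serve a request."""

    def service_loop(changelog_crawl, xsync_crawl):
        # Pre-fix behaviour (simplified): any failure in the changelog
        # crawl permanently demotes the worker to the xsync crawler,
        # even after the brick comes back online.
        try:
            changelog_crawl()
        except ChangelogException as e:
            logging.info("Changelog crawl failed, fallback to xsync: %s", e)
            xsync_crawl()  # one-way switch; never returns to changelog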

Steps carried:
==============

1. Create a master volume (2x3) from 3 nodes N1, N2, N3, with 2 bricks on each node.
2. Start the master volume.
3. Create a slave volume (2x2) from 2 nodes S1, S2.
4. Start the slave volume.
5. Mount the master volume on the client.
6. Create and start the geo-rep session between master and slave.
7. Copy a large set of data from the client onto the master volume.
8. While the copy is in progress, bring bricks on nodes N1 and N2 offline and back online. Ensure that no brick on node N3 is brought offline, so that one brick in each 3-way replica set stays up throughout.
9. After some time, when all bricks are back online, check the geo-rep status and logs (a scripted sketch of these steps follows this list).
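
For reference, a scripted sketch of these steps. All host names (N1-N3, S1, S2), brick paths, mount points, and volume names are assumptions for illustration; the two volume-create commands must run on the master and slave clusters respectively, and geo-replication also requires passwordless SSH and a shared pem key, which this sketch does not set up:

    import subprocess

    def sh(cmd):
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    # Steps 1-4: create and start a 2x3 master and a 2x2 slave volume.
    sh("gluster volume create master replica 3 "
       "N1:/rhs/brick1/b1 N2:/rhs/brick1/b2 N3:/rhs/brick1/b3 "
       "N1:/rhs/brick2/b4 N2:/rhs/brick2/b5 N3:/rhs/brick2/b6")
    sh("gluster volume start master")
    sh("gluster volume create slave replica 2 "
       "S1:/rhs/brick1/s1 S2:/rhs/brick1/s2 "
       "S1:/rhs/brick2/s3 S2:/rhs/brick2/s4")
    sh("gluster volume start slave")

    # Step 5: mount the master volume on the client.
    sh("mount -t glusterfs N1:/master /mnt/master")

    # Step 6: create and start the geo-rep session.
    sh("gluster volume geo-replication master S1::slave create push-pem")
    sh("gluster volume geo-replication master S1::slave start")

    # Step 8 (while step 7's copy is running): kill a brick process on N1
    # or N2 (its PID is shown by `gluster volume status master`), then
    # bring all bricks back online with a forced start.
    sh("gluster volume start master force")

    # Step 9: check the geo-rep status; pre-fix it stays in Hybrid (xsync).
    sh("gluster volume geo-replication master S1::slave status")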

Actual results:
==============

Geo-rep status is shown as Hybrid, and the logs show that the session fails to transition back to changelog and keeps falling back to xsync.

--- Additional comment from Anand Avati on 2015-03-17 02:52:33 EDT ---

REVIEW: http://review.gluster.org/9758 ([WIP] geo-rep: Do not fail-back to xsync if Changelog is failed) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-03-17 04:54:51 EDT ---

REVIEW: http://review.gluster.org/9758 (geo-rep: Do not fail-back to xsync if Changelog is failed) posted (#3) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-03-17 04:56:44 EDT ---

REVIEW: http://review.gluster.org/9758 (geo-rep: Do not fail-back to xsync if Changelog is failed) posted (#4) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-27 07:37:58 EDT ---

COMMIT: http://review.gluster.org/9758 committed in master by Vijay Bellur (vbellur) 
------
commit 60f764631971de4357d2f72a8995f844949de8ca
Author: Aravinda VK <avishwan>
Date:   Tue Mar 17 12:18:30 2015 +0530

    geo-rep: Do not fail-back to xsync if Changelog is failed
    
    Unless change_detector is set to xsync, do not fallback to
    xsync, except during Initial Sync or Partial History.
    
    When a brick goes down, Changelog exception is raised due
    to which geo-rep fallback to xsync. Even after brick comes
    back geo-rep will not consume Changelog.
    
    BUG: 1202649
    Change-Id: I1f8ea26ac7735f6ee09b3b143ee3eb66bfc9fc37
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/9758
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Saravanakumar Arumugam <sarumuga>
    Reviewed-by: Kotresh HR <khiremat>
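
In effect, the patch makes the fallback conditional: xsync is used only when it is the configured change_detector, or during the initial-sync and partial-history phases; a ChangelogException raised during the live crawl now propagates, so the worker is restarted and re-registers with changelog once the brick is back. A hypothetical sketch of that decision logic (the names, the crawler callables, and the control flow are illustrative, not the actual syncdaemon change):

    import logging

    class ChangelogException(OSError):
        pass

    class PartialHistoryAvailable(Exception):
        pass

    def service_loop(conf, changelog_crawl, history_crawl, xsync_crawl):
        if conf.change_detector == "xsync":
            # Explicitly configured: xsync is the primary crawler.
            xsync_crawl()
            return

        try:
            # Initial sync / history consumption.
            history_crawl()
        except PartialHistoryAvailable:
            # Only the initial-sync / partial-history cases may fall back.
            logging.info("Partial history available, using xsync for the gap")
            xsync_crawl()

        # Live changelog crawl: a ChangelogException now propagates instead
        # of silently demoting the worker to xsync, so the worker can be
        # restarted and re-register with changelog once the brick is up.
        changelog_crawl()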

Comment 1 Anand Avati 2015-05-03 05:00:48 UTC
REVIEW: http://review.gluster.org/10496 (geo-rep: Do not fail-back to xsync if Changelog is failed) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)

Comment 2 Niels de Vos 2015-05-14 17:27:27 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

