Bug 1567881 - Halo replication I/O path is not working
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-16 10:39 UTC by Pranith Kumar K
Modified: 2018-06-20 18:04 UTC (History)

Fixed In Version: glusterfs-v4.1.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:04:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Pranith Kumar K 2018-04-16 10:39:50 UTC
Description of problem:
I found the following 2 bugs:
1) ping-latency numbers do not reach afr because of a wrong call in default_notify, which I missed during the review.
2) During the mount, I observed that the brick with the lowest ping-timeout within the halo-max-latency was shown to be down. I found a bug in child-up event handling, which I also fixed.


Comment 1 Worker Ant 2018-04-16 15:16:37 UTC
REVIEW: https://review.gluster.org/19883 (cluster/afr: Make sure latency-arg is passed to afr) posted (#1) for review on master by Pranith Kumar Karampuri

Comment 2 Worker Ant 2018-04-16 15:17:33 UTC
REVIEW: https://review.gluster.org/19884 (cluster/afr: Keep child-up until ping-event) posted (#1) for review on master by Pranith Kumar Karampuri

Comment 3 Worker Ant 2018-04-18 13:56:18 UTC
COMMIT: https://review.gluster.org/19883 committed in master by "Pranith Kumar Karampuri" <pkarampu@redhat.com> with a commit message- cluster/afr: Make sure latency-arg is passed to afr

xlator_notify doesn't pass the extra arguments that come in the
input function, so XLATOR_NOTIFY macro should be used instead
to pass the extra arguments to the function.

BUG: 1567881
fixes bz#1567881
Change-Id: Ic15b6c446638cbacf3149693147a754219037c47
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
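
The difference the commit message above describes, a function wrapper that cannot forward its extra arguments versus a macro that can, can be sketched in standalone C. This is only an illustration and not the GlusterFS code: record_notify, notify_without_extras, FORWARD_NOTIFY, the event constants and last_latency are all hypothetical names, and passing the latency as a long is an assumption.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical event codes, for this sketch only. */
enum { SKETCH_EVENT_OTHER = 0, SKETCH_EVENT_PING = 1 };

/* Hypothetical callback type: extra arguments depend on the event. */
typedef int (*notify_fn)(int event, void *data, ...);

/* Last latency seen by the callback; -1 means "none yet". */
static long last_latency = -1;

/* Callback that reads one extra long argument for ping events only. */
static int record_notify(int event, void *data, ...)
{
    (void)data;
    if (event == SKETCH_EVENT_PING) {
        va_list ap;
        va_start(ap, data);
        last_latency = va_arg(ap, long);
        va_end(ap);
    }
    return 0;
}

/* A function wrapper receives "..." but calls the callback with the fixed
 * arguments only, so anything passed after data is silently dropped. */
static int notify_without_extras(notify_fn fn, int event, void *data, ...)
{
    return fn(event, data);
}

/* A macro expands at the call site, so every argument the caller wrote,
 * including the extra ones, reaches the callback unchanged. */
#define FORWARD_NOTIFY(ret, fn, ...) \
    do { (ret) = (fn)(__VA_ARGS__); } while (0)
```

With these definitions, FORWARD_NOTIFY(ret, record_notify, SKETCH_EVENT_PING, NULL, 42L) delivers 42 to the callback, whereas the function wrapper has no portable way to pass the same value along without a va_list-taking variant of the callback.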

Comment 4 Worker Ant 2018-04-25 05:48:21 UTC
COMMIT: https://review.gluster.org/19884 committed in master by "Pranith Kumar Karampuri" <pkarampu@redhat.com> with a commit message- cluster/afr: Keep child-up until ping-event

Problem:
Suppose we have 2 bricks, brick-A and brick-B, with brick-A within halo-max-latency
and brick-B above halo-max-latency, and both halo-min and halo-max replicas are set
to '1'. In this case, brick-A comes online and its ping-latency is then updated.
When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick with
the worst latency to mark it down. Since brick-B had just come online, its latency was
always '0', so brick-A used to be marked offline and brick-B would eventually be the one
online, even though brick-A is better suited.

Fix:
Treat the latency of a just-up child as HALO_MAX_LATENCY so that, until its
ping-latency is known, the just-up brick is the one picked as the worst child.
Also keep ping-latency as -1 until child-up during initialization.

BUG: 1567881
fixes bz#1567881
Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
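
The selection logic described in the Fix section above can be sketched in standalone C as follows. This is a minimal illustration under stated assumptions, not the afr implementation: struct child_state, the helper functions and SKETCH_HALO_MAX_LATENCY (including its value) are hypothetical names chosen for the example.

```c
/* Assumed sentinel meaning "worse than any measured latency". */
#define SKETCH_HALO_MAX_LATENCY 99999L

/* Hypothetical per-brick state for this sketch. */
struct child_state {
    int  up;          /* 1 once a child-up event has been seen */
    long latency_ms;  /* -1 until child-up, then sentinel or measured value */
};

/* Initialization: the brick is down and has no latency yet. */
static void child_init(struct child_state *c)
{
    c->up = 0;
    c->latency_ms = -1;
}

/* Child-up: assume the worst latency until a ping says otherwise. */
static void child_up(struct child_state *c)
{
    c->up = 1;
    c->latency_ms = SKETCH_HALO_MAX_LATENCY;
}

/* Ping event: record the measured latency for an up brick. */
static void child_ping(struct child_state *c, long ms)
{
    if (c->up)
        c->latency_ms = ms;
}

/* Return the index of the up brick with the highest latency, or -1 if
 * no brick is up. A just-up brick carries the sentinel, so it is chosen
 * over any brick whose latency has already been measured. */
static int find_worst_up_child(const struct child_state *children, int count)
{
    int  worst = -1;
    long worst_latency = -1;

    for (int i = 0; i < count; i++) {
        if (!children[i].up)
            continue;
        if (children[i].latency_ms > worst_latency) {
            worst = i;
            worst_latency = children[i].latency_ms;
        }
    }
    return worst;
}
```

In the two-brick scenario from the Problem section, brick-A has a measured latency when brick-B comes up with the sentinel, so brick-B is the one selected as worst, which is the behaviour the fix aims for. Had brick-B started at '0' instead, the same loop would have picked brick-A.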

Comment 5 Shyamsundar 2018-06-20 18:04:29 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

