Bug 1241882 - GlusterD cannot restart after being probed into a cluster.
Summary: GlusterD cannot restart after being probed into a cluster.
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kaushal
QA Contact:
Depends On:
Blocks: 1234725
Reported: 2015-07-10 10:16 UTC by Kaushal
Modified: 2016-06-16 13:22 UTC

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2016-06-16 13:22:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Description Kaushal 2015-07-10 10:16:04 UTC
In a cluster with two networks, glusterd on a newly added peer sometimes cannot restart. The restart fails because it cannot resolve bricks belonging to the peer that probed the new peer into the cluster. Resolution fails only if the bricks were created on the second network of the initiator peer, because the new peer doesn't know about the initiator's second network.
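
To illustrate the failure, here is a minimal sketch of brick resolution in Python. It is not glusterd's code: the function `resolve_brick`, the dictionary layout, and the address names `a-net1` and `a-net2` are all hypothetical, chosen only to show why a brick on the initiator's second network cannot be matched to a peer.

```python
# Hypothetical model of brick resolution; not glusterd's actual data structures.

def resolve_brick(brick_host, known_peers):
    """Return the name of the peer that owns brick_host, or None if unknown."""
    for peer, addresses in known_peers.items():
        if brick_host in addresses:
            return peer
    return None


# View held by the new peer: it knows the initiator A only by its
# first-network address (the address names are made up for this sketch).
new_peer_view = {"A": {"a-net1"}}

on_first_network = resolve_brick("a-net1", new_peer_view)   # resolves to "A"
on_second_network = resolve_brick("a-net2", new_peer_view)  # cannot be resolved

print(on_first_network, on_second_network)
```

In this model a brick created on `a-net2` resolves to no peer, which corresponds to the condition under which the restart fails.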

This is caused by a race which hadn't been encountered before. The analysis is as follows.

Assume A, B and C are the peers. A and B form a cluster and have probed each other on both networks. C is probed from A.

During the probe, C is first validated by A. Once C is accepted, A sends an update to both B and C to inform them of each other. The update C gets from A doesn't have A's second network information. C can only get this information when B sends an update to C.

The problem here was that B didn't send an update to C. This happens because whether B sends an update to C depends on the order in which the connections between B and C are established.

B and C both try to establish connections to each other once they receive A's update and get to know of each other. If B establishes the connection first, then it sends an update to C. But if C establishes the connection first, B will not send an update to C.
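
The ordering dependence can be sketched with a small Python model. Everything here is an assumption made for illustration: the function names, the address names, the idea that A's update gives C both of B's addresses, and the `always_update` flag, which stands in loosely for sending the update regardless of who connected first. It does not reflect glusterd's state machine.

```python
# Hypothetical model of the probe race; not glusterd's actual state machine.

def probe_c(first_to_connect, always_update=False):
    """Return C's view of peer addresses after C is probed from A.

    first_to_connect: "B" or "C", whichever establishes the B<->C connection first.
    always_update: assumed stand-in for B sending its update in either ordering.
    """
    # Full address lists, as known to A and B (names are made up).
    full = {"A": {"a-net1", "a-net2"}, "B": {"b-net1", "b-net2"}}

    # After A's update, C knows A by one address only.
    # Assumption: A's update carries both of B's addresses.
    c_view = {"A": {"a-net1"}, "B": set(full["B"])}

    # B's update, which carries A's second address, is sent only if B
    # connected first (or if updates are sent in either ordering).
    if first_to_connect == "B" or always_update:
        for peer, addresses in full.items():
            c_view[peer] |= addresses
    return c_view


def can_resolve(c_view, brick_host="a-net2"):
    """True if some known peer address matches the brick's host."""
    return any(brick_host in addresses for addresses in c_view.values())


for order in ("B", "C"):
    print(order, "connects first:", can_resolve(probe_c(order)))
```

In this model, a brick on A's second network is resolvable on C only when B connects first, unless the update is sent in both orderings.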

This is the first time this situation has been observed. It doesn't always happen.

Comment 1 Anand Avati 2015-07-10 16:02:10 UTC
REVIEW: http://review.gluster.org/11625 (glusterd: Send friend update even for EVENT_RCVD_ACC) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

Comment 2 Niels de Vos 2016-06-16 13:22:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user