Bug 1241882

Summary: GlusterD cannot restart after being probed into a cluster.
Product: [Community] GlusterFS
Component: glusterd
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Kaushal <kaushal>
Assignee: Kaushal <kaushal>
CC: bugs
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-06-16 13:22:21 UTC
Bug Blocks: 1234725

Description Kaushal 2015-07-10 10:16:04 UTC
In a cluster with 2 networks, glusterd on a newly added peer sometimes cannot restart. The restart fails because the new peer cannot resolve bricks belonging to the peer that probed it into the cluster (the initiator). The resolution fails only if the bricks were created on the initiator's 2nd network, because the new peer doesn't know about that network.
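The failure mode can be illustrated with a small sketch. This is a toy model, not glusterd code: the `Peer` class, the `resolve_brick` function and the example host names are all hypothetical and exist only to show why a brick on an unknown address cannot be mapped back to its peer.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy stand-in for a peer entry; not glusterd's real data structure."""
    name: str
    addresses: set = field(default_factory=set)


def resolve_brick(brick, known_peers):
    """Return the name of the peer owning `brick` ("host:/path"), or None."""
    host = brick.split(":", 1)[0]
    for peer in known_peers:
        if host in peer.addresses:
            return peer.name
    return None


# Hypothetical addresses: the initiator A is reachable on two networks.
full_view = [Peer("A", {"a.net1.example", "a.net2.example"})]
# The new peer only learned A's first-network address during the probe.
new_peer_view = [Peer("A", {"a.net1.example"})]

# A brick that was created using A's second-network address.
brick = "a.net2.example:/bricks/b1"

print(resolve_brick(brick, full_view))      # A
print(resolve_brick(brick, new_peer_view))  # None
```

In this model the second lookup returns `None`, which corresponds to the unresolved brick that prevents the restart described above.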

This is caused by a race which hadn't been encountered before. The analysis is as follows.

Assume three peers A, B and C. A and B form a cluster and have probed each other on both networks. C is probed from A.

During the probe, C is first validated by A. Once C is accepted, A sends an update to both B and C to inform them of each other. The update C gets from A doesn't have A's second network information. C can only get this information when B sends an update to C.

The problem here was that B didn't send an update to C. Whether B sends an update to C depends on the order in which the connection between B and C is established.

B and C both try to establish connections to each other once they receive A's update and learn of each other. If B establishes the connection first, it sends an update to C. But if C establishes the connection first, B will not send an update to C.
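The ordering dependence described above can be sketched as a toy model. This is not glusterd's state machine: the function name, the address labels and the `always_update` parameter are hypothetical. `always_update` is only an illustrative knob for the case where B sends its update regardless of which side connected first.

```python
def addresses_c_knows_for_a(first_to_connect, always_update=False):
    """Toy model of probing C from A in an existing cluster {A, B}.

    first_to_connect: "B" or "C", the side that establishes the
        B<->C connection first.
    always_update: hypothetical knob; if True, B sends its update to C
        no matter which side connected first.
    Returns the set of A's addresses that C ends up knowing.
    """
    if first_to_connect not in ("B", "C"):
        raise ValueError("first_to_connect must be 'B' or 'C'")

    # B and A probed each other on both networks, so B knows both addresses.
    b_view_of_a = {"a.net1", "a.net2"}
    # The update C receives from A lacks A's second network.
    c_view_of_a = {"a.net1"}

    # In the observed behaviour, B only sends an update if it connected first.
    b_sends_update = always_update or first_to_connect == "B"
    if b_sends_update:
        c_view_of_a |= b_view_of_a
    return c_view_of_a


print(sorted(addresses_c_knows_for_a("B")))  # ['a.net1', 'a.net2']
print(sorted(addresses_c_knows_for_a("C")))  # ['a.net1']
print(sorted(addresses_c_knows_for_a("C", always_update=True)))  # ['a.net1', 'a.net2']
```

Only the second call leaves C without A's second-network address, which is the situation in which C later fails to resolve A's bricks.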

This is the first time this situation has been observed. It does not happen every time.

Comment 1 Anand Avati 2015-07-10 16:02:10 UTC
REVIEW: http://review.gluster.org/11625 (glusterd: Send friend update even for EVENT_RCVD_ACC) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Niels de Vos 2016-06-16 13:22:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user