Bug 1281218 - More redundant initial join logic to avoid becoming a fake coordinator
Status: ASSIGNED
Product: JBoss Data Grid 6
Classification: JBoss
Component: JGroups
Version: 6.4.1
Assigned To: Bela Ban
QA Contact: Martin Gencur
Reported: 2015-11-11 23:44 EST by Osamu Nagano
Modified: 2015-12-10 21:46 EST
Doc Type: Bug Fix
Type: Bug
External Trackers
JBoss Issue Tracker JGRP-1977 (Priority: Major, Status: Resolved, Last Updated: 2017-05-29 01:27 EDT): More redundant initial join logic to avoid becoming a fake coordinator
JBoss Issue Tracker PRODMGT-1463 (Priority: Major, Status: Resolved, Last Updated: 2017-05-29 01:27 EDT): Better handling of a new node in a different network segment

Description Osamu Nagano 2015-11-11 23:44:08 EST
(From JGRP-1977)

If the very first JGroups discovery packet is lost, the current GMS join logic never recovers it. The node then becomes a standalone coordinator and merges with the existing cluster only after several minutes.

This can happen if a new node resides in another network segment and a switch between the segments needs some time to establish a new multicast route. Currently, there is not enough time between the IGMP join (by MulticastSocket#joinGroup()) and the JGroups discovery packet, so the latter is lost in such a network environment. Because the number of nodes can be very large, configuring a static route in the switch is not reasonable.

Specifically, in method org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal(), the part that calls gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT) is outside the retry loop controlled by GMS.max_join_attempts and GMS.join_timeout, so a lost discovery request is never sent again.
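
The difference between discovering once and discovering on every retry can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the JGroups code: the Discovery interface, the lossy() helper, and both join methods are hypothetical names invented for illustration, the network is simulated in memory, and the wait of GMS.join_timeout between attempts is left out.

```java
import java.util.Optional;

public class JoinRetrySketch {

    /** Hypothetical stand-in for one discovery round; returns the coordinator if a response arrived. */
    interface Discovery {
        Optional<String> findCoordinator();
    }

    /** Simulated network that loses the first dropFirst discovery requests (assumption for this demo). */
    static Discovery lossy(int dropFirst, String coordinator) {
        int[] sent = {0};
        return () -> ++sent[0] <= dropFirst ? Optional.empty() : Optional.of(coordinator);
    }

    /** Simplified model of the reported behaviour: discovery happens once, before the retry loop. */
    static String joinDiscoverOnce(Discovery discovery, String self, int maxAttempts) {
        Optional<String> coord = discovery.findCoordinator(); // only discovery request ever sent
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (coord.isPresent()) {
                return coord.get(); // join the existing coordinator
            }
            // Retrying here cannot help: the lost request is not sent again.
        }
        return self; // no response seen, so the node elects itself coordinator
    }

    /** Simplified model of the requested behaviour: every retry sends a new discovery request. */
    static String joinRediscover(Discovery discovery, String self, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<String> coord = discovery.findCoordinator();
            if (coord.isPresent()) {
                return coord.get();
            }
            // A real implementation would wait (for example, join_timeout) before the next attempt.
        }
        return self;
    }

    public static void main(String[] args) {
        // Node "B" joins a cluster led by "A"; the first discovery packet is lost.
        System.out.println(joinDiscoverOnce(lossy(1, "A"), "B", 3)); // prints "B"
        System.out.println(joinRediscover(lossy(1, "A"), "B", 3));   // prints "A"
    }
}
```

In this model, a single lost packet is enough to make node B a standalone coordinator in the first variant, while the second variant finds coordinator A on its second attempt.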
