Bug 1281218 - More redundant initial join logic to avoid becoming a fake coordinator
Summary: More redundant initial join logic to avoid becoming a fake coordinator
Keywords:
Status: ASSIGNED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: JGroups
Version: 6.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Bela Ban
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-12 04:44 UTC by Osamu Nagano
Modified: 2023-04-01 08:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | JGRP-1977 | 0 | Major | Resolved | More redundant initial join logic to avoid becoming a fake coordinator | 2019-06-11 02:23:59 UTC
Red Hat Issue Tracker | PRODMGT-1463 | 0 | Major | Resolved | Better handling of a new node in a different network segment | 2019-06-11 02:23:59 UTC

Description Osamu Nagano 2015-11-12 04:44:08 UTC
(From JGRP-1977)

If the very first JGroups discovery packet is lost, the current GMS join logic never recovers it. The node becomes a standalone coordinator and merges with the existing cluster only after several minutes.

This can happen if a new node resides in a different network segment and the switch between the segments needs some time to establish a new multicast route. Currently, there is not enough time between the IGMP join (by MulticastSocket#joinGroup()) and the JGroups discovery packet, so the latter is lost in such a network environment. Because the number of nodes can be very large, configuring a static route in the switch is not reasonable.
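
The timing problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class MulticastRouteSketch, its methods, and the 2000 ms / 3000 ms values are hypothetical and are not taken from JGroups or from any real switch. It assumes the switch forwards nothing for the group until a fixed setup delay has passed since the IGMP join.

```java
final class MulticastRouteSketch {

    /**
     * Simplified model (assumption): the switch forwards multicast traffic
     * for the group only once routeSetupMs have elapsed since the IGMP join.
     */
    static boolean isForwarded(long sentAfterJoinMs, long routeSetupMs) {
        return sentAfterJoinMs >= routeSetupMs;
    }

    /**
     * Packets are sent at 0, intervalMs, 2 * intervalMs, ... after the join.
     * Returns the 1-based index of the first packet that gets through,
     * or -1 if every packet is dropped.
     */
    static int firstForwardedPacket(int packets, long intervalMs, long routeSetupMs) {
        for (int i = 0; i < packets; i++) {
            if (isForwarded(i * intervalMs, routeSetupMs)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long routeSetupMs = 2000; // hypothetical switch delay
        // A single discovery packet sent right after the join is dropped.
        System.out.println(firstForwardedPacket(1, 3000, routeSetupMs)); // prints -1
        // Repeating the discovery lets the second packet through.
        System.out.println(firstForwardedPacket(3, 3000, routeSetupMs)); // prints 2
    }
}
```

In this model a single discovery packet sent immediately after the join is always lost whenever the setup delay is non-zero, which matches the behavior reported here.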

Specifically, in method org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal(), the gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT) call is outside the retry loop controlled by GMS.max_join_attempts and GMS.join_timeout.
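
One possible shape of the requested behavior is sketched below. This is not the actual ClientGmsImpl code and does not use any JGroups classes: JoinRetrySketch, findCoordinator, and the Supplier standing in for the FIND_INITIAL_MBRS request are all hypothetical, and the two numeric parameters are only modeled on GMS.max_join_attempts and GMS.join_timeout. The point is that discovery is re-run inside the retry loop, so a lost first packet can be recovered on a later attempt.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class JoinRetrySketch {

    /**
     * Hypothetical join loop with discovery inside the retry loop.
     *
     * @param discovery       stand-in for the discovery request; returns the
     *                        coordinators that answered (possibly none)
     * @param maxJoinAttempts modeled on GMS.max_join_attempts; must be >= 1 here
     * @param joinTimeoutMs   modeled on GMS.join_timeout; used as the pause
     *                        between attempts in this sketch
     * @return the coordinator to join, or empty if nobody answered and the
     *         node would have to become a coordinator itself
     */
    static Optional<String> findCoordinator(Supplier<List<String>> discovery,
                                            int maxJoinAttempts,
                                            long joinTimeoutMs) {
        if (maxJoinAttempts < 1) {
            throw new IllegalArgumentException("maxJoinAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxJoinAttempts; attempt++) {
            // Discovery is repeated on every attempt, not just once up front.
            List<String> responses = discovery.get();
            if (!responses.isEmpty()) {
                return Optional.of(responses.get(0));
            }
            if (attempt < maxJoinAttempts) {
                try {
                    Thread.sleep(joinTimeoutMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated network: the first discovery is lost, later ones are answered.
        Supplier<List<String>> lossyDiscovery =
                () -> ++calls[0] == 1 ? List.<String>of() : List.of("coord-A");
        System.out.println(findCoordinator(lossyDiscovery, 3, 10)); // prints Optional[coord-A]
    }
}
```

With a single attempt the same simulated network yields an empty result, which corresponds to the node becoming a standalone coordinator as described in this report.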

