1281218 – More redundant initial join logic to avoid becoming a fake coordinator

Summary: More redundant initial join logic to avoid becoming a fake coordinator

Keywords:
Status:	ASSIGNED
Alias:	None
Product:	JBoss Data Grid 6
Classification:	JBoss
Component:	JGroups
Sub Component:
Version:	6.4.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Bela Ban
QA Contact:	Martin Gencur
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-11-12 04:44 UTC by Osamu Nagano
Modified:	2023-04-01 08:00 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type:	Bug
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	JGRP-1977	0	Major	Resolved	More redundant initial join logic to avoid becoming a fake coordinator	2019-06-11 02:23:59 UTC
Red Hat Issue Tracker	PRODMGT-1463	0	Major	Resolved	Better handling of a new node in a different network segment	2019-06-11 02:23:59 UTC

Description Osamu Nagano 2015-11-12 04:44:08 UTC

(From JGRP-1977)

If the very first JGroups discovery packet is lost, the current GMS join logic never re-sends it. The node then becomes a standalone coordinator and only merges with the existing cluster several minutes later.

This can happen if a new node resides in another network segment and a switch between the segments needs some time to establish a new multicast route. Currently, there is not enough time between the IGMP join (by MulticastSocket#joinGroup()) and the JGroups discovery packet, so the latter is lost in such a network environment. Because the number of nodes can be very large, configuring a static route in the switch is not a reasonable workaround.

Specifically, in the method org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal(), the gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT) part is outside of the retry loop governed by GMS.max_join_attempts and GMS.join_timeout.
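
The difference between discovering once and discovering inside the retry loop can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the JGroups code: the class JoinRetrySketch, its methods, the "self" marker, and the simulated lossy network are all hypothetical names introduced for illustration, and a discovery round is modelled as a plain Supplier instead of Event.FIND_INITIAL_MBRS_EVT.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class JoinRetrySketch {

    // Hypothetical marker meaning "no member answered, so this node made itself coordinator".
    static final String SELF = "self";

    /** One discovery round only: an empty reply immediately yields a standalone coordinator. */
    static String joinWithSingleDiscovery(Supplier<List<String>> discoveryRound) {
        List<String> members = discoveryRound.get();
        return members.isEmpty() ? SELF : members.get(0);
    }

    /** Discovery inside the retry loop: give up only after maxJoinAttempts empty rounds. */
    static String joinWithRetriedDiscovery(Supplier<List<String>> discoveryRound,
                                           int maxJoinAttempts) {
        for (int attempt = 1; attempt <= maxJoinAttempts; attempt++) {
            List<String> members = discoveryRound.get();
            if (!members.isEmpty()) {
                return members.get(0); // join via the first member that answered
            }
            // A real implementation would wait here (for example a join timeout) before retrying.
        }
        return SELF;
    }

    /** Simulated network: the first lostRounds discovery packets are dropped, later ones reach "A". */
    static Supplier<List<String>> lossyNetwork(int lostRounds) {
        int[] sent = {0}; // number of discovery rounds sent so far
        return () -> sent[0]++ < lostRounds
                ? Collections.<String>emptyList()
                : Collections.singletonList("A");
    }

    public static void main(String[] args) {
        // First packet lost, no retry: the node ends up as its own coordinator.
        System.out.println(joinWithSingleDiscovery(lossyNetwork(1)));     // prints "self"
        // First packet lost, discovery retried: the second round finds "A".
        System.out.println(joinWithRetriedDiscovery(lossyNetwork(1), 3)); // prints "A"
    }
}
```

In this model, one dropped discovery round is enough to produce a fake coordinator when discovery runs only once, whereas the retried variant falls back to "self" only after every attempt comes back empty.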