1599625 – [GSS](6.4.z) Host controllers can not connect to domain after creating a rollout plan and restarting the master host controller

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Domain Management
Sub Component:
Version:	6.4.21
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	CR1
Target Release:	EAP 6.4.21
Assignee:	Jiri Ondrusek
QA Contact:	Peter Mackay
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	eap6421-payload

Reported:	2018-07-10 08:34 UTC by tmiyargi
Modified:	2021-12-10 16:35 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-08-19 12:45:38 UTC
Type:	Bug
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	JBEAP-15066	0	Major	Pull Request Sent	Cover possible error when host controllers can not connect to domain after creating a rollout plan and restarting the ma...	2019-07-25 18:06:13 UTC
Red Hat Knowledge Base (Solution)	3525371	0	None	None	None	2018-07-10 08:52:59 UTC

Description tmiyargi 2018-07-10 08:34:39 UTC

Creating a rollout plan and then restarting the DC host prevents the other hosts from reconnecting to the master. The slave HC fails to connect with the error JBAS014687: Resource is immutable, and the DC logs many errors like:

JBAS012119:  cancelled task by interrupting thread Thread[Host Controller Service Threads - 117,5,Host Controller Service Threads]

To reproduce, create a domain with a master and a slave, then create a rollout plan and reload the master like this:

rollout-plan add --name=my-plan --content={rollout groupa^groupb}
/host=my-dc:reload

Comment 4 Brian Stansberry 2018-07-10 14:59:14 UTC

What doesn't work is a slave HC reconnecting to the master following a loss of connectivity. A common cause is a reload of the master, which is the specific case reported here. Other events that cause reconnection, e.g. a network outage detected by the slave and later resolved, would result in the same problem.

There is a guard in the code that rejects a particular call path for providing updates to rollout-plan resources unless the resource is in a kind of "initial" state, i.e. the state it would be in early in HC boot. When the slave HC reconnects, it syncs its local copy of the domain-wide model with what the master currently has, and while doing that it uses the call path that is being rejected.

The best fix is probably to eliminate that guard, as the value it provides is basically theoretical: a check against EAP developers doing something wrong that is hard to imagine actually being done. Trying to work around the call path that trips the guard would add complexity to already complex code.
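
For illustration only, the guard and the reconnect failure described in this comment can be pictured with a minimal Java sketch. This is not the EAP source: the class and method names (RolloutPlansResource, addPlan, writeModel, ReconnectDemo) are hypothetical, and treating "initial" as "no plans stored yet" is an assumption made to keep the example small.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the slave's local rollout-plans resource (not an EAP class). */
class RolloutPlansResource {
    private final Map<String, String> plans = new LinkedHashMap<>();
    private final boolean guardEnabled;

    RolloutPlansResource(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    /** Normal operation path, e.g. a "rollout-plan add" propagated from the master. */
    void addPlan(String name, String content) {
        plans.put(name, content);
    }

    /** Bulk-update path used when syncing the local model with the master's copy. */
    void writeModel(Map<String, String> masterCopy) {
        // Assumption: "initial" state is modelled here as "no plans stored yet".
        if (guardEnabled && !plans.isEmpty()) {
            throw new IllegalStateException("Resource is immutable");
        }
        plans.clear();
        plans.putAll(masterCopy);
    }
}

class ReconnectDemo {
    /** Returns true if the sync performed on reconnect is accepted. */
    static boolean reconnectSucceeds(boolean guardEnabled) {
        RolloutPlansResource local = new RolloutPlansResource(guardEnabled);

        // Boot: the resource is still in its initial state, so the sync is accepted.
        local.writeModel(Collections.<String, String>emptyMap());

        // A rollout plan is created while the slave is connected.
        local.addPlan("my-plan", "{rollout groupa^groupb}");

        // Master reloads; the slave reconnects and syncs through the guarded path.
        Map<String, String> masterCopy =
                Collections.singletonMap("my-plan", "{rollout groupa^groupb}");
        try {
            local.writeModel(masterCopy);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("guard enabled:  " + reconnectSucceeds(true));
        System.out.println("guard removed:  " + reconnectSucceeds(false));
    }
}
```

In this sketch, removing the guard lets the reconnect sync go through without changing the sync path itself, which mirrors the fix suggested above.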