Bug 1599625 - [GSS](6.4.z) Host controllers can not connect to domain after creating a rollout plan and restarting the master host controller
Summary: [GSS](6.4.z) Host controllers can not connect to domain after creating a roll...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Domain Management
Version: 6.4.21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: CR1
Target Release: EAP 6.4.21
Assignee: Jiri Ondrusek
QA Contact: Peter Mackay
URL:
Whiteboard:
Depends On:
Blocks: eap6421-payload
Reported: 2018-07-10 08:34 UTC by tmiyargi
Modified: 2021-12-10 16:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-19 12:45:38 UTC
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | JBEAP-15066 | 0 | Major | Pull Request Sent | Cover possible error when host controllers can not connect to domain after creating a rollout plan and restarting the ma... | 2019-07-25 18:06:13 UTC
Red Hat Knowledge Base (Solution) | 3525371 | 0 | None | None | None | 2018-07-10 08:52:59 UTC

Description tmiyargi 2018-07-10 08:34:39 UTC
Creating a rollout plan and then restarting the DC host prevents the other hosts from connecting to the master again. The slave HC fails to connect with the error JBAS014687: Resource is immutable, and the DC logs many errors like:

JBAS012119:  cancelled task by interrupting thread Thread[Host Controller Service Threads - 117,5,Host Controller Service Threads]

To reproduce, create a domain with a master and a slave, create a rollout plan, and reload the master like this:

rollout-plan add --name=my-plan --content={rollout groupa^groupb}
/host=my-dc:reload

Comment 4 Brian Stansberry 2018-07-10 14:59:14 UTC
What doesn't work is a slave HC reconnecting to the master following loss of connectivity. A common cause of that is the master being reloaded, which is the specific thing reported here. Other things that cause reconnection, e.g. a network outage detected by the slave and then later resolved, would result in the same problem.

There is a guard in the code that rejects a particular call path for providing updates to rollout-plan resources, unless the resource is in a kind of "initial" state, i.e. the state it would be in early in HC boot. When the slave HC reconnects, it syncs its local copy of the domain-wide model with what the master currently has, and while doing that it uses the call path that's being rejected.

Best fix is probably to eliminate that guard as the value it provides is basically theoretical, a check against EAP developers doing something wrong that is hard to imagine actually being done. Trying to work around the call path that trips the guard would add complexity to already complex code.
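
For illustration only, the failure mode described in this comment can be sketched in Java. This is not the EAP code: the class and method names (RolloutPlanGuardSketch, GuardedResource, writeModel, syncTwice) are hypothetical, the "initial" state is assumed to mean an empty model, and the exception message merely echoes the text of the reported JBAS014687 error. The sketch shows that the first sync (at boot) passes the guard, a second sync (on reconnect) trips it, and that dropping the guard lets both succeed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the EAP code base.
class RolloutPlanGuardSketch {

    static class GuardedResource {
        private final Map<String, String> model = new HashMap<>();
        private final boolean guardEnabled;

        GuardedResource(boolean guardEnabled) {
            this.guardEnabled = guardEnabled;
        }

        // Bulk-update path, standing in for the one used when a slave
        // syncs its copy of the domain model from the master.
        void writeModel(Map<String, String> newModel) {
            // Assumption: "initial" state is modelled as an empty model.
            if (guardEnabled && !model.isEmpty()) {
                throw new IllegalStateException("Resource is immutable");
            }
            model.clear();
            model.putAll(newModel);
        }
    }

    // Simulates a boot-time sync followed by a reconnect sync.
    // Returns true only if both syncs are accepted.
    static boolean syncTwice(GuardedResource resource) {
        Map<String, String> masterCopy = new HashMap<>();
        masterCopy.put("my-plan", "{rollout groupa^groupb}");
        try {
            resource.writeModel(masterCopy); // boot: resource is still empty
            resource.writeModel(masterCopy); // reconnect: resource already populated
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("guard kept:    " + syncTwice(new GuardedResource(true)));
        System.out.println("guard removed: " + syncTwice(new GuardedResource(false)));
    }
}
```

In this toy model, removing the guard is a one-line change, whereas routing the reconnect sync around it would require a second update path, which mirrors the complexity trade-off discussed above.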

