Bug 1015334

Summary: EAP 6 host controller domain host update results in managed servers no longer being "manageable"
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Plugin -- JBoss EAP 6
Assignee: Libor Zoubek <lzoubek>
Status: CLOSED CURRENTRELEASE
QA Contact: Armine Hovsepyan <ahovsepy>
Severity: medium
Docs Contact:
Priority: unspecified
Version: JON 3.1.2
CC: ahovsepy, hrupp, lzoubek, mfoley, myarboro, theute, tsegismo
Target Milestone: DR01   
Target Release: JON 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
It was discovered that if the domainHost attribute of an EAP 6 host controller changed, the managed servers connected through the host controller became unavailable. This also prevented any new managed servers from being discovered. The host controller was still partially functioning in inventory; however, its managed servers were unreachable because the domain host property value was read-only, with no method to update or correct it. A fix to the AS7 plug-in now attempts to read domainHost from host.xml when needed. The 'domainHost' read-only plug-in property is deprecated, and a new trait is introduced to better handle the scenario.
Story Points: ---
Clone Of:
: 1119497
Environment:
Last Closed: 2014-12-11 14:04:45 UTC
Type: Bug
Bug Blocks: 1119497    
Attachments:
Description              Flags
before_hostname_change   none
after-hostname-change    none

Description Larry O'Leary 2013-10-04 00:34:24 UTC
Description of problem:
If the domainHost attribute of an EAP 6 host controller changes, it results in the managed servers becoming unavailable. This also prevents any new managed servers from being discovered.

Such a change can occur to the domainHost attribute of the EAP 6 host controller if the system's host name is used to determine the value (i.e. no override value is specified in host.xml) and the host machine's network name changes due to a network partition event or topology change.
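
To illustrate why the value can drift, here is a minimal sketch of the fallback described above. It assumes the override is the `name` attribute on the root `<host>` element of host.xml and that the system host name is the only fallback; the function name `resolve_domain_host` and its parameters are hypothetical, and the real resolution inside EAP may consider more sources than this.

```python
import socket
import xml.etree.ElementTree as ET


def resolve_domain_host(host_xml_text, system_host_name=None):
    """Return the explicit <host name="..."> value, else the system host name.

    Hypothetical helper for illustration only; not part of EAP or the plug-in.
    """
    root = ET.fromstring(host_xml_text)
    explicit = (root.get("name") or "").strip()
    if explicit:
        # An override in host.xml pins the value regardless of the network name.
        return explicit
    # No override: the value follows the machine's network name, so it changes
    # whenever that name changes.
    return system_host_name or socket.gethostname()
```

With an override present the result stays stable across network renames; without one, the same host.xml yields a different domainHost as soon as the machine's name changes.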

This change could also be user triggered by manually editing the host configuration and setting a new name. For example, an existing EAP domain may be configured with multiple host controllers and running when the user later decides to relocate or rename a machine based on its changed function or its ordering in a cluster.

Finally, such a change may also occur if an existing domain made up of multiple host controllers has one of its host controllers decommissioned and replaced by a new host controller with a different name but using the same network address and user credentials. 

The end result is that the host controller is still partially functioning in inventory but its managed servers are unreachable due to the domain host property value being read-only with no method to update/correct the issue. 

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

How reproducible:
Always

Steps to Reproduce:
1.  Start EAP 6.1 domain server with 1 or more managed servers.
2.  Start JON 3.1.2 system.
3.  Import EAP 6.1 domain controller into inventory.
4.  Set management user credentials for EAP 6 resource.
5.  Verify EAP 6 resource is reported as available.
6.  Verify EAP 6 managed servers are discovered and reported as available.
7.  Shut down EAP 6 domain controller.
8.  Change domain controller's host name in host.xml:

        sed -i 's/<host name=".*" xmlns="urn:jboss:domain:1.4">/<host name="newMaster" xmlns="urn:jboss:domain:1.4">/' "${JBOSS_HOME}/domain/configuration/host.xml"

9.  Start EAP 6 domain controller.
10. Run a detailed discovery scan.
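
The sed command in step 8 matches only the urn:jboss:domain:1.4 namespace. For other EAP 6 versions, a rough equivalent that ignores the schema version could look like the sketch below. The function name `rename_host` is hypothetical, and the sketch assumes the root `<host>` element already carries a `name` attribute.

```python
import re

# Matches the name attribute of the first <host ...> start tag, whatever the
# attribute order or xmlns version.
_HOST_NAME = re.compile(r'(<host\s+(?:[^>]*?\s)?name=")[^"]*(")')


def rename_host(host_xml_text, new_name):
    """Return host.xml content with the root host name replaced (hypothetical helper)."""
    updated, count = _HOST_NAME.subn(
        lambda m: m.group(1) + new_name + m.group(2),
        host_xml_text,
        count=1,  # only the first <host> start tag, i.e. the root element
    )
    if count == 0:
        raise ValueError('no <host name="..."> attribute found')
    return updated
```

Reading "${JBOSS_HOME}/domain/configuration/host.xml", passing its content through `rename_host(content, "newMaster")`, and writing the result back would reproduce step 8.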

Actual results:
All managed servers report as unavailable.
No new managed servers appear for the newly discovered host named newMaster.

Expected results:
Managed servers under newMaster are discovered and reported as available.
Managed servers under master are still reported as unavailable.

Additional info:
The proper or appropriate resolution is not clear. From initial discussion and review of the use cases it seems that there are 3 possible solutions:

1. The domainHost plug-in configuration property be read/write but exposed as an advanced configuration option. 

In other words, do not provide it in the main configuration section but instead in a section which is collapsed by default and with a note/disclaimer stating that altering the value may result in lost or degraded functionality.

Making this property read/write lets the user determine what its value should be. If the underlying value changes, the user can decide whether the change was expected and whether the existing host controller's domainHost property should be updated to reflect it.

2. The domainHost plug-in configuration property can remain read-only but be populated and updated by discovery or a trait. Considering this value is part of the actual EAP server's configuration and is determined by the managed resource itself, should we even care what it is from a plug-in configuration perspective? Instead, we should only care about this value when performing discovery, and the value should be determined at runtime when discovery is being performed.

The downside to this approach is that if someone accidentally starts a completely unrelated EAP host controller on the same host/port with the same user/password but a different domainHost, the domainHost will be updated and a whole new set of managed servers (and other child resources) will be discovered. Granted, as soon as the accidental server is shut down and the real server is started back up, the configuration will update again (reverting to its original value) and the original inventory will return to normal. The accidental discovery can then be undone by removing the bad resources from inventory.

3. The domainHost should be part of the resource name and key and therefore be unique for this host controller within inventory. A change of the underlying domainHost value in the EAP host configuration would essentially result in a newly discovered resource (host controller) being added to the discovery queue. The availability check should also be updated to ensure that the "available" server is the one we expect. If its domainHost (and perhaps other signature configuration) is different, the resource is reported as unavailable with a message indicating that the connected resource does not appear to be the same as the existing resource in inventory due to <property> value <value> not matching <property> value <value>...

The downside to 3. is that in the use case where an existing host controller is simply changing its name, the existing host controller and its historic data become disconnected and separate from the newly discovered/added host controller. In reality, 1. and 2. above cause essentially the same problem, just not as badly, because the host controller itself remains intact along with its socket bindings and other configuration/history/drift/events/alerts/etc. Only the managed servers get separated.
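
The availability check proposed in option 3 could be sketched roughly as follows. This only illustrates the idea, not plug-in code: `AvailabilityResult` and `check_availability` are hypothetical names, and the sketch assumes the domainHost reported by the running server is already known (or is None when the server cannot be reached).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityResult:
    up: bool
    message: str = ""


def check_availability(expected_domain_host, reported_domain_host):
    """Report UP only when the connected server is the one held in inventory."""
    if reported_domain_host is None:
        return AvailabilityResult(False, "server did not respond")
    if reported_domain_host != expected_domain_host:
        # Something answers on the address, but it is not the resource in inventory.
        return AvailabilityResult(
            False,
            "connected resource does not appear to be the same as the resource "
            f"in inventory: domainHost value '{reported_domain_host}' does not "
            f"match expected value '{expected_domain_host}'",
        )
    return AvailabilityResult(True)
```

A renamed host controller is then reported as unavailable with an explanatory message instead of silently losing its managed servers.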

Comment 1 Larry O'Leary 2014-02-14 01:54:42 UTC
Re-targeted to CP02 to reduce CP01 overall payload.

Comment 2 Libor Zoubek 2014-04-22 14:33:02 UTC
The actual results above are no longer valid. New actual results on 4.10:

The domain controller is reported as DOWN after it is started with the updated name, and an avail error is reported to the user. Because of Bug 1088264, there was no way to recover from the avail error (even returning to the original host name did not work).

I have a fix in https://github.com/rhq-project/rhq/pull/23

which fixes the above BZ. I did not stick with any of the proposed solutions. Instead, the AS7 plugin tries to be smart and reads domainHost from host.xml when needed. I deprecated the 'domainHost' read-only plugin property and introduced a new trait instead. domainHost is held by BaseServerComponent and is detected on component start and, if needed, in the avail code.
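
The behaviour described in this comment (detect domainHost on component start, re-detect in the avail code if needed) might look roughly like the sketch below. This is not the code from the pull request: the class `HostControllerComponent` and the two callables are hypothetical stand-ins, where `read_domain_host` would parse host.xml and `host_exists` would ask the management API whether a host with that name is present.

```python
class HostControllerComponent:
    """Hypothetical stand-in for a server component; not RHQ's BaseServerComponent."""

    def __init__(self, read_domain_host, host_exists):
        self._read_domain_host = read_domain_host  # assumed: reads the name from host.xml
        self._host_exists = host_exists            # assumed: probes the management API
        self.domain_host = None

    def start(self):
        # Detect the value once when the component starts.
        self.domain_host = self._read_domain_host()

    def is_available(self):
        if self._host_exists(self.domain_host):
            return True
        # The cached name may be stale after a rename: re-read it and retry once.
        refreshed = self._read_domain_host()
        if refreshed != self.domain_host:
            self.domain_host = refreshed
            return self._host_exists(refreshed)
        return False
```

Because the name is held by the component rather than stored as a read-only plugin property, a rename is picked up on the next availability check without user intervention.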

Comment 3 Thomas Segismont 2014-04-24 13:09:43 UTC
(In reply to Libor Zoubek from comment #2)
> 
> I have a fix in https://github.com/rhq-project/rhq/pull/23
> 
> which, fixes above BZ. I did not stick with any of proposed solutions.
> Instead, AS7 plugin tries to be smart and reads domainHost from host.xml
> when needed. I deprecated 'domainHost' read-only plugin property and
> introduced new trait instead. domainHost is held by BaseServerComponent and
> is detected on component start and if needed in avail code.

This strategy sounds good to me. I just reviewed the PR (did not test it) and I vote for this change.

Comment 4 Heiko W. Rupp 2014-05-21 09:57:57 UTC
merged into master as f9ef978b72ae

Comment 5 Libor Zoubek 2014-05-30 10:48:24 UTC
This commit should be cherry-picked as well: https://github.com/rhq-project/rhq/commit/4bba58bca1e7363c1cba2d777e92a07ace9a1d41

Otherwise, agent.log will contain warnings whenever a standalone EAP on the agent is DOWN.

Comment 7 Simeon Pinder 2014-07-31 15:52:19 UTC
Moving to ON_QA as available to test with brew build of DR01: https://brewweb.devel.redhat.com//buildinfo?buildID=373993

Comment 8 Armine Hovsepyan 2014-08-05 12:27:15 UTC
Created attachment 924193
before_hostname_change

Comment 9 Armine Hovsepyan 2014-08-05 12:33:07 UTC
Created attachment 924194
after-hostname-change

Comment 10 Armine Hovsepyan 2014-08-05 12:33:44 UTC
Verified in JON 3.3 DR01.
Screenshots attached.