Bug 955816 - Bundle deployment to EAP 6 standalone server fails due to deployDir trait not being available
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- JBoss EAP 6
Version: JON 3.1.1
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ER03
Target Release: JON 3.3.0
Assignee: Lukas Krejci
QA Contact: Garik Khachikyan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-04-23 22:49 UTC by Larry O'Leary
Modified: 2018-12-03 18:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a user attempted to deploy content to EAP 6 standalone using a provisioning bundle, deployment failed if the agent had been restarted since the last time the deployDir trait was retrieved from the managed resource, and the managed resource had since been shut down or had become unavailable for any other reason. As a result, bundle deployment was inconsistent, and manual intervention was required to verify that a deployment had actually succeeded. The most recent value of the trait is now retrieved from the server instead of relying on cached trait values, so bundle deployments no longer fail in this situation.
Clone Of:
Environment:
Last Closed: 2014-12-11 14:01:41 UTC
Type: Enhancement
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 957282 | 0 | high | CLOSED | [as7] Add more bundle support for standalone mode | 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) | 356283 | 0 | None | None | None | Never

Internal Links: 957282

Description Larry O'Leary 2013-04-23 22:49:01 UTC
Description of problem:
When attempting to deploy content to EAP 6 standalone using a provisioning bundle, deployment fails if the agent has been restarted since the last time the deployDir trait was retrieved from the managed resource and the managed resource has since been shut down or has become unavailable for any other reason.

This means that deployment of bundles is inconsistent and requires manual intervention to re-validate that the deployment was actually successful.

Version-Release number of selected component (if applicable):
4.4.0.JON311GA

How reproducible:
Always

Steps to Reproduce:
1.  Start EAP 6.0.1 standalone server.
2.  Start JBoss ON 3.1.1 system.
3.  Import EAP into inventory.
4.  Verify EAP is reported as available.
5.  Create new test bundle version.
6.  Deploy test bundle to EAP 6 resource.

    *   *Destination Name*: Test 01
    *   *Resource Group*: All EAP 6 Standalone Servers
    *   *Deployment Directory*: my-useless-app-01

7.  Shut down EAP server.
8.  Restart JON agent.
9.  Deploy test bundle to a second destination in EAP 6 resource.
10. Deploy test bundle to EAP 6 resource.

    *   *Destination Name*: Test 02
    *   *Resource Group*: All EAP 6 Standalone Servers
    *   *Deployment Directory*: my-useless-app-02
  
Actual results:
my-useless-app-02 will not appear in the deployments directory and the agent log will contain the following errors:

ERROR [ResourceContainer.invoker.daemon-2] (rhq.modules.plugins.jbossas7.BaseServerComponent)- No default deployment scanner was found, returning no value
ERROR [BundleDeployment-1] (rhq.core.pc.bundle.BundleManager)- Failed to complete bundle deployment
java.lang.IllegalArgumentException: Cannot obtain trait [Deploy Directory] for resource [EAP (127.0.0.1:9990)]
	at org.rhq.core.pc.bundle.BundleManager.getAbsoluteDestinationDir(BundleManager.java:534)
	at org.rhq.core.pc.bundle.BundleManager.access$200(BundleManager.java:85)
	at org.rhq.core.pc.bundle.BundleManager$1.run(BundleManager.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)


Expected results:
my-useless-app-02 should appear in EAP's standalone/deployments directory and no errors should appear in the agent log.

Additional info:
With EAP 4 and EAP 5, the plug-in used the pluginConfiguration deployment directory. This meant that once a resource is added to inventory, the value remains available and does not rely on metric collection. With the EAP 6 plug-in, for some reason, we decided to use the deployDir trait instead of the resource configuration content-dir property value. The result is that the EAP server must be running and available for a bundle to be deployed to the EAP server's file system.
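
As a rough illustration of why a missing base directory is fatal, the sketch below resolves a destination's relative deployment directory against a base directory. It is not the RHQ implementation: the class name, the method name and the example path are hypothetical, and the only assumption is that the final location is the base directory plus the relative directory entered for the destination.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not taken from the RHQ sources.
final class DestinationPaths {
    private DestinationPaths() {}

    /**
     * Resolves the relative deployment directory against the base directory.
     * baseDir stands in for whatever value the plug-in reports (trait or
     * configuration property); null or blank models "value not available".
     */
    static Path absoluteDestination(String baseDir, String relativeDir) {
        if (baseDir == null || baseDir.trim().isEmpty()) {
            throw new IllegalArgumentException("Base deployment directory is not available");
        }
        return Paths.get(baseDir).resolve(relativeDir).normalize();
    }
}
```

With an assumed base of /opt/eap/standalone/deployments and the relative directory my-useless-app-02, the result is /opt/eap/standalone/deployments/my-useless-app-02. With no base value, the call fails before anything is written to disk, which is similar to the failure seen in the agent log.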

Comment 1 Larry O'Leary 2013-04-23 23:09:00 UTC
After a closer look, the content-dir we use in the standalone resource configuration is not what we describe it as. It is where EAP itself manages its runtime/deployed content, not user content.

From what I can tell, to address this issue, we need to introduce a pluginConfiguration option that represents a deployment-scanner's path value.

   /subsystem=deployment-scanner/scanner=default:read-attribute(name=path)

Comment 2 Larry O'Leary 2013-04-25 02:28:03 UTC
Proposed fix in https://git.fedorahosted.org/cgit/rhq/rhq.git/diff/?h=bug/955816.

The only thing missing is upgrade logic that sets the deploymentScannerPath pluginConfig value for any AS7 Standalone resource already in inventory that has a value of null. This is probably pretty important: existing users of the plug-in might already be deploying bundles, and without the upgrade logic their existing EAP resources would be left with an empty value.

Comment 3 Larry O'Leary 2013-04-26 21:04:07 UTC
After further discussion, a clear fix has not yet been determined, but here are the proposals:

1) Based on what I originally did in the bug branch in commit 03f4dc7edbc4aa701ef029d5e18267f494ea9215, with the following modifications:

The plugin configuration property would be renamed to "Bundle Deploy Directory", "Deploy Directory" or "Deployment Directory". This essentially separates it from the deployment-scanner itself. Its value becomes the value used by the bundle deployer.

The user will be responsible for making certain that this value points to a location/path that the deployment scanner is using. 
 
On initial resource discovery for an EAP standalone server, this value will default to that of deployment-scanner path if it is available.
 
Finally, to handle resource upgrade and to allow users to continue to use the latest value for the deployDir trait, a value of <null> for the "Deploy Directory" plugin configuration property would result in the trait being used. At the moment, this is not possible because configuration is read in a very abstract way. It would work if we could somehow instruct BundleManager.getAbsoluteDestinationDir to fall back to using 'measurementTrait' with the trait name deployDir when 'pluginConfiguration' with the property name deployDir returns <null>.
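
The fall-back described in proposal 1 could look roughly like the sketch below. This is an illustration only, not the RHQ code: DeployDirFallback and its parameters are hypothetical names, and the trait lookup is modelled as a plain Supplier rather than a real measurement call.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper; the real BundleManager reads these values differently.
final class DeployDirFallback {
    private DeployDirFallback() {}

    /**
     * Prefers the plugin configuration value. When it is null or blank,
     * falls back to the trait value, and fails only if neither is available.
     */
    static String resolve(String pluginConfigValue, Supplier<Optional<String>> traitLookup) {
        if (pluginConfigValue != null && !pluginConfigValue.trim().isEmpty()) {
            return pluginConfigValue;
        }
        return traitLookup.get().orElseThrow(() ->
            new IllegalArgumentException("No deploy directory in plugin configuration or trait"));
    }
}
```

Because the trait is only consulted when the property is empty, resources that were already in inventory before an upgrade would keep their current behaviour, while resources with an explicit value would no longer depend on the server being up.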

     
2) We could move this into resource configuration. Either at the EAP standalone server resource or the deployment-scanner resource. This would allow the value currently configured for deployment-scanner to be used.

Of course, there are some problems with this approach. First, bundles are deployed at the EAP standalone server resource level and not the deployment-scanner level. What this means is that we would need a way to tell BundleManager.getAbsoluteDestinationDir to locate a child resource by type and read its resourceConfiguration property value.

If we don't want to go down the road of traversing child and grand-child resource configuration, then we would need to expose a deploymentScannerPath resource configuration property on the standalone server resource. This would be strange seeing that this resource property doesn't actually belong to the resource it would be defined on. Additionally, if the user updates the value, is the expectation that the value be pushed down to the deployment-scanner? And if multiple deployment-scanners are deployed, what then?

Comment 4 Lukas Krejci 2013-04-29 08:55:16 UTC
If I understand the issue correctly, the problem we have with "deployDir" being a trait is that it can:
a) contain stale data (if the managed resource is reconfigured outside of RHQ between collection intervals)
b) be unavailable if the managed resource is down and the agent hasn't had a chance to collect a value for that metric.

There are a couple of possible solutions to this, none of them without problems, IMHO:
1) add a plugin config prop with equivalent meaning (this is what Larry's patch does). This solves both a) and b) by requiring the user to provide correct value manually each time the deploy dir is changed. Note changing the value in the plugin config has to be done in *addition* to reconfiguring the managed resource itself, because plugin configuration is not meant to have an effect on the resource - it serves the purpose of "connection properties".

2) Expose deployDir as a readonly resource config property. This solves b) because the resource configuration is cached on the agent (and initially obtained from the server). This does NOT solve a) though because the refreshes of the resource config are subject to configuration scan periods (note that as far as I looked, loading live resource config by requesting it from the server DOES NOT update the cached version on the agent, which I think could be considered a bug - but I need to do more investigation on that). But even if loading live config would refresh the cached version, a) could still occur.

3) Always ask for live metric values and modify server-agent comms to pass the initial cached values of all metrics. This would solve a) by always asking for the live value of the metric and would solve b) by always having the last known value to work with if collecting the metric was not possible. Obviously this is a somewhat larger-scale change and has a considerable one-time traffic hit at agent start.

4) deployDir as readonly resource config + always read the live value. This is easier to implement (a simple change to the BundleManager) but has a performance impact on the resource components, because config load might be quite heavy. But because deployments are not a frequent operation on resources, I'd consider that negligible.


I am actually in favor of doing both 3) and 4) (3 has some impact but given how infrequently we collect traits, the time-window for exposure of the above error is quite big).
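
The "live value first, last known value as a safety net" idea behind options 3) and 4) can be sketched as follows. This is an assumption-laden illustration, not the change that was eventually merged: LiveTraitReader, seed and read are hypothetical names, and the live lookup is modelled as a Callable that may throw when the managed resource is down.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and structure are not taken from RHQ.
final class LiveTraitReader {
    private final Map<String, String> lastKnown = new ConcurrentHashMap<>();

    /** Seeds the cache, e.g. with values handed over at agent start. */
    void seed(String traitName, String value) {
        lastKnown.put(traitName, value);
    }

    /** Tries the live value first and remembers it; otherwise uses the last known value. */
    String read(String traitName, Callable<String> liveLookup) {
        try {
            String live = liveLookup.call();
            if (live != null) {
                lastKnown.put(traitName, live);
                return live;
            }
        } catch (Exception e) {
            // Resource down or unreachable: fall through to the last known value.
        }
        String cached = lastKnown.get(traitName);
        if (cached == null) {
            throw new IllegalStateException("No live or last known value for trait " + traitName);
        }
        return cached;
    }
}
```

A successful live read addresses a) because the caller never sees a value older than the current request, and the seeded cache addresses b) because a value is available even when the resource has been down since the agent started.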

Comment 5 Lukas Krejci 2013-04-29 09:18:23 UTC
A note on Larry's proposal (in reply to comment #3):
... snip ...
> What this means is that we would need a way to
> tell BundleManager.getAbsoluteDestinationDir to locate a child resource by
> type and read its resourceConfiguration property value.

This is an intriguing proposal. I wonder how that would work:

The expression for a single deployment scanner path could look like:
${child("DeploymentScanner")/resourceConfiguration/*/scanner/path}

where:
child("DeploymentScanner") means "on child resources of type 'DeploymentScanner'"
resourceConfiguration means to look in resource configuration of the child resource
"*" is the actual name of the list of the scanners, i.e. not a wildcard
scanner is the name of the scanner map
path is the attribute containing the value in the scanner map
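
To make the pieces of that expression concrete, here is a minimal parsing sketch. The expression syntax itself is only a proposal in this comment, so everything here is hypothetical: the DestinationExpression class, its field names and the regular expression are illustrative and do not exist in RHQ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the proposed ${child("Type")/source/prop/...} form.
final class DestinationExpression {
    private static final Pattern SHAPE =
        Pattern.compile("\\$\\{child\\(\"([^\"]+)\"\\)/([^/}]+)/([^}]+)\\}");

    final String childType;          // e.g. DeploymentScanner
    final String source;             // e.g. resourceConfiguration
    final List<String> propertyPath; // e.g. [*, scanner, path]; "*" is a literal name

    private DestinationExpression(String childType, String source, List<String> propertyPath) {
        this.childType = childType;
        this.source = source;
        this.propertyPath = propertyPath;
    }

    static DestinationExpression parse(String expression) {
        Matcher m = SHAPE.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        return new DestinationExpression(m.group(1), m.group(2), Arrays.asList(m.group(3).split("/")));
    }
}
```

Parsing the example expression yields the child type DeploymentScanner, the source resourceConfiguration and the property path "*", "scanner", "path". Each list-valued step in that path is a point where more than one element may exist and a concrete one has to be chosen.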

The selection in the UI would then actually be a multi-step process where the user would need to resolve all of the "path elements" that can potentially represent multiple "things". I.e. the user would need to select the deployment scanner resource (possibly this could be skipped because it is a singleton) and would need to select the concrete "scanner" map element in the list of scanners.

The remote API for this is going to be more interesting though, because we somehow need to identify the individual elements in the above expression. The easiest way to do it would possibly be to pass an array of strings, where index 0 holds the resource id of the deployment scanner and index 1 holds the index of the list element. This is very untyped, though given the freeform nature of the expression, I can't see many ways of providing a more type-safe approach.

Comment 7 Lukas Krejci 2014-08-29 20:21:37 UTC
A simpler fix proposed in https://github.com/rhq-project/rhq/pull/116.

Comment 8 Thomas Segismont 2014-09-10 19:48:54 UTC
Merged in release/jon3.3.x

commit 11cb1133125ae227e3bcac127ca0e67a38b73528
Author: Thomas Segismont <tsegismo>
Date:   Wed Sep 10 21:30:53 2014 +0200

    Bug 955816 - Bundle deployment to EAP 6 standalone server fails due to deployDir trait not being available
    
    From patch file created with "git diff b6795f8 1aa9fe9"

Comment 9 Simeon Pinder 2014-09-17 02:49:22 UTC
Moving to ON_QA as available for test with the following brew build:
https://brewweb.devel.redhat.com//buildinfo?buildID=385149

Comment 16 Garik Khachikyan 2014-09-18 13:30:32 UTC
# VERIFIED

It works. My earlier attempts were wrong because I had included handover-related tasks in the bundle, and those require EAP 6 to be up. Without them, the standard extraction of the archive inside the tarball succeeded, and the deployment directory was created at the correct location relative to deployDir while EAP itself was down.

version
===
15:30:06,635 INFO  [SystemInfoManager] (http-/0.0.0.0:7080-2) SystemInformation: ********
ACTIVE_DRIFT_PLUGIN: [drift-jpa]
AGENT_MAX_QUIET_TIME_ALLOWED: [300000]
ALERT_PURGE: [2678400000]
AS config dir: [/home/hudson/jon-server-3.3.0.ER03/jbossas/standalone/configuration]
AS product name: [EAP]
AS product version: [6.3.0.GA]
AS version: [7.4.0.Final-redhat-19]
AVAILABILITY_PURGE: [31536000000]
Agent cloud-qe-1-vm-1.idmqe.lab.eng.bos.redhat.com: [Agent[id=10002,name=cloud-qe-1-vm-1.idmqe.lab.eng.bos.redhat.com,address=10.16.96.146,port=16163,remote-endpoint=socket://10.16.96.146:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-ping=1411046997472,last-availability-report=1411046517457]]
Agent ibm-x3550m3-11.lab.eng.brq.redhat.com: [Agent[id=10001,name=ibm-x3550m3-11.lab.eng.brq.redhat.com,address=10.34.36.137,port=16163,remote-endpoint=socket://10.34.36.137:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-ping=1411046999300,last-availability-report=1411046876571]]
AlertCount: [0]
AlertDefinitionCount: [8]
BuildNumber: [4aefe39:44e33a4]
CAM_BASELINE_DATASET: [604800000]
CAM_BASELINE_FREQUENCY: [259200000]
CAM_BASE_URL: [http://10.34.36.137:7080/]
CAM_DATA_MAINTENANCE: [3600000]
CAM_DATA_PURGE_1D: [31536000000]
CAM_DATA_PURGE_1H: [1209600000]
CAM_DATA_PURGE_6H: [2678400000]
CAM_GUIDE_ENABLED: [true]
CAM_HELP_PASSWORD: [- non null -]
CAM_HELP_USER: [web]
CAM_JAAS_PROVIDER: [false]
CAM_LDAP_BASE_DN: [o=JBoss,c=US]
CAM_LDAP_BIND_DN: []
CAM_LDAP_BIND_PW: [- non null -]
CAM_LDAP_FILTER: []
CAM_LDAP_FOLLOW_REFERRALS: [false]
CAM_LDAP_LOGIN_PROPERTY: [cn]
CAM_LDAP_NAMING_FACTORY_INITIAL: [com.sun.jndi.ldap.LdapCtxFactory]
CAM_LDAP_NAMING_PROVIDER_URL: [ldap://localhost/]
CAM_LDAP_PROTOCOL: [false]
CAM_RT_COLLECT_IP_ADDRS: [true]
CAM_SYSLOG_ACTIONS_ENABLED: [false]
DATABASE_CONNECTION_URL: [jdbc:postgresql://127.0.0.1:5432/rhq?loginTimeout=0&socketTimeout=0&prepareThreshold=5&unknownLength=2147483647&loglevel=0&tcpkeepalive=false&binaryTransfer=true]
DATABASE_DRIVER_NAME: [PostgreSQL Native Driver]
DATABASE_DRIVER_VERSION: [PostgreSQL 9.2 JDBC4 (build 1002)]
DATABASE_PRODUCT_NAME: [PostgreSQL]
DATABASE_PRODUCT_VERSION: [8.4.18]
DATA_REINDEX_NIGHTLY: [false]
DB_SCHEMA_VERSION: [2.160]
DRIFT_FILE_PURGE: [2678400000]
ENABLE_AGENT_AUTO_UPDATE: [true]
ENABLE_LOGIN_WITHOUT_ROLES: [false]
EVENT_PURGE: [1209600000]
FullName: [JBoss Operations Network]
Name: [JBoss ON]
OPERATION_HISTORY_PURGE: [0]
PlatformCount: [2]
RESOURCE_GENERIC_PROPERTIES_UPGRADE: [false]
RHQ_SESSION_TIMEOUT: [3600000]
RT_DATA_PURGE: [2678400000]
SERVER_HOME_DIR: [/home/hudson/jon-server-3.3.0.ER03/jbossas/standalone]
SERVER_IDENTITY: [ibm-x3550m3-11.lab.eng.brq.redhat.com]
SERVER_INSTALL_DIR: [/home/hudson/jon-server-3.3.0.ER03]
SERVER_LOCAL_TIME: [September 18, 2014 3:30:06 PM CEST]
SERVER_TIMEZONE: [Central European Time]
SERVER_VERSION: [4.12.0.JON330ER03]
SchedulesPerMinute: [192]
ServerCount: [9]
ServiceCount: [1024]
Storage_Node ibm-x3550m3-11.lab.eng.brq.redhat.com: [storageNode.addresss=ibm-x3550m3-11.lab.eng.brq.redhat.com, hostname=ibm-x3550m3-11.lab.eng.brq.redhat.com, beginTime=1411018206587, beginTime=1411018206587, unackAlerts=0, heapUsed=null, heapPercentageUsed=Min: 0.050766182232332875, Max: 0.2918473015920365, Avg: 0.16261281570793695 (%), load=null, dataUsedPercentage=null, dataDiskUsed=null, tokens=null, actuallyOwns=null]
TRAIT_PURGE: [31536000000]
Version: [3.3.0.ER03]
********

