When a user attempted to deploy content to EAP 6 standalone using a provisioning bundle, deployment failed if the agent had been restarted since the last time the deployDir trait was retrieved from the managed resource, and the managed resource had since been shut down or become unavailable for any other reason. This made bundle deployment inconsistent, and manual intervention was required to verify that the deployment was actually successful. The most recent value of the trait is now retrieved from the server instead of relying on cached trait values, so bundle deployments no longer fail in this situation.
Description of problem:
When attempting to deploy content to EAP 6 standalone using a provisioning bundle, deployment fails if the agent has been restarted since the last time the deployDir trait was retrieved from the managed resource and the managed resource has since been shut down or become unavailable for any other reason.
This means that deployment of bundles is inconsistent and requires manual intervention to re-validate that the deployment was actually successful.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start EAP 6.0.1 standalone server.
2. Start JBoss ON 3.1.1 system.
3. Import EAP into inventory.
4. Verify EAP is reported as available.
5. Create new test bundle version.
6. Deploy test bundle to EAP 6 resource.
* *Destination Name*: Test 01
* *Resource Group*: All EAP 6 Standalone Servers
* *Deployment Directory*: my-useless-app-01
7. Shut down EAP server.
8. Restart JON agent.
9. Deploy test bundle to a second destination in EAP 6 resource.
10. Deploy test bundle to EAP 6 resource.
* *Destination Name*: Test 02
* *Resource Group*: All EAP 6 Standalone Servers
* *Deployment Directory*: my-useless-app-02
Actual results:
my-useless-app-02 will not appear in the deployments directory and the agent log will contain the following errors:
ERROR [ResourceContainer.invoker.daemon-2] (rhq.modules.plugins.jbossas7.BaseServerComponent)- No default deployment scanner was found, returning no value
ERROR [BundleDeployment-1] (rhq.core.pc.bundle.BundleManager)- Failed to complete bundle deployment
java.lang.IllegalArgumentException: Cannot obtain trait [Deploy Directory] for resource [EAP (127.0.0.1:9990)]
Expected results:
my-useless-app-02 should appear in EAP's standalone/deployments directory and no errors should appear in the agent log.
With EAP 4 and EAP 5, the plug-in used the pluginConfiguration deployment directory. This meant that once a resource was added to inventory, the value remained available and did not rely on metric collection. With the EAP 6 plug-in, for some reason, we decided to use the deployDir trait instead of the resource configuration content-dir property value. The result is that the EAP server must be running and available for a bundle to be deployed to the EAP server's file system.
After a closer look, the content-dir we use in the standalone resource configuration is not what we describe it as. It is where EAP itself manages its runtime/deployed content, not user content.
From what I can tell, to address this issue, we need to introduce a pluginConfiguration option that represents a deployment-scanner's path value.
Proposed fix in https://git.fedorahosted.org/cgit/rhq/rhq.git/diff/?h=bug/955816.
The only thing it is missing is upgrade logic that will set the deploymentScannerPath pluginConfig value for any AS7 Standalone resource already in inventory whose value is null. This is probably pretty important: existing users of the plug-in might already be deploying bundles, and this change would leave their existing EAP resources with an empty value.
After further discussion a clear fix has not yet been determined but here are the proposals:
1) Based off what I originally did in the bug branch in commit 03f4dc7edbc4aa701ef029d5e18267f494ea9215 with the following modifications:
The plugin configuration property would be renamed to "Bundle Deploy Directory", "Deploy Directory", or "Deployment Directory". This is to separate it from the deployment-scanner itself. Its value will become the value used by the bundle deployer.
The user will be responsible for making certain that this value points to a location/path that the deployment scanner is using.
On initial resource discovery for an EAP standalone server, this value will default to that of deployment-scanner path if it is available.
Finally, to handle resource upgrade and to allow users to continue to use the latest value of the deployDir trait, a value of <null> for the "Deploy Directory" plugin configuration property will result in the trait being used. At the moment, this is not possible because configuration is read in a very abstract way. It would work if we could somehow instruct BundleManager.getAbsoluteDestinationDir to fall back to 'measurementTrait' with the trait name deployDir whenever 'pluginConfiguration' with the property name deployDir returns <null>.
2) We could move this into resource configuration. Either at the EAP standalone server resource or the deployment-scanner resource. This would allow the value currently configured for deployment-scanner to be used.
Of course, there are some problems with this approach. First, bundles are deployed at the EAP standalone server resource level and not at the deployment-scanner level. This means that we would need a way to tell BundleManager.getAbsoluteDestinationDir to locate a child resource by type and read its resourceConfiguration property value.
If we don't want to go down the road of traversing child and grand-child resource configuration, then we would need to expose a deploymentScannerPath resource configuration property on the standalone server resource. This would be strange seeing that this resource property doesn't actually belong to the resource it would be defined on. Additionally, if the user updates the value, is the expectation that the value be pushed down to the deployment-scanner? And if multiple deployment-scanners are deployed, what then?
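For illustration, the fallback described under proposal 1) could look roughly like the sketch below. This is only a sketch under assumptions: DeployDirResolver, its two maps, and the resolve method are hypothetical names invented here and are not part of RHQ or of BundleManager. It only shows the lookup order, with the plugin configuration value first and the collected trait second.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not RHQ code: resolves the bundle destination base
 * directory from plugin configuration first and falls back to the trait.
 */
final class DeployDirResolver {

    /** Values stored with the inventoried resource; survive an agent restart. */
    private final Map<String, String> pluginConfiguration;

    /** Trait values collected by the agent; assumed empty after a restart. */
    private final Map<String, String> collectedTraits;

    DeployDirResolver(Map<String, String> pluginConfiguration,
                      Map<String, String> collectedTraits) {
        this.pluginConfiguration = new HashMap<>(pluginConfiguration);
        this.collectedTraits = new HashMap<>(collectedTraits);
    }

    /**
     * Returns the plugin configuration value if one is set; otherwise the
     * trait value. Fails only when neither source has a value.
     */
    String resolve(String propertyName, String traitName) {
        String configured = pluginConfiguration.get(propertyName);
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        // A <null> plugin configuration value means "use the trait".
        String trait = collectedTraits.get(traitName);
        if (trait != null) {
            return trait;
        }
        throw new IllegalArgumentException("Cannot obtain [" + propertyName
            + "] from plugin configuration or trait [" + traitName + "]");
    }
}
```

With this ordering, an upgraded resource that has no plugin configuration value keeps the current trait-based behaviour, while a resource with an explicit value no longer depends on the server being up.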
If I understand the issue correctly, the problem we have with "deployDir" being a trait is that it:
a) can contain stale data (if the managed resource is reconfigured outside of RHQ between the collection intervals)
b) cannot be used if the managed resource is down and the agent didn't have a chance to collect a value of that metric.
There are a couple of possible solutions to this, none of them without problems, IMHO:
1) add a plugin config prop with equivalent meaning (this is what Larry's patch does). This solves both a) and b) by requiring the user to provide correct value manually each time the deploy dir is changed. Note changing the value in the plugin config has to be done in *addition* to reconfiguring the managed resource itself, because plugin configuration is not meant to have an effect on the resource - it serves the purpose of "connection properties".
2) Expose deployDir as a readonly resource config property. This solves b) because the resource configuration is cached on the agent (and initially obtained from the server). This does NOT solve a) though because the refreshes of the resource config are subject to configuration scan periods (note that, as far as I could see, loading live resource config by requesting it from the server DOES NOT update the cached version on the agent, which I think could be considered a bug - but I need to do more investigation on that). But even if loading live config refreshed the cached version, a) could still occur.
3) Always ask for live metric values and modify server-agent comms to pass the initial cached values of all metrics. This would solve a) by always asking for the live value of the metric and would solve b) by always having the last known value to work with if collecting the metric was not possible. Obviously this is a somewhat larger-scale change and has a considerable one-time traffic hit at agent start.
4) deployDir as readonly resource config + always read the live value. This is easier to implement (a simple change to the BundleManager) but has a performance impact on the resource components, because config load might be quite heavy. But because deployments are not a frequent operation on resources, I'd consider that negligible.
I am actually in favor of doing both 3) and 4) (3 has some impact but given how infrequently we collect traits, the time-window for exposure of the above error is quite big).
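To make option 3) more concrete, here is a minimal sketch of "ask for the live value, fall back to the last known one". Everything in it is an assumption made for illustration: LiveTraitReader, seed, and read are hypothetical names, and the Supplier merely stands in for whatever call would collect the trait from the managed resource.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not RHQ code: prefers a live trait value and keeps
 * the last known value as a fallback for when the resource is down.
 */
final class LiveTraitReader {

    private final Map<String, String> lastKnown = new ConcurrentHashMap<>();

    /** Seeds the cache, e.g. with values sent by the server at agent start. */
    void seed(Map<String, String> initialValues) {
        // Values must be non-null; ConcurrentHashMap rejects null entries.
        lastKnown.putAll(initialValues);
    }

    /**
     * Tries the live source first. A null result or a runtime failure is
     * treated as "resource unavailable" and the last known value is used.
     */
    String read(String traitName, Supplier<String> liveSource) {
        String live = null;
        try {
            live = liveSource.get();
        } catch (RuntimeException e) {
            // Resource is down or unreachable: fall through to the cache.
        }
        if (live != null) {
            lastKnown.put(traitName, live); // refresh, which addresses a)
            return live;
        }
        String cached = lastKnown.get(traitName); // last known, addresses b)
        if (cached == null) {
            throw new IllegalStateException(
                "No live or last known value for trait [" + traitName + "]");
        }
        return cached;
    }
}
```

The seeding step is what carries the one-time traffic cost mentioned above; without it, the reader has nothing to fall back to right after an agent restart.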
A note on Larry's proposal (in reply to comment #3):
... snip ...
> What this means is that we would need a way to
> tell BundleManager.getAbsoluteDestinationDir to locate a child resource by
> type and read its resourceConfiguration property value.
This is an intriguing proposal. I wonder how that would work:
The expression for a single deployment scanner path could look like:
child("DeploymentScanner") means "on child resources of type 'DeploymentScanner'"
resourceConfiguration means to look in resource configuration of the child resource
"*" is the actual name of the list of the scanners, i.e. not a wildcard
scanner is the name of the scanner map
path is the attribute containing the value in the scanner map
The selection in the UI would then actually be a multi-step process where the user would need to resolve all of the "path elements" that can potentially represent multiple "things". I.e. the user would need to select the deployment scanner resource (possibly this could be skipped because it is a singleton) and would need to select the concrete "scanner" map element in the list of scanners.
The remote API for this is going to be more interesting though, because we somehow need to identify the individual elements in the above expression. The easiest way to do it would possibly be to pass an array of strings, where the 0th index holds the resource id of the deployment scanner and the first index holds the index of the list element. This is very untyped, though given the freeform nature of the expression, I can't see many ways of providing a more typesafe approach.
A simpler fix is proposed in https://github.com/rhq-project/rhq/pull/116.
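As a rough illustration of the untyped array-of-strings idea, the sketch below resolves a scanner path from a two-element selection. It is purely hypothetical: ScannerPathSelection, the shape of the input map (resource id to list of scanner maps), and the sample ids are assumptions, and no such remote API exists in RHQ.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not RHQ code: resolves a deployment scanner path
 * from a selection of the form [resourceId, scannerIndex].
 */
final class ScannerPathSelection {

    private ScannerPathSelection() {
    }

    static String resolve(Map<Integer, List<Map<String, String>>> scannersByResourceId,
                          String[] selection) {
        if (selection == null || selection.length != 2) {
            throw new IllegalArgumentException("Expected [resourceId, scannerIndex]");
        }
        // parseInt throws NumberFormatException, a subclass of
        // IllegalArgumentException, when an element is not a number.
        int resourceId = Integer.parseInt(selection[0]);
        int scannerIndex = Integer.parseInt(selection[1]);

        List<Map<String, String>> scanners = scannersByResourceId.get(resourceId);
        if (scanners == null) {
            throw new IllegalArgumentException(
                "Unknown deployment scanner resource: " + resourceId);
        }
        if (scannerIndex < 0 || scannerIndex >= scanners.size()) {
            throw new IllegalArgumentException(
                "No scanner at index " + scannerIndex);
        }
        String path = scanners.get(scannerIndex).get("path");
        if (path == null) {
            throw new IllegalArgumentException(
                "Scanner " + scannerIndex + " has no path attribute");
        }
        return path;
    }
}
```

All validation happens at runtime because the selection carries no type information, which is the lack of type safety noted above.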
Merged in release/jon3.3.x
Author: Thomas Segismont <firstname.lastname@example.org>
Date: Wed Sep 10 21:30:53 2014 +0200
Bug 955816 - Bundle deployment to EAP 6 standalone server fails due to deployDir trait not being available
From patch file created with "git diff b6795f8 1aa9fe9"
Moving to ON_QA as available for test with the following brew build:
It works. My earlier attempt was wrong because I had included handover-related tasks in the bundle, and those require EAP 6 to be up. Without them, the standard extraction of the archive inside the tarball succeeded, and the correct deployment directory, given relative to the deployDir, was created while EAP itself was down.
15:30:06,635 INFO [SystemInfoManager] (http-/0.0.0.0:7080-2) SystemInformation: ********
AS config dir: [/home/hudson/jon-server-3.3.0.ER03/jbossas/standalone/configuration]
AS product name: [EAP]
AS product version: [6.3.0.GA]
AS version: [7.4.0.Final-redhat-19]
Agent cloud-qe-1-vm-1.idmqe.lab.eng.bos.redhat.com: [Agent[id=10002,name=cloud-qe-1-vm-1.idmqe.lab.eng.bos.redhat.com,address=10.16.96.146,port=16163,remote-endpoint=socket://10.16.96.146:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-ping=1411046997472,last-availability-report=1411046517457]]
Agent ibm-x3550m3-11.lab.eng.brq.redhat.com: [Agent[id=10001,name=ibm-x3550m3-11.lab.eng.brq.redhat.com,address=10.34.36.137,port=16163,remote-endpoint=socket://10.34.36.137:16163/?rhq.communications.connector.rhqtype=agent&numAcceptThreads=1&maxPoolSize=303&clientMaxPoolSize=304&socketTimeout=60000&enableTcpNoDelay=true&backlog=200,last-availability-ping=1411046999300,last-availability-report=1411046876571]]
CAM_HELP_PASSWORD: [- non null -]
CAM_LDAP_BIND_PW: [- non null -]
DATABASE_DRIVER_NAME: [PostgreSQL Native Driver]
DATABASE_DRIVER_VERSION: [PostgreSQL 9.2 JDBC4 (build 1002)]
FullName: [JBoss Operations Network]
Name: [JBoss ON]
SERVER_LOCAL_TIME: [September 18, 2014 3:30:06 PM CEST]
SERVER_TIMEZONE: [Central European Time]
Storage_Node ibm-x3550m3-11.lab.eng.brq.redhat.com: [storageNode.addresss=ibm-x3550m3-11.lab.eng.brq.redhat.com, hostname=ibm-x3550m3-11.lab.eng.brq.redhat.com, beginTime=1411018206587, beginTime=1411018206587, unackAlerts=0, heapUsed=null, heapPercentageUsed=Min: 0.050766182232332875, Max: 0.2918473015920365, Avg: 0.16261281570793695 (%), load=null, dataUsedPercentage=null, dataDiskUsed=null, tokens=null, actuallyOwns=null]