Description of problem: If you deploy an application and set the timeout too low, the deployment shows as failed on the RHQ/JON server side because of a timeout, but in actuality the app was deployed (a manual discovery needs to be run to show it in JON, unlike when the deployment succeeds). Because the deployment shows as failed, the user tries to redeploy, and the redeploy fails stating that the directory already exists. Can a better description be given in the help or documentation, explaining this behavior and advising the user to check whether the app was actually deployed before attempting a re-deploy? Version-Release number of selected component (if applicable): 4.2.0
What this issue covers is that, essentially, when you deploy content to a resource (such as a WAR to Tomcat) and the timeout is exceeded, the RHQ UI reports the deployment as failed (in the deployment history), but the content was actually deployed successfully. What the user is asking for is that the description of the timeout failure should indicate that the deployment may have succeeded. Additionally, the timeout value available when deploying content (such as a WAR to Tomcat) needs to describe what the default is and what the timeout should be used for. For example: "The default value is 60 seconds. A higher value can be specified here if the content will take longer than 60 seconds to upload to the target server and fully start."
per BZ triage (asantos, ccrouch, loleary, mfoley)
Let's see if there is scope to address this in 3.1.
It sounds like two things are being proposed in comment 1. First, we update the timeout failure message to indicate that it is possible that the deployment may have succeeded. Secondly, in the resource create wizard, we update the timeout info to specify the default value. The note window does already mention that overriding the default is useful for long creations and is usually used if there have been previous timeout failures. Is there anything else we want to do here?
(In reply to comment #5)
> Is there anything else we want to do here?

This should cover it. The only addition that might be nice: the product documentation around this property, and for deployment operations in general, should indicate what a "Timeout" actually means when the user sees such a deploy failure. For example, just because the history says "Failed" does not mean the deployment failed if a Timeout exception occurred.
Pushed changes to release/jon3.1.x branch to provide better information about the deployment timeout. commit hash: 37163091
If the timeout is set lower than the time the deployment actually takes, no error appears in the RHQ UI, but the deployment is marked as a failure in the logs. As soon as auto-discovery runs, the deployed WAR is visible with a green tick in the RHQ UI. When creating a new deploy child with the same WAR and same version, no error is shown in the RHQ UI, while it fails on the JON side.
JON 3.1.1 ER1 build is available. Moving to ON_QA. https://brewweb.devel.redhat.com/buildinfo?buildID=226942
Putting back to ON_DEV, since comment #8 still applies to the 3.1.1 ER1 build.
The primary motivation for the changes in this bug is to address exactly what was encountered in comment 8. If the timeout setting is too low, it is entirely possible that it will be exceeded, resulting in the request being logged as a failure, even though a subsequent discovery scan will result in the resource showing up in inventory. In addition to the server log, the failure can be seen in the child history view. If the deployment is reported as a failure as a result of a timeout, and a subsequent deployment of the same content then fails, this is expected behavior, since the resource does in fact already exist in the agent's inventory.

I have made a few changes to make things clearer. The timeout help text now notes that in the event of a timeout, the deployment may still succeed and that you may want to execute a discovery scan. On the agent side, we were incorrectly logging a timeout as a failure, which meant the status in the child history view was being reported as "Failed". I have updated the logic on the agent so that the status is correctly reported as "Timed Out". And when you view the error, the message field is now prefixed with: "The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments."

Changes have been pushed to the release/jon3.1.x branch. commit hash: e1c81ff630
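The agent-side change described in the comment above (classify a timeout as "Timed Out" rather than "Failed", and prefix the error message with the explanatory text) can be sketched roughly as follows. This is a minimal illustration, not the actual RHQ code: the class and enum names (`CreateResultHandler`, `CreateStatus`) are hypothetical, and the timeout check simply matches on the exception's simple class name, standing in for RHQ's `org.rhq.core.pc.inventory.TimeoutException`.

```java
// Hypothetical sketch of the agent-side status handling; names are
// illustrative and do not reflect the real RHQ plugin-container API.
public class CreateResultHandler {

    enum CreateStatus { SUCCESS, FAILURE, TIMED_OUT }

    static final String TIMEOUT_PREFIX =
        "The time out has been exceeded; however, the deployment may have "
        + "been successful. You may want to run a discovery scan to see if "
        + "the deployment did complete successfully. Also consider using a "
        + "higher time out value for future deployments. ";

    // A timeout is reported as TIMED_OUT, not FAILURE, so the child
    // history view no longer shows a misleading "Failed" status.
    static CreateStatus classify(Throwable error) {
        return isTimeout(error) ? CreateStatus.TIMED_OUT : CreateStatus.FAILURE;
    }

    // The explanatory prefix is only added for timeout errors; other
    // failures keep their original message.
    static String buildErrorMessage(Throwable error) {
        String base = String.valueOf(error.getMessage());
        return isTimeout(error) ? TIMEOUT_PREFIX + base : base;
    }

    private static boolean isTimeout(Throwable error) {
        // Stand-in for an instanceof check against RHQ's
        // org.rhq.core.pc.inventory.TimeoutException.
        return "TimeoutException".equals(error.getClass().getSimpleName());
    }
}
```

The key design point is that the timeout is treated as a distinct, known condition with its own status and guidance, rather than being folded into the generic failure path.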
The commit cited in comment 11 was a bad merge. I had to revert the commit and redo the merge. The new commit has been pushed to the release/jon3.1.x branch. commit hash: eea3f0285ab5c
Moving to ON_QA. The JON 3.1.1 ER3 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=230321.
Hi John, thanks a lot for your work. Could you please keep only this message?

"The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments. Root Cause: org.rhq.core.pc.inventory.TimeoutException: Call to [org.rhq.plugins.jbossas.JBossASServerComponent.createResource()] with args [[CreateResourceReport: ResourceType=[{JBossAS}Web Application (WAR)], ResourceKey=[null]]] timed out after 1000 milliseconds - invocation thread will be interrupted."

Thanks in advance.
After some discussion it has been decided to *not* display the entire stack trace in the resource creation error message. Instead we will display the new, detailed error message along with the briefer exception message that reports the plugin component method in which the timeout occurred. The full stack trace can still be obtained from the agent logs with DEBUG logging enabled. Since this is a known condition that we can and do handle, there is no need to overwhelm the user with an entire stack trace when we can provide a concise yet detailed error message. release/jon3.1.x commit hash: eca29122a
The CR1 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=231258. Moving to ON_QA.
Thanks a lot, John. Verified!
Bulk closing of old issues in VERIFIED state.