Bug 802796

Summary: Request more information on Deployment timeout setting
Product: [Other] RHQ Project
Reporter: dsteigne
Component: Core Server
Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Docs Contact:
Priority: high
Version: 4.2
CC: ahovsepy, hrupp, jsanda, loleary
Target Milestone: ---
Target Release: JON 3.1.1
Hardware: Unspecified
OS: Unspecified
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=812452
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 826774
Environment:
Last Closed: 2013-09-03 11:09:40 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 735475, 826774    

Description dsteigne 2012-03-13 10:21:00 EDT
Description of problem:
If you deploy an application and set the timeout too low, the deployment shows as failed on the RHQ/JON server side because of a timeout, but the app was actually deployed (a manual discovery needs to be run to show it in JON, unlike when the deployment succeeds). Because the deployment is shown as failed, the user tries to redeploy, and that fails with a message stating that the directory already exists.

Can a better description be given in the help or documentation, explaining this behavior and advising users to check whether the app was actually deployed before attempting a redeploy?

Version-Release number of selected component (if applicable):
Comment 1 Larry O'Leary 2012-03-19 12:01:25 EDT
This issue covers the case where you deploy content to a resource (such as a WAR to Tomcat) and the timeout is exceeded: the RHQ UI reports the deployment as failed (in the deployment history), but the content was actually deployed successfully. The user's request is that the description of the timeout indicate that the deployment may have succeeded.

Additionally, the timeout value available when deploying content (such as a WAR to Tomcat) needs to describe what the default is and what the timeout should be used for. For example: The default value is 60 seconds. A higher value can be specified here if the content will take longer than 60 seconds to upload to the target server and to fully start.
Comment 2 Mike Foley 2012-03-19 12:19:02 EDT
per BZ triage (asantos, ccrouch, loleary, mfoley)
Comment 3 Charles Crouch 2012-04-16 10:04:24 EDT
Let's see if there is scope to address this in 3.1.
Comment 5 John Sanda 2012-05-30 13:09:48 EDT
It sounds like two things are being proposed in comment 1. First, we update the timeout failure message to indicate that it is possible that the deployment may have succeeded. Secondly, in the resource create wizard, we update the timeout info to specify the default value. The note window does already mention that overriding the default is useful for long creations and is usually used if there have been previous timeout failures.

Is there anything else we want to do here?
Comment 6 Larry O'Leary 2012-05-30 13:29:24 EDT
(In reply to comment #5)
> ... 
> Is there anything else we want to do here?

This should cover it. The only addition that might be nice is for the product documentation around this property, and for deployment operations in general, to indicate what a "Timeout" actually means when the user sees such a deploy failure. For example, just because it says "Failed" does not mean the deployment failed if a Timeout exception occurred.
Comment 7 John Sanda 2012-07-20 10:26:58 EDT
Pushed changes to release/jon3.1.x branch to provide better information about the deployment timeout.

commit hash: 37163091
Comment 8 Armine Hovsepyan 2012-08-01 08:30:15 EDT
If the timeout is set lower than the time the deployment actually takes, there is no error in the RHQ UI, but the deployment is marked as a failure in the logs. As soon as auto-discovery is done, the deployed WAR is visible with a green tick in the RHQ UI.

When creating a new deploy child with the same WAR and the same version, no error is shown in the RHQ UI, while it fails on the JON side.
Comment 9 John Sanda 2012-08-02 14:52:15 EDT
JON 3.1.1 ER1 build is available. Moving to ON_QA.

Comment 10 Armine Hovsepyan 2012-08-06 04:25:48 EDT
Putting back to ON_DEV, since comment #8 is still "active" for the 3.1.1 ER1 version.
Comment 11 John Sanda 2012-08-15 13:50:41 EDT
The primary motivation for the changes in this bug is to address exactly what was encountered in comment 8. If the timeout setting is too low, it is entirely possible that it will be exceeded, resulting in the request being logged as a failure, even though a subsequent discovery scan will result in the resource showing up in inventory. In addition to the server log, the failure can be seen in the child history view.

If a deployment is reported as a failure because of a timeout and a subsequent deployment then fails, this is expected behavior, since the resource does in fact already exist in the agent's inventory.

I have made a few changes to make things a bit clearer. The timeout help text now notes that in the event of a timeout, the deployment may still succeed and that you may want to execute a discovery scan. On the agent side, we were incorrectly logging a timeout as a failure, which means the status in the child history view was being reported as "Failed". I have updated the logic on the agent so that the status is now correctly reported as "Timed Out". And when you go to view the error, the message field is now prefixed with,

"The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments."
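
For readers following along, the distinction between "Failed" and "Timed Out" described above can be sketched roughly as follows. This is only an illustration and not the actual agent code from the commit: the class CreateResourceStatusSketch, the CreateStatus enum, the Result holder, and the runWithTimeout method are all hypothetical names, and the real agent may structure this quite differently. The one piece taken from this report is the message prefix.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from the RHQ code base.
final class CreateResourceStatusSketch {

    // Assumed status values; the report only mentions "Failed" and "Timed Out".
    enum CreateStatus { SUCCESS, FAILURE, TIMED_OUT }

    // Message prefix quoted in this bug report.
    static final String TIMEOUT_PREFIX =
        "The time out has been exceeded; however, the deployment may have been successful. "
        + "You may want to run a discovery scan to see if the deployment did complete successfully. "
        + "Also consider using a higher time out value for future deployments.";

    // Simple holder for the outcome of one create request.
    static final class Result {
        final CreateStatus status;
        final String message;

        Result(CreateStatus status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    // Runs the deployment and maps a timeout to TIMED_OUT instead of FAILURE.
    static Result runWithTimeout(Callable<Void> deployment, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(deployment);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return new Result(CreateStatus.SUCCESS, "");
        } catch (TimeoutException e) {
            // The work may still have completed on the target; report that possibility.
            future.cancel(true);
            return new Result(CreateStatus.TIMED_OUT, TIMEOUT_PREFIX);
        } catch (ExecutionException e) {
            // The deployment itself threw: this is a genuine failure.
            return new Result(CreateStatus.FAILURE, String.valueOf(e.getCause()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(CreateStatus.FAILURE, "Interrupted while waiting for the deployment");
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The point of the separate TIMED_OUT value is that a caller (or the UI) can tell "the deployment threw an error" apart from "we stopped waiting", which is the case where a discovery scan is worth running before redeploying.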

Changes have been pushed to the release/jon3.1.x branch.

commit hash: e1c81ff630
Comment 12 John Sanda 2012-08-16 11:33:25 EDT
The commit cited in comment 11 was a bad merge. I had to revert the commit and redo the merge. The new commit has been pushed to the release/jon3.1.x branch.

commit hash:  eea3f0285ab5c
Comment 13 John Sanda 2012-08-22 01:39:37 EDT
Moving to ON_QA. The JON 3.1.1 ER3 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=230321.
Comment 14 Armine Hovsepyan 2012-08-22 10:46:54 EDT
hi John,

Thanks a lot for your work. Could you please keep only this message?

"The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments.

Root Cause:
org.rhq.core.pc.inventory.TimeoutException: Call to [org.rhq.plugins.jbossas.JBossASServerComponent.createResource()] with args [[CreateResourceReport: ResourceType=[{JBossAS}Web Application (WAR)], ResourceKey=[null]]] timed out after 1000 milliseconds - invocation thread will be interrupted."

Thanks in advance.
Comment 15 John Sanda 2012-08-22 16:09:28 EDT
After some discussion, it has been decided *not* to display the entire stack trace in the resource creation error message. Instead, we will display the new, detailed error message along with the briefer exception message that reports the plugin component method in which the timeout occurred. The full stack trace can still be obtained from the agent logs with DEBUG logging enabled. Since this is a known condition that we can and do handle, there is no need to overwhelm the user with an entire stack trace when we can provide a concise, yet detailed error message.
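
As a rough illustration of that split between the user-facing message and the DEBUG-level detail, a sketch might look like the following. It is not the code from the commit: the class TimeoutErrorMessageSketch and both method names are hypothetical, and the "Root Cause:" layout is simply borrowed from the example in comment 14.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: these names do not come from the RHQ code base.
final class TimeoutErrorMessageSketch {

    // Detailed note quoted earlier in this bug report.
    static final String TIMEOUT_NOTE =
        "The time out has been exceeded; however, the deployment may have been successful. "
        + "You may want to run a discovery scan to see if the deployment did complete successfully. "
        + "Also consider using a higher time out value for future deployments.";

    // Concise message for the UI: the note plus the exception class and message only.
    static String userFacingMessage(Throwable cause) {
        return TIMEOUT_NOTE + "\n\nRoot Cause:\n"
            + cause.getClass().getName() + ": " + cause.getMessage();
    }

    // Full stack trace as text, intended only for DEBUG-level logging.
    static String debugDetail(Throwable cause) {
        StringWriter buffer = new StringWriter();
        cause.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }
}
```

In this arrangement only userFacingMessage would feed the error shown for the create request, while debugDetail would be written to the log when DEBUG logging is enabled.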

release/jon3.1.x commit hash: eca29122a
Comment 16 John Sanda 2012-08-29 21:52:06 EDT
The CR1 build is available at
https://brewweb.devel.redhat.com/buildinfo?buildID=231258. Moving to ON_QA.
Comment 17 Armine Hovsepyan 2012-08-30 05:28:19 EDT
thanks a lot John.

Comment 18 Heiko W. Rupp 2013-09-03 11:09:40 EDT
Bulk closing of old issues in VERIFIED state.