Bug 802796 - Request more information on Deployment timeout setting
Summary: Request more information on Deployment timeout setting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: JON 3.1.1
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: rhq-uxd 826774
 
Reported: 2012-03-13 14:21 UTC by dsteigne
Modified: 2018-11-26 17:08 UTC (History)
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 826774 (view as bug list)
Environment:
Last Closed: 2013-09-03 15:09:40 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 812452 0 high CLOSED [eap6] timeout issues - deployment 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1381640 0 high CLOSED DomainDeployment content update fails due to upload or deploy timeouts 2021-02-22 00:41:40 UTC

Internal Links: 812452 1381640

Description dsteigne 2012-03-13 14:21:00 UTC
Description of problem:
If you deploy an application and set the timeout too low, the deployment shows as failed on the RHQ/JON server side because of the timeout, but in actuality the app was deployed (a manual discovery needs to be run to show it in JON, unlike when the deployment succeeds). Because the user sees the deployment as failed, they try to redeploy, and the redeploy fails stating that the directory already exists.

Can a better description be given in the help or documentation, explaining this behavior and advising users to check whether the app was actually deployed before attempting a re-deploy?

Version-Release number of selected component (if applicable):
4.2.0

Comment 1 Larry O'Leary 2012-03-19 16:01:25 UTC
Essentially, when you deploy content to a resource (such as a WAR to Tomcat) and the timeout is exceeded, the RHQ UI reports the deployment as failed (in the deployment history), but the content was actually deployed successfully. What the user is asking for is that the description of the timeout include an indication that the deployment may have succeeded.

Additionally, the timeout value available when deploying content (such as a WAR to Tomcat) needs to describe what the default is and what the timeout should be used for. For example: "The default value is 60 seconds. A higher value can be specified here if the content will take longer than 60 seconds to upload to the target server and fully start."

Comment 2 Mike Foley 2012-03-19 16:19:02 UTC
per BZ triage (asantos, ccrouch, loleary, mfoley)

Comment 3 Charles Crouch 2012-04-16 14:04:24 UTC
Let's see if there is scope to address this in 3.1.

Comment 5 John Sanda 2012-05-30 17:09:48 UTC
It sounds like two things are being proposed in comment 1. First, we update the timeout failure message to indicate that the deployment may have succeeded. Second, in the resource create wizard, we update the timeout info to specify the default value. The note window already mentions that overriding the default is useful for long creations and is usually done when there have been previous timeout failures.

Is there anything else we want to do here?

Comment 6 Larry O'Leary 2012-05-30 17:29:24 UTC
(In reply to comment #5)
> ... 
> Is there anything else we want to do here?

This should cover it. The only addition that might be nice is the product documentation around this property and for deployment operations in general should indicate what a "Timeout" actually means when the user sees such a deploy failure. For example, just because it says "Failed" does not mean the deployment failed if a Timeout exception occurred.

Comment 7 John Sanda 2012-07-20 14:26:58 UTC
Pushed changes to release/jon3.1.x branch to provide better information about the deployment timeout.

commit hash: 37163091

Comment 8 Armine Hovsepyan 2012-08-01 12:30:15 UTC
If the timeout has been set lower than the deployment actually takes, there is no error in the RHQ UI, but the deployment is marked as a failure in the logs. As soon as auto-discovery runs, the deployed WAR is visible with a green tick in the RHQ UI.

While creating a new deploy child with the same WAR and same version, no error is shown in the RHQ UI, while it fails on the JON side.

Comment 9 John Sanda 2012-08-02 18:52:15 UTC
JON 3.1.1 ER1 build is available. Moving to ON_QA.

https://brewweb.devel.redhat.com/buildinfo?buildID=226942

Comment 10 Armine Hovsepyan 2012-08-06 08:25:48 UTC
Putting back to ON_DEV, since the behavior in comment #8 is still "active" in the 3.1.1 ER1 build.

Comment 11 John Sanda 2012-08-15 17:50:41 UTC
The primary motivation for making the changes in this bug is to address exactly what was encountered in comment 8. If the timeout setting is too low, it is entirely possible that it will be exceeded, resulting in the request being logged as a failure, even though a subsequent discovery scan will cause the resource to show up in inventory. In addition to the server log, the failure can be seen in the child history view.

If the deployment is reported as a failure as a result of a timeout, and a subsequent deployment then fails, this is expected behavior, since the resource does in fact already exist in the agent's inventory.

I have made a few changes to make things clearer. The timeout help text now notes that in the event of a timeout the deployment may still succeed, and that you may want to execute a discovery scan. On the agent side, we were incorrectly logging a timeout as a failure, which meant the status in the child history view was reported as "Failed". I have updated the logic on the agent so that the status is correctly reported as "Timed Out". And when you go to view the error, the message field is now prefixed with:

"The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments."

Changes have been pushed to the release/jon3.1.x branch.

commit hash: e1c81ff630
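The agent-side change described in comment 11 can be sketched roughly as follows. This is a minimal illustration, not the actual RHQ code: the class, enum, and method names here (`TimeoutStatusSketch`, `CreateResourceStatus`, `statusFor`, `errorMessageFor`) are hypothetical, and `java.util.concurrent.TimeoutException` stands in for `org.rhq.core.pc.inventory.TimeoutException`.

```java
// Sketch: classify a timeout during resource creation as TIMED_OUT rather
// than FAILURE, and prefix the error message with the advisory text quoted
// above. All names are illustrative, not the real RHQ API.
public class TimeoutStatusSketch {

    enum CreateResourceStatus { SUCCESS, FAILURE, TIMED_OUT }

    static final String TIMEOUT_ADVISORY =
        "The time out has been exceeded; however, the deployment may have been "
      + "successful. You may want to run a discovery scan to see if the deployment "
      + "did complete successfully. Also consider using a higher time out value "
      + "for future deployments.";

    // A timeout gets its own status instead of a generic failure.
    static CreateResourceStatus statusFor(Throwable error) {
        return (error instanceof java.util.concurrent.TimeoutException)
            ? CreateResourceStatus.TIMED_OUT
            : CreateResourceStatus.FAILURE;
    }

    // Message shown in the child history view's error detail.
    static String errorMessageFor(Throwable error) {
        return statusFor(error) == CreateResourceStatus.TIMED_OUT
            ? TIMEOUT_ADVISORY + "\n\nRoot Cause:\n" + error.getMessage()
            : String.valueOf(error.getMessage());
    }
}
```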

Comment 12 John Sanda 2012-08-16 15:33:25 UTC
The commit cited in comment 11 was a bad merge. I had to revert the commit and redo the merge. The new commit has been pushed to the release/jon3.1.x branch.

commit hash:  eea3f0285ab5c

Comment 13 John Sanda 2012-08-22 05:39:37 UTC
Moving to ON_QA. The JON 3.1.1 ER3 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=230321.

Comment 14 Armine Hovsepyan 2012-08-22 14:46:54 UTC
hi John,

Thanks a lot for your work. Could you please keep only this message?

"The time out has been exceeded; however, the deployment may have been successful. You may want to run a discovery scan to see if the deployment did complete successfully. Also consider using a higher time out value for future deployments.

Root Cause:
org.rhq.core.pc.inventory.TimeoutException: Call to [org.rhq.plugins.jbossas.JBossASServerComponent.createResource()] with args [[CreateResourceReport: ResourceType=[{JBossAS}Web Application (WAR)], ResourceKey=[null]]] timed out after 1000 milliseconds - invocation thread will be interrupted."


Thanks in advance.

Comment 15 John Sanda 2012-08-22 20:09:28 UTC
After some discussion it has been decided to *not* display the entire stack trace in the resource creation error message. Instead we will display the new, detailed error message along with the briefer exception message that reports the plugin component method in which the timeout occurred. The full stack trace can still be obtained from the agent logs with DEBUG logging enabled. Since this is a known condition that we can and do handle, there is no need to overwhelm the user with an entire stack trace when we can provide a concise yet detailed error message.

release/jon3.1.x commit hash: eca29122a
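The reporting split in comment 15 amounts to rendering the exception two ways: a concise message for the UI, and the full stack trace only for the agent log. A hypothetical sketch (`ErrorReportingSketch`, `uiMessage`, and `debugDetail` are illustrative names, not the actual RHQ code):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch of the split described above: the UI gets only the exception's
// concise message, while the full stack trace is rendered separately so it
// can be written to the agent log when DEBUG logging is enabled.
public class ErrorReportingSketch {

    // Concise message for the resource creation error view.
    static String uiMessage(Throwable error) {
        return error.getMessage();
    }

    // Full stack trace as a string, intended only for the DEBUG agent log.
    static String debugDetail(Throwable error) {
        StringWriter out = new StringWriter();
        error.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }
}
```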

Comment 16 John Sanda 2012-08-30 01:52:06 UTC
The CR1 build is available at
https://brewweb.devel.redhat.com/buildinfo?buildID=231258. Moving to ON_QA.

Comment 17 Armine Hovsepyan 2012-08-30 09:28:19 UTC
thanks a lot John.

verified!

Comment 18 Heiko W. Rupp 2013-09-03 15:09:40 UTC
Bulk closing of old issues in VERIFIED state.

