Bug 812452

Summary: [eap6] timeout issues - deployment
Product: [Other] RHQ Project
Reporter: Libor Zoubek <lzoubek>
Component: Agent
Assignee: Heiko W. Rupp <hrupp>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: high
Version: 4.4
CC: hrupp, theute
Target Release: RHQ 4.4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-09-01 10:16:56 UTC
Bug Blocks: 707223, 735475

Description Libor Zoubek 2012-04-13 18:51:07 UTC
Description of problem: In general, almost all operations invoked on EAP by the agent can time out. Today I had issues with creating a deployment. My machine had 700M of swap filled, and EAP became too slow as a result.

This BZ might be related to other operations that are adding/removing a child on EAP.

Version-Release number of selected component (if applicable):


How reproducible: hard


Steps to Reproduce:
1. Have an overloaded machine (low memory)
2. Stand up EAP6 on it and inventory it
3. Create a WAR deployment on it using the JON UI
  
Actual results:

If EAP is too slow, child resource creation fails with a timeout, even though the deployment does appear on EAP. Further attempts to create the resource are refused by EAP (duplicate resource). But the deployment is not discovered (the operation failed, so discovery was not triggered), so the user assumes it really failed.

Expected results: If deploying times out and fails, the deployment should not be on EAP. Otherwise the deployment must succeed.
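
The expected behaviour could be sketched roughly as below. This is only an illustration, not RHQ code: the class `TimedCreate` and the `deploy`, `existsOnServer` and `undeploy` callbacks are hypothetical stand-ins for whatever the plugin really does. The idea is that a timeout is only reported as a failure after any leftover deployment has been removed from the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names come from the RHQ code base. */
final class TimedCreate {
    enum Outcome { SUCCESS, FAILURE }

    static Outcome create(Callable<Void> deploy,
                          BooleanSupplier existsOnServer,
                          Runnable undeploy,
                          long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(deploy);
        try {
            // Wait for the deployment, but no longer than the timeout.
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (TimeoutException e) {
            future.cancel(true);
            // The server may have completed the deployment anyway.
            // Remove it so that the reported failure matches the server state.
            if (existsOnServer.getAsBoolean()) {
                undeploy.run();
            }
            return Outcome.FAILURE;
        } catch (ExecutionException e) {
            // The deployment itself failed before the timeout.
            return Outcome.FAILURE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILURE;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

An alternative with the same effect for the user would be to check `existsOnServer` on timeout and report success (and trigger discovery) instead of undeploying.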


Additional info: This can be a much bigger issue, not only for overloaded machines but also for larger deployments. I was deploying a WAR that contains 1 JSP. Customers will more likely deploy larger applications and could potentially run into the same issue.

Comment 1 Charles Crouch 2012-04-16 15:25:28 UTC
Potentially related to https://bugzilla.redhat.com/show_bug.cgi?id=802796

Comment 2 Heiko W. Rupp 2012-04-25 04:59:46 UTC
There are two issues here:

- When you create a deployment and do not specify a timeout explicitly, a value of 60 seconds is assumed, after which the plugin container reports a failure no matter what happens inside the plugin.

- The plugin currently has a fixed timeout of 10s for connecting and 60s for upload.
-- The plugin does not have access to the timeout from the wizard, so it cannot e.g. set its own value to (x-1)s.

One could pass the timeout from the UI to the plugin and so give the plugin a hint about the minimal timeout requested by the user. Just setting it to e.g. 10min inside the plugin will not work if the user does not specify a timeout in the UI, because the plugin container still gives up after its 60 second default.
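
A minimal sketch of such a hint, assuming the user's timeout could be handed to the plugin as an optional value. `UploadTimeoutHint` and `uploadTimeout` are made-up names for illustration, not existing plugin API. The plugin timeout is derived as (x-1)s from whatever the plugin container will enforce, which is the 60s default when the user sets nothing.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper; not part of the RHQ plugin API. */
final class UploadTimeoutHint {
    // Timeout the plugin container assumes when the user specifies none.
    static final Duration CONTAINER_DEFAULT = Duration.ofSeconds(60);
    // Margin so the plugin gives up just before the plugin container does.
    static final Duration MARGIN = Duration.ofSeconds(1);

    static Duration uploadTimeout(Optional<Duration> userTimeout) {
        // x: the timeout the plugin container will actually enforce.
        Duration containerTimeout = userTimeout.orElse(CONTAINER_DEFAULT);
        // (x-1)s: stay just below the container's limit.
        Duration hinted = containerTimeout.minus(MARGIN);
        // Never return less than the margin itself (assumed lower bound).
        return hinted.compareTo(MARGIN) < 0 ? MARGIN : hinted;
    }
}
```

With no user value this yields 59s, and with a 10min setting in the UI it yields 599s.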

Comment 3 Heiko W. Rupp 2012-04-25 09:01:12 UTC
master 38935c2

The default timeout for uploads is now increased to 120 seconds. The user can override this via the timeout setting in the UI; this setting applies to the total time the plugin spends in the createChild method. This means that with a setting of 120sec in the UI, the actual time available for an upload may be only 119sec.
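
To illustrate why the upload gets less than the full setting, here is a rough sketch of a total time budget. It is not the code from the commit: `CreateChildBudget` is a hypothetical class, and the clock values are passed in explicitly to keep the example deterministic.

```java
import java.time.Duration;

/** Hypothetical sketch of a total time budget for one createChild call. */
final class CreateChildBudget {
    private final long deadlineNanos;

    CreateChildBudget(Duration total, long startNanos) {
        // The deadline covers everything done in createChild, not just the upload.
        this.deadlineNanos = startNanos + total.toNanos();
    }

    /** Time still available at the given clock reading, never negative. */
    Duration remaining(long nowNanos) {
        long left = deadlineNanos - nowNanos;
        return left > 0 ? Duration.ofNanos(left) : Duration.ZERO;
    }
}
```

In real use the clock readings would come from `System.nanoTime()`. With a 120sec total and one second already spent before the upload starts, `remaining` returns 119sec, which is all the upload can use.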

Comment 4 Heiko W. Rupp 2013-09-01 10:16:56 UTC
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.