Bug 849964

Summary: [eap6] Starting managed server fails after 10 seconds - Read timeout
Product: [Other] RHQ Project
Reporter: Libor Zoubek <lzoubek>
Component: Plugins
Assignee: Stefan Negrea <snegrea>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: high
Version: 4.4
CC: hrupp, theute
Target Release: RHQ 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 851655
Last Closed: 2013-08-31 10:10:28 UTC
Type: Bug
Bug Blocks: 707223, 851655

Description Libor Zoubek 2012-08-21 11:10:55 UTC
Description of problem:


Version-Release number of selected component (if applicable):
JON 3.1.1.ER2 + EAP 6.0

How reproducible: not always


Steps to Reproduce:
1. Have EAP 6 running in domain mode in the inventory.
2. Create a new managed server (with autostart=false).
3. Start it (set blocking=true) right after it appears in the inventory (its availability should be DOWN or UNKNOWN).
  
Actual results: The operation fails after 10 seconds with the following message:

java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false
	at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)


But in fact, the server really has started; its availability turns to UP shortly afterwards.

Expected results: The start operation succeeds.


Additional info: I also tried waiting 10 minutes between creating the managed server resource and the first attempt to start it (blocking=true), and the start succeeded.

Could this be interference between the start operation and a running discovery scan?

Comment 4 Heiko W. Rupp 2012-08-23 17:18:13 UTC
Is there a longer stack trace somewhere?
And are you sure about 10 secs (and not 20 secs)?

Comment 5 Heiko W. Rupp 2012-08-23 17:54:18 UTC
The operation code is in org.rhq.modules.plugins.jbossas7.ManagedASComponent#invokeOperation,

which calls getASConnection().execute(op);
which is:
    public Result execute(Operation op) {
        return execute(op, false, 10);
    }

So this is where the 10 s timeout is defined.

Option a) increase the timeout from 10 s to 30 s by calling
  getASConnection().execute(op, <timeout in sec>);

We have already done that in one other place.
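A minimal sketch of option a), assuming the connection class follows the overload pattern quoted above. `AsConnectionSketch`, `Operation`, and `Result` are hypothetical stand-ins, not the RHQ classes. The two-argument overload matches the `execute(op, <timeout in sec>)` call from option a); the meaning of the boolean in `execute(op, false, 10)` is not stated in this report, so the sketch only passes it through.

```java
public class AsConnectionSketch {
    // Hypothetical stand-ins for the plugin's Operation and Result types.
    record Operation(String name) {}
    record Result(String operation, int timeoutSec) {}

    // The value hard-coded in the quoted execute(Operation).
    static final int DEFAULT_TIMEOUT_SEC = 10;

    // Mirrors the quoted default path: every caller gets 10 seconds.
    public Result execute(Operation op) {
        return execute(op, false, DEFAULT_TIMEOUT_SEC);
    }

    // Option a): a caller that knows its operation is slow passes its own timeout.
    public Result execute(Operation op, int timeoutSec) {
        return execute(op, false, timeoutSec);
    }

    // Boolean meaning unknown from the report; kept only to match the quoted call shape.
    public Result execute(Operation op, boolean flag, int timeoutSec) {
        // A real implementation would send op to the server and wait at most
        // timeoutSec seconds for the reply; this stub just records the limit used.
        return new Result(op.name(), timeoutSec);
    }

    public static void main(String[] args) {
        AsConnectionSketch conn = new AsConnectionSketch();
        Operation start = new Operation("start");
        System.out.println(conn.execute(start).timeoutSec());     // 10
        System.out.println(conn.execute(start, 30).timeoutSec()); // 30
    }
}
```

Keeping the one-argument overload unchanged means only the call site in the managed server's operation handler needs to change, which matches the low-risk intent of option a).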


Option b) add a config property that lets the user specify the timeout value,
then continue with a)
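A sketch of how option b) might resolve the timeout before handing it to the option a) overload. The property name `operationTimeoutSec` and the plain `Map` standing in for the plugin configuration are hypothetical; the report only proposes "a config property". Missing, non-numeric, or non-positive values fall back to a caller-supplied default.

```java
import java.util.Map;

public class TimeoutConfigSketch {
    // Hypothetical property name; not taken from the RHQ plugin descriptor.
    static final String TIMEOUT_PROPERTY = "operationTimeoutSec";

    /**
     * Returns the user-configured timeout in seconds, or defaultSec when the
     * property is missing, blank, not a number, or not positive.
     */
    static int resolveTimeoutSec(Map<String, String> pluginConfig, int defaultSec) {
        String raw = pluginConfig.get(TIMEOUT_PROPERTY);
        if (raw == null || raw.isBlank()) {
            return defaultSec;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : defaultSec;
        } catch (NumberFormatException e) {
            return defaultSec;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveTimeoutSec(Map.of(), 30));                        // 30
        System.out.println(resolveTimeoutSec(Map.of(TIMEOUT_PROPERTY, "120"), 30)); // 120
        System.out.println(resolveTimeoutSec(Map.of(TIMEOUT_PROPERTY, "abc"), 30)); // 30
    }
}
```

The resolved value would then be passed as the timeout argument of `getASConnection().execute(op, <timeout in sec>)`, so options a) and b) compose rather than compete.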

Comment 6 Heiko W. Rupp 2012-08-23 19:31:07 UTC
[15:27:41] <mfoley> yeah ... increasing the timeout ... that seems low-risk fix for this point in the JON 3.1.1 lifecycle ... i am good with that

Comment 7 Stefan Negrea 2012-08-24 15:34:24 UTC
Applied option a) from comment #5. The timeout was increased from 10 to 30 seconds to avoid situations in which managed server operations run slower than expected due to heavy load on the host machine.
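A scaled-down sketch of why raising the limit helps. It assumes, based only on the "Read timed out" text in the reported exception, that the limit behaves like a socket read timeout on the plugin's connection to the application server; the report does not show the transport code, and `ReadTimeoutDemo` and `callSlowServer` are hypothetical. A local server takes 300 ms to answer: a 100 ms read timeout (standing in for the old 10 s) fails even though the server side still finishes its work, which matches the report that the managed server did start, while a 1000 ms timeout (standing in for 30 s) succeeds.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Returns "ok" if the reply arrives in time, else the SocketTimeoutException message. */
    static String callSlowServer(long responseDelayMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Server side: accept one connection, "work" for responseDelayMs, then reply.
            Thread responder = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(responseDelayMs);
                    OutputStream out = peer.getOutputStream();
                    out.write(1);
                    out.flush();
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have given up; the work itself still finished.
                }
            });
            responder.start();

            String outcome;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMs); // max time a single read() may block
                InputStream in = client.getInputStream();
                in.read();
                outcome = "ok";
            } catch (SocketTimeoutException e) {
                outcome = e.getMessage();
            }
            responder.join(); // let the slow side finish before closing the server socket
            return outcome;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 100 ms stands in for the old 10 s limit, 1000 ms for the new 30 s limit.
        System.out.println(callSlowServer(300, 100));  // a timeout message
        System.out.println(callSlowServer(300, 1000)); // ok
    }
}
```

A longer fixed timeout only moves the threshold: an operation slower than 30 s under heavy load would still report a failure while succeeding on the server, which is the case option b) from comment #5 would cover.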

Comment 9 Libor Zoubek 2012-08-30 16:53:40 UTC
Verified on JON 3.1.1.CR1.

Comment 10 Heiko W. Rupp 2013-08-31 10:10:28 UTC
Bulk close of old bugs in VERIFIED state.