Bug 849964

Summary: [eap6] Starting managed server fails after 10 seconds - Read timeout
Product: [Other] RHQ Project
Reporter: Libor Zoubek <lzoubek>
Component: Plugins
Assignee: Stefan Negrea <snegrea>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: hrupp, theute
Target Milestone: ---   
Target Release: RHQ 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 851655
Environment:
Last Closed: 2013-08-31 06:10:28 EDT
Type: Bug
Bug Depends On:    
Bug Blocks: 707223, 851655    

Description Libor Zoubek 2012-08-21 07:10:55 EDT
Description of problem:

Version-Release number of selected component (if applicable):
JON 3.1.1.ER2 + EAP 6.0

How reproducible: not always

Steps to Reproduce:
1. Have EAP6 running in domain mode in inventory.
2. Create a new managed server (with autostart=false).
3. Start it (with blocking=true) right after it appears in your inventory (its availability should be DOWN or UNKNOWN).

Actual results: The operation fails after 10 seconds with the following message:

java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false
	at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

In fact, the server did start, and its availability turns to UP shortly afterwards.
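
The behaviour described above (the client reports a read timeout while the server-side work still completes) can be reproduced outside RHQ. The following is a minimal sketch using plain sockets, not the plugin's code: the class name ReadTimeoutDemo, its run method and the timings are assumptions made for illustration only.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only illustrates the general effect of a read timeout.
class ReadTimeoutDemo {

    /** Returns { clientTimedOut, serverFinishedItsWork }. */
    static boolean[] run(int serverWorkMillis, int clientReadTimeoutMillis) throws Exception {
        CountDownLatch serverFinished = new CountDownLatch(1);
        try (ServerSocket server = new ServerSocket(0)) {
            Thread worker = new Thread(() -> {
                try (Socket s = server.accept()) {
                    Thread.sleep(serverWorkMillis);   // simulated slow "start server" work
                    serverFinished.countDown();       // the work completes regardless of the client
                    s.getOutputStream().write(1);     // the reply may arrive after the client gave up
                } catch (IOException | InterruptedException ignored) {
                    // the client may already have closed the connection
                }
            });
            worker.setDaemon(true);
            worker.start();

            boolean timedOut = false;
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.setSoTimeout(clientReadTimeoutMillis); // read timeout on the client side
                client.getInputStream().read();               // blocks until a reply or the timeout
            } catch (SocketTimeoutException e) {
                timedOut = true;                              // the client reports a failure here
            }
            boolean finished = serverFinished.await(serverWorkMillis + 2000L, TimeUnit.MILLISECONDS);
            return new boolean[] { timedOut, finished };
        }
    }
}
```

With a server that needs longer than the client's read timeout, the client sees a timeout even though the work finishes; with a generous timeout, both sides succeed.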

Expected results: Start operation succeeds

Additional info: I also tried waiting 10 minutes between the creation of the managed server resource and the first attempt to start it (blocking=true), and the operation succeeded.

Could there be some interference between the start operation and a running discovery scan?
Comment 4 Heiko W. Rupp 2012-08-23 13:18:13 EDT
Is there a longer stack trace around somewhere?
And are you sure it is 10 seconds (and not 20 seconds)?
Comment 5 Heiko W. Rupp 2012-08-23 13:54:18 EDT
The operation code is in org.rhq.modules.plugins.jbossas7.ManagedASComponent#invokeOperation,

which calls getASConnection().execute(op);
which is defined as
    public Result execute(Operation op) {
        return execute(op, false, 10);
    }

So this is where the 10s timeout is defined.

Option a) Increase the timeout from 10s to 30s by calling
  getASConnection().execute(op, <timeout in sec>);

We have done that in one other place as well.

Option b) Add a config property that lets the user specify a timeout value,
then continue as in a).
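
The two options can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual code: Operation, Result and ASConnectionSketch are simplified stand-ins for the real classes, the meaning of the boolean argument is not given in this report, and the property name "operationTimeoutSec" is hypothetical.

```java
import java.util.Map;

// Simplified stand-ins for the plugin's types (assumptions, not the real RHQ classes).
final class Operation {
    final String name;
    Operation(String name) { this.name = name; }
}

final class Result {
    final String description;
    Result(String description) { this.description = description; }
}

class ASConnectionSketch {
    static final int DEFAULT_TIMEOUT_SEC = 10;

    // Mirrors the quoted overload: no explicit timeout means the 10s default.
    Result execute(Operation op) {
        return execute(op, false, DEFAULT_TIMEOUT_SEC);
    }

    // Option a): the caller passes its own timeout in seconds.
    Result execute(Operation op, int timeoutSec) {
        return execute(op, false, timeoutSec);
    }

    // The boolean's purpose is not described in the report, so it is only passed through here.
    Result execute(Operation op, boolean flag, int timeoutSec) {
        // A real connection would send the request and apply timeoutSec as the read timeout.
        return new Result(op.name + " executed with read timeout " + timeoutSec + "s");
    }
}

class TimeoutConfig {
    static final int START_TIMEOUT_SEC = 30;

    // Option b): read a user-supplied timeout and fall back to 30s when it is
    // missing, not a number, or not positive. The property name is hypothetical.
    static int resolveTimeoutSec(Map<String, String> pluginConfig) {
        String raw = pluginConfig.get("operationTimeoutSec");
        if (raw == null || raw.trim().isEmpty()) {
            return START_TIMEOUT_SEC;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : START_TIMEOUT_SEC;
        } catch (NumberFormatException e) {
            return START_TIMEOUT_SEC;
        }
    }
}
```

A caller following b) and then a) would resolve the timeout once from its configuration and pass it to the two-argument execute, so operations without an explicit setting still get the longer 30s value.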
Comment 6 Heiko W. Rupp 2012-08-23 15:31:07 EDT
[15:27:41] <mfoley> yeah ... increasing the timeout ... that seems low-risk fix for this point in the JON 3.1.1 lifecycle ... i am good with that
Comment 7 Stefan Negrea 2012-08-24 11:34:24 EDT
Applied option a) from comment #5. The timeout was increased from 10 to 30 seconds to avoid situations in which managed server operations run slower than expected due to heavy load on the host machine.
Comment 9 Libor Zoubek 2012-08-30 12:53:40 EDT
verified on JON 3.1.1.CR1
Comment 10 Heiko W. Rupp 2013-08-31 06:10:28 EDT
Bulk close of old bugs in VERIFIED state.