Bug 849964

Summary:	[eap6] Starting managed server fails after 10 seconds - Read timeout
Product:	[Other] RHQ Project	Reporter:	Libor Zoubek <lzoubek>
Component:	Plugins	Assignee:	Stefan Negrea <snegrea>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Mike Foley <mfoley>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.4	CC:	hrupp, theute
Target Milestone:	---
Target Release:	RHQ 4.5.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	851655 (view as bug list)		Environment:
Last Closed:	2013-08-31 10:10:28 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	707223, 851655

Description Libor Zoubek 2012-08-21 11:10:55 UTC

Description of problem:


Version-Release number of selected component (if applicable):
JON 3.1.1.ER2 + EAP 6.0

How reproducible: not always


Steps to Reproduce:
1. have EAP6 running in domain mode in inventory
2. create new managed server (with autostart=false)
3. start it up (set blocking=true) right after it appears in your inventory (availability should be DOWN or UNKNOWN)
  
Actual results: Operation fails after 10 seconds with the following message:

java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false
	at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)


However, the server has in fact started; its availability turns to UP shortly afterwards.

Expected results: Start operation succeeds


Additional info: I also tried waiting 10 minutes between creating the managed server resource and the first attempt to start it (blocking=true), and that start succeeded.

Could this be interference between the start operation and a running discovery scan?

Comment 4 Heiko W. Rupp 2012-08-23 17:18:13 UTC

Is there a longer stack trace somewhere?
And are you sure it is 10 seconds (and not 20 seconds)?

Comment 5 Heiko W. Rupp 2012-08-23 17:54:18 UTC

Operation code is in org.rhq.modules.plugins.jbossas7.ManagedASComponent#invokeOperation

which calls getASConnection().execute(op);
which is defined as:

    public Result execute(Operation op) {
        return execute(op, false, 10);
    }

This is where the 10-second timeout is defined.
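The following minimal Java sketch illustrates how the client can report a failure while the server keeps working. The report does not show the transport, so this assumes the management call ends in a blocking socket read with a read timeout. In that situation java.net throws SocketTimeoutException with the message "Read timed out". The class and method names are hypothetical, and the timeouts are scaled down to milliseconds.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: not part of the RHQ plugin.
class ReadTimeoutDemo {

    /**
     * Starts a local server that accepts a connection but only replies after
     * replyDelayMs, then reads from it with readTimeoutMs as the socket read
     * timeout. Returns true if the client gave up with a read timeout.
     */
    static boolean readTimesOut(int readTimeoutMs, int replyDelayMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread slowServer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    Thread.sleep(replyDelayMs);   // the "operation" is still running
                    s.getOutputStream().write(1); // and eventually completes
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have closed the connection.
                }
            });
            slowServer.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMs); // assumed analogue of the 10 s timeout
                InputStream in = client.getInputStream();
                in.read();                          // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;                        // java.net reports "Read timed out"
            } finally {
                slowServer.join();                  // the server-side work finishes anyway
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        // The simulated work takes 200 ms; compare a shorter and a longer timeout.
        System.out.println("100 ms timeout, timed out: " + readTimesOut(100, 200));
        System.out.println("300 ms timeout, timed out: " + readTimesOut(300, 200));
    }
}
```

With these numbers the first call typically times out and the second returns normally. In both cases the slow server completes its work, much like the managed server that still comes UP.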

Option a) Increase the timeout from 10s to 30s by calling
  getASConnection().execute(op, <timeout in sec>);

We have already done that in one other place.
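Option a) keeps the convenience overload and passes an explicit timeout at the slow call site. The sketch below mirrors the overloads quoted above using a hypothetical stand-in class. It uses String instead of the plugin's Operation and Result types and does not contact a server. It ignores the boolean argument because the report does not describe what it does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the plugin's connection class (not the RHQ API).
class TimeoutOverloadSketch {

    /** Default used by the one-argument overload (10 s in the code quoted above). */
    static final int DEFAULT_TIMEOUT_SEC = 10;

    /** Timeout chosen at the call site under option a). */
    static final int START_OPERATION_TIMEOUT_SEC = 30;

    /** Records the timeout of each call so the behavior can be inspected. */
    final List<Integer> usedTimeouts = new ArrayList<>();

    /** Like execute(op): silently falls back to the 10 s default. */
    String execute(String op) {
        return execute(op, false, DEFAULT_TIMEOUT_SEC);
    }

    /** Like execute(op, <timeout in sec>): the caller picks the timeout. */
    String execute(String op, int timeoutSec) {
        return execute(op, false, timeoutSec);
    }

    /** Full form; the boolean flag is accepted but ignored in this sketch. */
    String execute(String op, boolean flag, int timeoutSec) {
        usedTimeouts.add(timeoutSec);
        return op + " with a " + timeoutSec + " s timeout";
    }

    public static void main(String[] args) {
        TimeoutOverloadSketch connection = new TimeoutOverloadSketch();
        System.out.println(connection.execute("start"));                              // start with a 10 s timeout
        System.out.println(connection.execute("start", START_OPERATION_TIMEOUT_SEC)); // start with a 30 s timeout
    }
}
```

With this design only the slow call sites change, and every other caller keeps the 10 s default.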


Option b) Add a configuration property that lets the user specify a timeout value,
then proceed as in a).
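Option b) could look roughly like the following: read a timeout from the resource's plugin configuration, and fall back to a default when the value is missing or invalid. The following are assumptions for illustration, not the plugin's actual configuration:

* the property name operationTimeoutSec
* the plain Map standing in for the plugin configuration
* the 30 s default

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: property name and Map-based config are illustrative only.
class TimeoutConfigSketch {

    static final String TIMEOUT_PROPERTY = "operationTimeoutSec"; // hypothetical name
    static final int DEFAULT_TIMEOUT_SEC = 30;                    // matches option a)

    /** Returns the configured timeout, or the default if missing, non-numeric, or not positive. */
    static int operationTimeoutSec(Map<String, String> pluginConfig) {
        String raw = pluginConfig.get(TIMEOUT_PROPERTY);
        if (raw == null) {
            return DEFAULT_TIMEOUT_SEC;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_SEC;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_SEC;
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println(operationTimeoutSec(config)); // 30
        config.put(TIMEOUT_PROPERTY, "120");
        System.out.println(operationTimeoutSec(config)); // 120
    }
}
```

The resolved value would then be passed to getASConnection().execute(op, <timeout in sec>) as in option a).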

Comment 6 Heiko W. Rupp 2012-08-23 19:31:07 UTC

[15:27:41] <mfoley> yeah ... increasing the timeout ... that seems low-risk fix for this point in the JON 3.1.1 lifecycle ... i am good with that

Comment 7 Stefan Negrea 2012-08-24 15:34:24 UTC

Applied option a) from comment #5. The timeout was increased from 10 to 30 seconds to avoid situations in which managed server operations run slower than expected due to heavy load on the host machine.

Comment 8 Stefan Negrea 2012-08-24 15:52:27 UTC

master branch commit:

http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=b8a6a496d40915abb22fdd87753aec0c131ed95d

Comment 9 Libor Zoubek 2012-08-30 16:53:40 UTC

Verified on JON 3.1.1.CR1.

Comment 10 Heiko W. Rupp 2013-08-31 10:10:28 UTC

Bulk close of old bugs in VERIFIED state.