849964 – [eap6] Starting managed server fails after 10 seconds - Read timeout

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Plugins
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHQ 4.5.0
Assignee:	Stefan Negrea
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	as7-plugin 851655

Reported:	2012-08-21 11:10 UTC by Libor Zoubek
Modified:	2015-11-02 00:43 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Clones:	851655 (view as bug list)
Environment:
Last Closed:	2013-08-31 10:10:28 UTC
Embargoed:

Description Libor Zoubek 2012-08-21 11:10:55 UTC

Description of problem:


Version-Release number of selected component (if applicable):
JON 3.1.1.ER2 + EAP 6.0

How reproducible: not always


Steps to Reproduce:
1. have EAP6 running in domain mode in inventory
2. create new managed server (with autostart=false)
3. start it up (set blocking=true) right after it appears in your inventory (availability should be DOWN or UNKNOWN)
  
Actual results: The operation fails after 10 seconds with the following message:

java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false
	at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)


In fact, the server did start, and its availability turns to UP shortly afterwards.

Expected results: Start operation succeeds


Additional info: I also tried waiting 10 minutes between the creation of the managed server resource and the first attempt to start it (blocking=true), and in that case the start succeeded.

Could there be some interference between the start operation and a running discovery scan?
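
For illustration only, here is a minimal sketch of how a client-side read timeout can report a failure while the other side is still working. It assumes the connection behaves like a plain blocking socket read with a timeout, which this report does not confirm. The class ReadTimeoutDemo and the method clientTimesOut are hypothetical names and are not part of RHQ.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo, not RHQ code: a read timeout only means the client stopped waiting. */
public class ReadTimeoutDemo {

    /** Returns true if the client's read gave up before the silent "server" sent anything. */
    static boolean clientTimesOut(int timeoutMillis) throws IOException {
        // Loopback listener that lets the client connect but never writes a reply,
        // standing in for a server that is still busy with a long-running operation.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // client-side read timeout
            try {
                client.getInputStream().read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                // The listener is still open here; only the client gave up.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("client timed out: " + clientTimesOut(200));
    }
}
```

If the connection works this way, the start request may still complete after the caller has already reported the timeout, which would match the availability turning to UP later.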

Comment 4 Heiko W. Rupp 2012-08-23 17:18:13 UTC

Is there a longer stacktrace around somewhere?
And are you sure it is 10 seconds (and not 20 seconds)?

Comment 5 Heiko W. Rupp 2012-08-23 17:54:18 UTC

The operation code is in org.rhq.modules.plugins.jbossas7.ManagedASComponent#invokeOperation

which calls getASConnection().execute(op);
which is defined as:
    public Result execute(Operation op) {
        return execute(op, false, 10);
    }

So this is where the 10s timeout is defined.

Option a) increase the timeout from 10s to 30s by calling
  getASConnection().execute(op,<timeout in sec>);

We have done that in one other place as well.

Option b) add a config property to let the user specify a timeout value,
then continue with a).
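
A rough sketch of how options a) and b) could fit together, as an illustration only. The class OperationTimeouts, the property name "operationTimeoutSec" and the method resolveTimeoutSec are hypothetical and do not come from the plugin. The idea is to read a user-supplied timeout from the configuration, fall back to a fixed value such as 30 when it is missing or invalid, and pass the result to the execute call that takes a timeout.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch; names are illustrative and not the plugin's real API. */
public class OperationTimeouts {

    /** The default seen in execute(op) above. */
    static final int DEFAULT_TIMEOUT_SEC = 10;

    /** Longer fixed value for slow operations such as starting a managed server (option a). */
    static final int START_TIMEOUT_SEC = 30;

    /** Assumed name for a user-facing config property (option b). */
    static final String TIMEOUT_PROPERTY = "operationTimeoutSec";

    /** Returns the configured timeout in seconds, or fallbackSec if it is unset or invalid. */
    static int resolveTimeoutSec(Map<String, String> pluginConfig, int fallbackSec) {
        String raw = pluginConfig.get(TIMEOUT_PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return fallbackSec;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallbackSec; // reject zero and negative values
        } catch (NumberFormatException e) {
            return fallbackSec; // not a number, keep the fallback
        }
    }

    public static void main(String[] args) {
        // No property set: option a) applies with the fixed 30s value.
        System.out.println(resolveTimeoutSec(Collections.<String, String>emptyMap(), START_TIMEOUT_SEC));
        // Property set by the user: option b) overrides the fixed value.
        System.out.println(resolveTimeoutSec(Collections.singletonMap(TIMEOUT_PROPERTY, "60"), START_TIMEOUT_SEC));
    }
}
```

The resolved value would then be passed as the <timeout in sec> argument of the call shown in option a).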

Comment 6 Heiko W. Rupp 2012-08-23 19:31:07 UTC

[15:27:41] <mfoley> yeah ... increasing the timeout ... that seems low-risk fix for this point in the JON 3.1.1 lifecycle ... i am good with that

Comment 7 Stefan Negrea 2012-08-24 15:34:24 UTC

Applied option a) from comment #5. The timeout was increased from 10 to 30 seconds to avoid situations in which managed server operations run slower than expected due to heavy load on the host machine.

Comment 8 Stefan Negrea 2012-08-24 15:52:27 UTC

master branch commit:

http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=b8a6a496d40915abb22fdd87753aec0c131ed95d

Comment 9 Libor Zoubek 2012-08-30 16:53:40 UTC

verified on JON 3.1.1.CR1

Comment 10 Heiko W. Rupp 2013-08-31 10:10:28 UTC

Bulk close of old bugs in VERIFIED state.
