Bug 1027691 - Start servers in the group ends with read timed out
Summary: Start servers in the group ends with read timed out
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- JBoss EAP 6
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ER07
: JON 3.2.0
Assignee: Thomas Segismont
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-11-07 09:57 UTC by Radim Hatlapatka
Modified: 2014-01-02 20:39 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Radim Hatlapatka 2013-11-07 09:57:24 UTC
Description of problem:
Very often it happens that when starting servers in server group ends with:

java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) 

Version-Release number of selected component (if applicable): EAP 6.1.1 vs JON 3.2.0.ER4


How reproducible: 30%


Steps to Reproduce in JON:
1. import eap 6 in domain mode
2. stop servers in servers main group
3. start servers in servers main group

Actual results:
start operation fails with the exception shown in JON UI


Expected results:
servers in domain mode succeeds


Additional info:
In my case the servers were first restarted, then stopped and then started.

I haven't found any relevant information neither in agents log and neither in the JON server log

Comment 1 Heiko W. Rupp 2013-11-08 14:07:36 UTC
Radim,
can you please give us a bit more context: how many servers in the group are you starting? If you start them through cli or whatever, how long does it take until they are up?
Do you see more information in the agent log? (Could you perhaps paste a much longer snippet of agent log when this happens?)

 Thanks
   Heiko

Comment 2 Radim Hatlapatka 2013-11-12 10:02:27 UTC
There are four servers managed by the domain controller, where three of them are in the group from which servers are being started.

According to EAP 6.1.1 host-controller log the servers were started in less than 30 seconds as you can see in [1].

There are no messages visible in both JON agent logs and in JON server logs related to the operation of starting servers in server group. I am using the default logging level which logs all INFO and more severe messages.

BTW: the operation took via our testsuite 0s see [2]


[1]
14:31:32,214 INFO  [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-one
14:31:35,371 INFO  [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-2) JBAS010920: Server [Server:server-one] registered using connection [Channel ID 042b3dff (inbound) of Remoting connection 6a120629 to /10.16.95.131:49226]
14:31:35,428 INFO  [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-two
14:31:35,498 INFO  [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-one
14:31:42,590 INFO  [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-2) JBAS010920: Server [Server:server-two] registered using connection [Channel ID 48df4e5c (inbound) of Remoting connection 3bf7c906 to /10.16.95.131:38136]
14:31:42,706 INFO  [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-four
14:31:42,834 INFO  [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-two
14:31:50,809 INFO  [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-3) JBAS010920: Server [Server:server-four] registered using connection [Channel ID 3d831085 (inbound) of Remoting connection 2f2ed387 to /10.16.95.131:58631]
14:31:50,986 INFO  [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-four


[2]
[ant:testng] 201311111431:59.724 - INFO: Operation [Start Servers] finished after 0s (com.redhat.qe.jon.sahi.base.inventory.Operations.assertOperationResult)
[ant:testng] 201311111432:01.461 - INFO: Retrieving the error message as it was expected that operation ends successfully but it didn't (com.redhat.qe.jon.sahi.base.inventory.Operations.assertOperationResult)
[ant:testng] 201311111432:08.234 - SEVERE: Test Failed: startServers (com.redhat.qe.auto.testng.TestNGListener.onTestFailure)
[ant:testng] java.lang.AssertionError: Operation [Start Servers] result: Success errorMessage:
[ant:testng] java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) expected [true] but found [false]

Comment 3 Radim Hatlapatka 2013-11-12 11:23:36 UTC
I have tried it one more time (now in the server group were only two servers) and it failed.

From time submitted to time completed it took 11s and ended with read-timeout issue. It seems there is hardcoded timeout for 10s which should be increased the same way as it was done for restarting servers in server group, see https://bugzilla.redhat.com/show_bug.cgi?id=911327

Comment 4 Thomas Segismont 2013-11-13 10:52:57 UTC
Fixed in master

commit cce4661454763dbd1cab0f2abca6f3309577959f
Author: Thomas Segismont <tsegismo>
Date:   Wed Nov 13 11:51:35 2013 +0100

Applied the fix for BZ911327 to all operations (start/stop/restart)

Comment 5 Thomas Segismont 2013-11-13 10:55:01 UTC
Cherry-picked to release/jon3.2.x

commit ff2242931db30ea3e724572950f0b099ca56e6a6
Author: Thomas Segismont <tsegismo>
Date:   Wed Nov 13 11:51:35 2013 +0100

Comment 6 Heiko W. Rupp 2013-11-13 12:00:41 UTC
Radim, excellent analysis in #c3 Thank you.

Comment 7 Thomas Segismont 2013-11-13 13:03:20 UTC
(In reply to Heiko W. Rupp from comment #6)
> Radim, excellent analysis in #c3 Thank you.

+1

Comment 8 Simeon Pinder 2013-11-19 15:47:47 UTC
Moving to ON_QA as available for testing with new brew build.

Comment 9 Simeon Pinder 2013-11-22 05:13:21 UTC
Mass moving all of these from ER6 to target milestone ER07 since the ER6 build was bad and QE was halted for the same reason.

Comment 10 Radim Hatlapatka 2013-11-27 16:30:57 UTC
I am not able to reproduce the issue any more with JON 3.2.0.ER7


Note You need to log in before you can comment on or make changes to this bug.