Bug 1027691
Summary: | Start servers in the group ends with read timed out | ||
---|---|---|---|
Product: | [JBoss] JBoss Operations Network | Reporter: | Radim Hatlapatka <rhatlapa> |
Component: | Plugin -- JBoss EAP 6 | Assignee: | Thomas Segismont <tsegismo> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | JON 3.2 | CC: | hrupp, rhatlapa, theute |
Target Milestone: | ER07 | ||
Target Release: | JON 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | Type: | Bug | |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Radim Hatlapatka
2013-11-07 09:57:24 UTC
Radim, can you please give us a bit more context: how many servers in the group are you starting? If you start them through cli or whatever, how long does it take until they are up? Do you see more information in the agent log? (Could you perhaps paste a much longer snippet of agent log when this happens?) Thanks Heiko There are four servers managed by the domain controller, where three of them are in the group from which servers are being started. According to EAP 6.1.1 host-controller log the servers were started in less than 30 seconds as you can see in [1]. There are no messages visible in both JON agent logs and in JON server logs related to the operation of starting servers in server group. I am using the default logging level which logs all INFO and more severe messages. BTW: the operation took via our testsuite 0s see [2] [1] 14:31:32,214 INFO [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-one 14:31:35,371 INFO [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-2) JBAS010920: Server [Server:server-one] registered using connection [Channel ID 042b3dff (inbound) of Remoting connection 6a120629 to /10.16.95.131:49226] 14:31:35,428 INFO [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-two 14:31:35,498 INFO [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-one 14:31:42,590 INFO [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-2) JBAS010920: Server [Server:server-two] registered using connection [Channel ID 48df4e5c (inbound) of Remoting connection 3bf7c906 to /10.16.95.131:38136] 14:31:42,706 INFO [org.jboss.as.host.controller] (HttpManagementService-threads - 7) JBAS010922: Starting server server-four 14:31:42,834 INFO [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-two 14:31:50,809 INFO [org.jboss.as.domain.controller.mgmt] (Remoting "dev84.mw.lab.eng.bos.redhat.com:MANAGEMENT" task-3) JBAS010920: Server [Server:server-four] registered using connection [Channel ID 3d831085 (inbound) of Remoting connection 2f2ed387 to /10.16.95.131:58631] 14:31:50,986 INFO [org.jboss.as.host.controller] (server-registration-threads - 1) JBAS010919: Registering server server-four [2] [ant:testng] 201311111431:59.724 - INFO: Operation [Start Servers] finished after 0s (com.redhat.qe.jon.sahi.base.inventory.Operations.assertOperationResult) [ant:testng] 201311111432:01.461 - INFO: Retrieving the error message as it was expected that operation ends successfully but it didn't (com.redhat.qe.jon.sahi.base.inventory.Operations.assertOperationResult) [ant:testng] 201311111432:08.234 - SEVERE: Test Failed: startServers (com.redhat.qe.auto.testng.TestNGListener.onTestFailure) [ant:testng] java.lang.AssertionError: Operation [Start Servers] result: Success errorMessage: [ant:testng] java.lang.Exception: Read timed out, rolled-back=false, rolled-back=false at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) expected [true] but found [false] I have tried it one more time (now in the server group were only two servers) and it failed. From time submitted to time completed it took 11s and ended with read-timeout issue. It seems there is hardcoded timeout for 10s which should be increased the same way as it was done for restarting servers in server group, see https://bugzilla.redhat.com/show_bug.cgi?id=911327 Fixed in master commit cce4661454763dbd1cab0f2abca6f3309577959f Author: Thomas Segismont <tsegismo> Date: Wed Nov 13 11:51:35 2013 +0100 Applied the fix for BZ911327 to all operations (start/stop/restart) Cherry-picked to release/jon3.2.x commit ff2242931db30ea3e724572950f0b099ca56e6a6 Author: Thomas Segismont <tsegismo> Date: Wed Nov 13 11:51:35 2013 +0100 Radim, excellent analysis in #c3 Thank you. (In reply to Heiko W. Rupp from comment #6) > Radim, excellent analysis in #c3 Thank you. +1 Moving to ON_QA as available for testing with new brew build. Mass moving all of these from ER6 to target milestone ER07 since the ER6 build was bad and QE was halted for the same reason. I am not able to reproduce the issue any more with JON 3.2.0.ER7 |