Bug 1160851 - rhqctl times out connection to EAP server if it takes too long to startup and fails the installation
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Installer
Version: JON 3.2,JON 3.2.1,JON 3.2.2,JON 3.2.3
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ER01
Target Release: JON 3.3.3
Assignee: Jay Shaughnessy
QA Contact: Filip Brychta
URL:
Whiteboard:
Keywords: Triaged
Depends On:
Blocks:
Reported: 2014-11-05 19:33 UTC by dsteigne
Modified: 2018-12-09 19:06 UTC (History)
Clone Of:
Last Closed: 2015-08-07 07:42:42 UTC


External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  1259983  None      None    None     Never

Description dsteigne 2014-11-05 19:33:38 UTC
Description of problem:
rhqctl times out its connection to the EAP server if the server takes too long to start up, and the installation then fails.

Version-Release number of selected component (if applicable):
3.2

How reproducible:


Steps to Reproduce:
1. Put a load on the CPU and I/O with a simple script:

    for worker in 1 2 3; do
        dd if=/dev/urandom of=/dev/null &
    done
2. Run rhqctl install while the load is high, so that the server takes longer than 30 seconds to start.


Actual results:
rhq-installer.log shows:

15:39:36,442 ERROR [org.rhq.enterprise.server.installer.Installer] The installer will now exit due to previous errors: java.lang.Exception: Cannot obtain client connection to the RHQ app server!!
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1101) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall(InstallerServiceImpl.java:217) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.test(InstallerServiceImpl.java:142) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.Installer.doInstall(Installer.java:90) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.Installer.main(Installer.java:57) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_71]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]
	at org.jboss.modules.Module.run(Module.java:270) [jboss-modules.jar:1.2.2.Final-redhat-1]
	at org.jboss.modules.Main.main(Main.java:411) [jboss-modules.jar:1.2.2.Final-redhat-1]
Caused by: java.io.IOException: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:129) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:81) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.rhq.common.jbossas.client.controller.JBossASClient.execute(JBossASClient.java:270) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.common.jbossas.client.controller.CoreJBossASClient.getSystemProperties(CoreJBossASClient.java:103) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1052) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	... 10 more
Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:131) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.ProtocolConnectionManager$EstablishingConnection.connect(ProtocolConnectionManager.java:256) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.ProtocolConnectionManager.connect(ProtocolConnectionManager.java:70) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.FutureManagementChannel$Establishing.getChannel(FutureManagementChannel.java:176) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient.getOrCreateChannel(RemotingModelControllerClient.java:144) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient$1.getChannel(RemotingModelControllerClient.java:65) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:115) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:90) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeRequest(AbstractModelControllerClient.java:236) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:141) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:127) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	... 14 more

The server.log shows the EAP server starting, but slowly: some of the services take more than 20 seconds to start.

Expected results:
EAP server is running and the installation completes.

Additional info:

Comment 2 John Mazzitelli 2014-11-06 14:33:31 UTC
Looks like the workflow here in preInstall is to test the controller client connection without any timeout:

Inside org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall():

        // make an attempt to connect to the app server - we must make sure its running and we can connect to it
        final String asVersion = testModelControllerClient(serverProperties);

We should change that testModelControllerClient call to use the one that takes a timeout:

        testModelControllerClient(HashMap<String, String>, int)

We can use a backdoor sysproperty (whose default is 60) so we can provide a way to customize the timeout if need be. But honestly, if 60 seconds isn't enough, your machine is getting beat down so you should fix that before installing anything else extra :)
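For illustration, a backdoor system property with a default of 60 could be read roughly as below. This is only a sketch: the property name and the class are hypothetical and are not taken from the RHQ source.

```java
public class InstallerTimeout {
    // Hypothetical property name, chosen for this sketch only.
    public static final String TIMEOUT_PROPERTY = "example.installer.connect.timeout.secs";
    public static final int DEFAULT_TIMEOUT_SECS = 60;

    /** Returns the configured timeout in seconds, or the default if unset or invalid. */
    public static int connectTimeoutSecs() {
        // Integer.getInteger falls back to the default when the property
        // is missing or is not a parseable integer.
        int value = Integer.getInteger(TIMEOUT_PROPERTY, DEFAULT_TIMEOUT_SECS);
        return value > 0 ? value : DEFAULT_TIMEOUT_SECS;
    }

    public static void main(String[] args) {
        System.out.println("Waiting up to " + connectTimeoutSecs() + "s for the app server");
    }
}
```

With a property like this, the timeout could be overridden by passing -D<property>=<seconds> to the JVM that runs the installer.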

There is also a call to this test method elsewhere with a hardcoded 60. If we want to make a backdoor sysprop, we should use it here as well:

        // we need to wait for the reload to finish - wait until we can connect again
        testModelControllerClient(60);
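
The general idea of "wait until we can connect again" can be sketched as a retry loop with a deadline. This is not the actual testModelControllerClient implementation; the class, the method name, and the stand-in probe below are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class ConnectRetry {
    /**
     * Calls the probe repeatedly until it succeeds or timeoutSecs has elapsed.
     * With timeoutSecs = 0 the probe is tried exactly once.
     */
    public static <T> T waitFor(Callable<T> probe, int timeoutSecs, long retryDelayMillis)
            throws Exception {
        long deadline = System.nanoTime() + timeoutSecs * 1_000_000_000L;
        while (true) {
            try {
                return probe.call();
            } catch (Exception e) {
                // Give up once the deadline has passed, keeping the last failure as the cause.
                if (System.nanoTime() - deadline >= 0) {
                    throw new Exception("Cannot obtain client connection within "
                            + timeoutSecs + "s", e);
                }
                Thread.sleep(retryDelayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // Stand-in probe: fails twice, then pretends the server answered.
        String result = waitFor(() -> {
            if (++attempts[0] < 3) {
                throw new IOException("connection timed out");
            }
            return "connected";
        }, 5, 100);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In a real installer the probe would be the call that opens the management connection, and the timeout would come from the configurable value rather than a literal.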


Anyway, I think this is a lower priority issue.

Comment 3 John Mazzitelli 2014-11-06 18:04:02 UTC
commit 8e91dec6d6056044315368ada6325c1eda1d24e8
Author: John Mazzitelli <mazz@redhat.com>
Date:   Thu Nov 6 13:03:04 2014 -0500

    BZ 1160851 - add the ability to wait for N seconds while testing for the server to come up. Default is 60s

Comment 4 John Mazzitelli 2014-11-06 18:05:09 UTC
I added the ability to wait up to 60s by default. Before, the initial test didn't wait at all.

Comment 5 Libor Zoubek 2015-01-12 09:35:28 UTC
branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/665bccdbb
time:    2015-01-12 10:34:26 +0100
commit:  665bccdbb19d1c9326aa598cf715e0013c0ccfd4
author:  John Mazzitelli - mazz@redhat.com
message: BZ 1160851 - add the ability to wait for N seconds while testing for the
         server to come up. Default is 60s, but there is now a backdoor
         sysprop you can set if you want it to be longer or shorter.
         (cherry picked from commit
         8e91dec6d6056044315368ada6325c1eda1d24e8) Signed-off-by: Libor
         Zoubek <lzoubek@redhat.com>

Comment 6 Simeon Pinder 2015-01-26 08:15:08 UTC
Moving to ON_QA as available for test with the latest 3.3.1.ER01 bits from here:
http://download.devel.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/12/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip

Comment 8 Filip Brychta 2015-07-14 15:15:17 UTC
Verified on
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c

