Bug 1160851

Summary: rhqctl times out connection to EAP server if it takes too long to startup and fails the installation
Product: [JBoss] JBoss Operations Network
Reporter: dsteigne
Component: Installer
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: high
Priority: high
Version: JON 3.2, JON 3.2.1, JON 3.2.2, JON 3.2.3
CC: fbrychta, loleary, lzoubek, mazz, mmahoney, myarboro
Target Milestone: ER01
Keywords: Triaged
Target Release: JON 3.3.3
Hardware: All
OS: All
Doc Type: Bug Fix
Last Closed: 2015-08-07 07:42:42 UTC
Type: Bug

Description dsteigne 2014-11-05 19:33:38 UTC
Description of problem:
rhqctl times out its connection to the EAP server if the server takes too long to start up, and the installation then fails.

Version-Release number of selected component (if applicable):
3.2

How reproducible:


Steps to Reproduce:
1. Put a load on the CPU and I/O with a simple script:

    for worker in 1 2 3; do
        dd if=/dev/urandom of=/dev/null &
    done
2. Run rhqctl install while the CPU load is high, so that the server takes longer than 30 seconds to start.


Actual results:
rhq-installer.log shows:

15:39:36,442 ERROR [org.rhq.enterprise.server.installer.Installer] The installer will now exit due to previous errors: java.lang.Exception: Cannot obtain client connection to the RHQ app server!!
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1101) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall(InstallerServiceImpl.java:217) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.test(InstallerServiceImpl.java:142) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.Installer.doInstall(Installer.java:90) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.Installer.main(Installer.java:57) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_71]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]
	at org.jboss.modules.Module.run(Module.java:270) [jboss-modules.jar:1.2.2.Final-redhat-1]
	at org.jboss.modules.Main.main(Main.java:411) [jboss-modules.jar:1.2.2.Final-redhat-1]
Caused by: java.io.IOException: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:129) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:81) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.rhq.common.jbossas.client.controller.JBossASClient.execute(JBossASClient.java:270) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.common.jbossas.client.controller.CoreJBossASClient.getSystemProperties(CoreJBossASClient.java:103) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1052) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
	... 10 more
Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:131) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.ProtocolConnectionManager$EstablishingConnection.connect(ProtocolConnectionManager.java:256) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.ProtocolConnectionManager.connect(ProtocolConnectionManager.java:70) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.FutureManagementChannel$Establishing.getChannel(FutureManagementChannel.java:176) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient.getOrCreateChannel(RemotingModelControllerClient.java:144) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient$1.getChannel(RemotingModelControllerClient.java:65) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:115) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:90) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeRequest(AbstractModelControllerClient.java:236) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:141) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:127) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
	... 14 more

The server.log shows the EAP server starting, but slowly: some of the services take more than 20 seconds to start.

Expected results:
EAP server is running and the installation completes.

Additional info:

Comment 2 John Mazzitelli 2014-11-06 14:33:31 UTC
Looks like the workflow here in preInstall is to test the controller client connection without any timeout:

Inside org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall():

        // make an attempt to connect to the app server - we must make sure its running and we can connect to it
        final String asVersion = testModelControllerClient(serverProperties);

We should change that testModelControllerClient call to use the one that takes a timeout:

        testModelControllerClient(HashMap<String, String>, int)

We can use a backdoor sysprop (defaulting to 60 seconds) to provide a way to customize the timeout if need be. But honestly, if 60 seconds isn't enough, your machine is getting beaten down, so you should fix that before installing anything else :)

There is also a call to this test method elsewhere with a hardcoded 60. If we want to make a backdoor sysprop, we should use it here as well:

        // we need to wait for the reload to finish - wait until we can connect again
        testModelControllerClient(60);


Anyway, I think this is a lower priority issue.

Comment 3 John Mazzitelli 2014-11-06 18:04:02 UTC
commit 8e91dec6d6056044315368ada6325c1eda1d24e8
Author: John Mazzitelli <mazz>
Date:   Thu Nov 6 13:03:04 2014 -0500

    BZ 1160851 - add the ability to wait for N seconds while testing for the server to come up. Default is 60s

Comment 4 John Mazzitelli 2014-11-06 18:05:09 UTC
I added the ability to wait up to 60s by default. Before, the initial test didn't wait at all.
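
For readers following along, the wait described above can be sketched roughly as below. This is only an illustration of the general pattern (retry a connection probe until a deadline, with the deadline taken from a system property that defaults to 60 seconds), not the code from the commit. The class name, the helper names, and the property name example.installer.connect.timeout.secs are all made up for this sketch; the real sysprop name is not given in this report.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

final class WaitForServer {
    // Hypothetical property name, used for illustration only.
    static final String TIMEOUT_PROP = "example.installer.connect.timeout.secs";
    static final int DEFAULT_TIMEOUT_SECS = 60;

    // Reads the timeout from the system property, falling back to 60 seconds.
    static int timeoutSecs() {
        return Integer.getInteger(TIMEOUT_PROP, DEFAULT_TIMEOUT_SECS);
    }

    // Calls the probe repeatedly until it succeeds or the deadline passes.
    // The probe is always tried at least once, even with a timeout of 0.
    static <T> T retryUntil(Callable<T> probe, int timeoutSecs, long pauseMillis)
            throws Exception {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSecs);
        while (true) {
            try {
                return probe.call();
            } catch (Exception e) {
                if (System.nanoTime() - deadline >= 0) {
                    // Keep the last failure as the cause so the log shows why.
                    throw new Exception(
                        "Could not connect within " + timeoutSecs + " seconds", e);
                }
            }
            Thread.sleep(pauseMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // Stand-in for the real connection test: fails twice, then succeeds.
        String version = retryUntil(() -> {
            if (++attempts[0] < 3) {
                throw new IOException("connection timed out");
            }
            return "7.2.1";
        }, timeoutSecs(), 100L);
        System.out.println("connected, version=" + version
            + ", attempts=" + attempts[0]);
    }
}
```

With a property like this, the timeout could be raised on a slow machine by passing -Dexample.installer.connect.timeout.secs=120 to the JVM (again, the name here is a placeholder). Checking the deadline only after a failed attempt means a slow but successful probe is never discarded.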

Comment 5 Libor Zoubek 2015-01-12 09:35:28 UTC
branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/665bccdbb
time:    2015-01-12 10:34:26 +0100
commit:  665bccdbb19d1c9326aa598cf715e0013c0ccfd4
author:  John Mazzitelli - mazz
message: BZ 1160851 - add the ability to wait for N seconds while testing for the
         server to come up. Default is 60s, but there is now a backdoor
         sysprop you can set if you want it to be longer or shorter.
         (cherry picked from commit
         8e91dec6d6056044315368ada6325c1eda1d24e8) Signed-off-by: Libor
         Zoubek <lzoubek>

Comment 6 Simeon Pinder 2015-01-26 08:15:08 UTC
Moving to ON_QA as available for test with the latest 3.3.1.ER01 bits from here:
http://download.devel.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/12/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip

Comment 8 Filip Brychta 2015-07-14 15:15:17 UTC
Verified on
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c