Description of problem:
rhqctl times out while connecting to the EAP server if the server takes too long to start up, and the installation fails.

Version-Release number of selected component (if applicable):
3.2

How reproducible:

Steps to Reproduce:
1. Put a load on the CPU and I/O with a simple script:
   for worker in 1 2 3; do dd if=/dev/urandom of=/dev/null & done
2. Run rhqctl install while the CPU load is high, so that when it tries to start the server, startup takes longer than 30 seconds.

Actual results:
rhq-installer.log shows:

15:39:36,442 ERROR [org.rhq.enterprise.server.installer.Installer] The installer will now exit due to previous errors: java.lang.Exception: Cannot obtain client connection to the RHQ app server!!
    at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1101) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall(InstallerServiceImpl.java:217) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.enterprise.server.installer.InstallerServiceImpl.test(InstallerServiceImpl.java:142) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.enterprise.server.installer.Installer.doInstall(Installer.java:90) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.enterprise.server.installer.Installer.main(Installer.java:57) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_71]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
    at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]
    at org.jboss.modules.Module.run(Module.java:270) [jboss-modules.jar:1.2.2.Final-redhat-1]
    at org.jboss.modules.Main.main(Main.java:411) [jboss-modules.jar:1.2.2.Final-redhat-1]
Caused by: java.io.IOException: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
    at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:129) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:81) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.rhq.common.jbossas.client.controller.JBossASClient.execute(JBossASClient.java:270) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.common.jbossas.client.controller.CoreJBossASClient.getSystemProperties(CoreJBossASClient.java:103) [rhq-jboss-as-dmr-client-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1052) [rhq-installer-util-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    ... 10 more
Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
    at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:131) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.protocol.ProtocolConnectionManager$EstablishingConnection.connect(ProtocolConnectionManager.java:256) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.protocol.ProtocolConnectionManager.connect(ProtocolConnectionManager.java:70) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.protocol.mgmt.FutureManagementChannel$Establishing.getChannel(FutureManagementChannel.java:176) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.RemotingModelControllerClient.getOrCreateChannel(RemotingModelControllerClient.java:144) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.RemotingModelControllerClient$1.getChannel(RemotingModelControllerClient.java:65) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:115) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:90) [jboss-as-protocol-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeRequest(AbstractModelControllerClient.java:236) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:141) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:127) [jboss-as-controller-client-7.2.1.Final-redhat-10.jar:7.2.1.Final-redhat-10]
    ... 14 more

The server.log shows the EAP server starting, but slowly: some of the services take more than 20 seconds to start.

Expected results:
EAP server is running and the installation completes.

Additional info:
Looks like the workflow here in preInstall is to test the controller client connection without any timeout.

Inside org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall():

    // make an attempt to connect to the app server - we must make sure its running and we can connect to it
    final String asVersion = testModelControllerClient(serverProperties);

We should change that testModelControllerClient call to use the one that takes a timeout:

    testModelControllerClient(HashMap<String, String>, int)

We can use a backdoor sysproperty (whose default is 60) so we can provide a way to customize the timeout if need be. But honestly, if 60 seconds isn't enough, your machine is getting beat down, so you should fix that before installing anything else extra :)

There is also a call to this test method elsewhere with a hardcoded 60. If we want to make a backdoor sysprop, we should use it here as well:

    // we need to wait for the reload to finish - wait until we can connect again
    testModelControllerClient(60);

Anyway, I think this is a lower priority issue.
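For illustration only, the "wait up to N seconds, with N taken from a system property that defaults to 60" idea could look roughly like the sketch below. This is not the RHQ implementation: the class name ConnectWait, the method waitFor, and the property name example.installer.connect.timeout.secs are all hypothetical, and the real installer's retry interval and property name are not given in this report.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Sketch: retry a connection probe until it succeeds or a deadline passes. */
final class ConnectWait {

    // Hypothetical property name, invented for this example only.
    static final String TIMEOUT_PROP = "example.installer.connect.timeout.secs";
    static final int DEFAULT_TIMEOUT_SECS = 60;

    /** Reads the timeout from the system property, falling back to 60 seconds. */
    static int timeoutSecs() {
        return Integer.getInteger(TIMEOUT_PROP, DEFAULT_TIMEOUT_SECS);
    }

    /**
     * Calls the probe repeatedly until it returns normally or timeoutSecs elapse.
     * The probe is always attempted at least once, even with a timeout of 0.
     */
    static <T> T waitFor(Callable<T> probe, int timeoutSecs, long retryDelayMillis)
            throws Exception {
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSecs);
        Exception last;
        while (true) {
            try {
                return probe.call(); // success: hand back whatever the probe returned
            } catch (Exception e) {
                last = e; // remember the most recent failure for the final error
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new Exception(
                    "Cannot obtain client connection within " + timeoutSecs + "s", last);
            }
            Thread.sleep(retryDelayMillis); // back off briefly before the next attempt
        }
    }
}
```

A caller would wrap the existing single-shot connection test in the probe, for example ConnectWait.waitFor(() -> connectOnce(), ConnectWait.timeoutSecs(), 1000L), where connectOnce() is a placeholder for whatever performs one connection attempt. Running the JVM with -Dexample.installer.connect.timeout.secs=120 would then lengthen the wait without a code change.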
commit 8e91dec6d6056044315368ada6325c1eda1d24e8
Author: John Mazzitelli <mazz>
Date:   Thu Nov 6 13:03:04 2014 -0500

    BZ 1160851 - add the ability to wait for N seconds while testing for the server to come up. Default is 60s
I added the ability to wait up to 60s by default. Before, the initial test didn't wait at all.
branch: release/jon3.3.x
link: https://github.com/rhq-project/rhq/commit/665bccdbb
time: 2015-01-12 10:34:26 +0100
commit: 665bccdbb19d1c9326aa598cf715e0013c0ccfd4
author: John Mazzitelli - mazz
message: BZ 1160851 - add the ability to wait for N seconds while testing for the server to come up. Default is 60s, but there is now a backdoor sysprop you can set if you want it to be longer or shorter.
(cherry picked from commit 8e91dec6d6056044315368ada6325c1eda1d24e8)
Signed-off-by: Libor Zoubek <lzoubek>
Moving to ON_QA as available for test with the latest 3.3.1.ER01 bits from here: http://download.devel.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/12/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip
Verified on
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c