Bug 1058267 - RHQ installation on JDK1.6 fails with 'Cannot obtain client connection to the RHQ app server!!'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Installer
Version: 4.10
Hardware: All
OS: All
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHQ 4.10
Assignee: Jirka Kremser
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1063364
Reported: 2014-01-27 11:53 UTC by Filip Brychta
Modified: 2014-09-26 09:42 UTC (History)

Fixed In Version:
Clone Of:
: 1063364 (view as bug list)
Environment:
Last Closed: 2014-04-23 12:30:01 UTC
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1060066 0 unspecified CLOSED Agent fails to start on JDK1.6 with Unrecognized VM option 'StringTableSize=1000003' 2021-02-22 00:41:40 UTC

Internal Links: 1060066

Description Filip Brychta 2014-01-27 11:53:18 UTC
Description of problem:
$Summary

Version-Release number of selected component (if applicable):
rhq-server-4.10.0-SNAPSHOT
4f60fa0d3c7a

How reproducible:
Always

Steps to Reproduce:
1. unzip rhq-server-4.10.0-SNAPSHOT.zip
2. cd rhq-server-4.10.0-SNAPSHOT/bin/
3. ./rhqctl install
4. set jboss.bind.address to 0.0.0.0

Actual results:
RHQ storage is installed correctly but RHQ server installation failed with:
06:46:15,286 INFO  [org.rhq.server.control.command.Install] The RHQ Server must be started to complete its installation. Starting the RHQ server in preparation of running the server installer...
06:46:15,304 INFO  [org.rhq.server.control.command.Install] Waiting for the RHQ Server to start in preparation of running the server installer...
Trying to start the RHQ Server...
RHQ Server                     (pid 6847   ) is ✘ down
Failed to start - make sure the RHQ Server is fully configured properly
06:46:20,850 INFO  [org.jboss.modules] JBoss Modules version 1.2.0.CR1
06:46:20,971 INFO  [org.rhq.enterprise.server.installer.InstallerServiceImpl] The server is preconfigured and ready for auto-install.
06:46:21,055 INFO  [org.xnio] XNIO Version 3.0.7.GA
06:46:21,066 INFO  [org.xnio.nio] XNIO NIO Implementation Version 3.0.7.GA
06:46:21,074 INFO  [org.jboss.remoting] JBoss Remoting version 3.2.14.GA
06:46:31,338 ERROR [org.rhq.enterprise.server.installer.Installer] The installer will now exit due to previous errors: java.lang.Exception: Cannot obtain client connection to the RHQ app server!!
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1100) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall(InstallerServiceImpl.java:217) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.test(InstallerServiceImpl.java:142) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.installer.Installer.doInstall(Installer.java:90) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.installer.Installer.main(Installer.java:57) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_24]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.6.0_24]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.6.0_24]
	at java.lang.reflect.Method.invoke(Method.java:616) [rt.jar:1.6.0_24]
	at org.jboss.modules.Module.run(Module.java:262) [jboss-modules.jar:1.2.0.CR1]
	at org.jboss.modules.Main.main(Main.java:329) [jboss-modules.jar:1.2.0.CR1]
Caused by: java.io.IOException: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:129) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:81) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.rhq.common.jbossas.client.controller.JBossASClient.execute(JBossASClient.java:270) [rhq-jboss-as-dmr-client-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.common.jbossas.client.controller.CoreJBossASClient.getSystemProperties(CoreJBossASClient.java:103) [rhq-jboss-as-dmr-client-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1051) [rhq-installer-util-4.10.0-SNAPSHOT.jar:4.10.0-SNAPSHOT]
	... 10 more
Caused by: java.net.ConnectException: JBAS012144: Could not connect to remote://127.0.0.1:9999. The connection timed out
	at org.jboss.as.protocol.ProtocolConnectionUtils.connectSync(ProtocolConnectionUtils.java:130) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.protocol.ProtocolConnectionManager$EstablishingConnection.connect(ProtocolConnectionManager.java:256) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.protocol.ProtocolConnectionManager.connect(ProtocolConnectionManager.java:70) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.protocol.mgmt.FutureManagementChannel$Establishing.getChannel(FutureManagementChannel.java:176) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient.getOrCreateChannel(RemotingModelControllerClient.java:144) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.RemotingModelControllerClient$1.getChannel(RemotingModelControllerClient.java:65) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:115) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.protocol.mgmt.ManagementChannelHandler.executeRequest(ManagementChannelHandler.java:98) [jboss-as-protocol-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeRequest(AbstractModelControllerClient.java:236) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.execute(AbstractModelControllerClient.java:141) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.controller.client.impl.AbstractModelControllerClient.executeForResult(AbstractModelControllerClient.java:127) [jboss-as-controller-client-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	... 14 more

06:46:31,352 ERROR [org.rhq.server.control.command.Install] An error occurred while starting the RHQ server: Process exited with an error: 2 (Exit value: 2)
RHQ Server                     (pid 6847   ) is ✘ down

Expected results:
Installation is successful.

Comment 1 Filip Brychta 2014-01-31 09:51:40 UTC
Update:
This issue is related to JDK1.6. Installation works correctly on JDK1.7

Fails on:
[hudson@last-rhq-server bin]$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)


Works on:
[hudson@last-rhq-server bin]$ java -version
java version "1.7.0_03-icedtea"
OpenJDK Runtime Environment (rhel-2.1.el6.7-x86_64)
OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)

Comment 2 Filip Brychta 2014-01-31 12:46:18 UTC
This is the first build with this issue http://hudson.qa.jboss.com/hudson/view/RHQ/job/rhq-master-gwt-locales/967/

Comment 3 Jirka Kremser 2014-01-31 12:57:27 UTC
Can be replicated also on HotSpot JVM (1.6.0_24).

Comment 4 Jirka Kremser 2014-01-31 13:11:55 UTC
The build above (comment 2) contains 86 changes. Using git bisect, finding the culprit takes in the worst case ceil(log_2(86)) = 7 "build-install-check" cycles.
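
The estimate above can be checked with a small sketch. The function name bisect_steps is made up for illustration; it counts how many halvings are needed to narrow n candidate changes down to one, which is ceil(log_2(n)).

```shell
#!/bin/sh
# Worst-case number of bisect steps for n candidate changes: ceil(log2(n)).
# bisect_steps is a hypothetical helper name, not part of git or RHQ.
bisect_steps() {
    n=$1
    steps=0
    span=1
    # Double the covered span until it reaches n;
    # the number of doublings equals ceil(log2(n)).
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 86   # prints 7
```

Each of those steps is one full build, install and check, so the cost is dominated by the build time rather than by the bisect itself.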

It would be nice to have some established tool based on foreman/jenkins to do this automatically.

I am investigating the commits, looking for the culprit.

Comment 5 Jirka Kremser 2014-02-03 13:55:30 UTC
Commit d6af564d06ce769d1f added the JVM param ("-XX:StringTableSize=1000003") that causes this issue.

Java 6 apparently has a problem with this parameter. In Java 7 and higher, the pooled strings are stored on the heap, while in Java 6 and lower they are stored in the permgen space.
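
One way to confirm whether a given JVM accepts the parameter is to start it with the option and "-version" and look at the exit status. This is only a sketch: the helper name jvm_accepts_option is made up here, and it assumes the JVM exits with a non-zero status when it does not recognize a VM option (as in the "Unrecognized VM option" failure reported in bug 1060066).

```shell
#!/bin/sh
# Return 0 if the given java command starts with the given VM option,
# non-zero otherwise. jvm_accepts_option is a hypothetical helper name.
jvm_accepts_option() {
    java_cmd=$1
    option=$2
    # "-version" makes the JVM exit right after start-up; output is discarded.
    "$java_cmd" "$option" -version >/dev/null 2>&1
}

if jvm_accepts_option java "-XX:StringTableSize=1000003"; then
    echo "option accepted"
else
    echo "option rejected or java not found"
fi
```

Probing the option directly avoids parsing the version string, at the price of one extra JVM start.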

Comment 6 Jirka Kremser 2014-02-10 15:04:43 UTC
When running a server with 5 attached agents (4 of them having EAP in the full-ha profile and 1 having RHQ itself) on Java 6 without this JVM param, the perm gen was large enough to handle the load.

However, in bigger environments with multiple EAPs monitored by 1 agent this could be an issue. Luckily, most of the _different_ strings come from the plugin descriptors, so multiple resources of the same type do not increase memory usage significantly. A potential OOM could therefore happen with a large number of different monitored resources (a lot of different plugins, a lot of resource types).

For EAPs the agent's permgen usage was constantly about 37 MB, the server's about 50 MB.

To make it run on Java 6 (which we currently support), I could either:

1) add this workaround to our rhq-server.sh and rhq-agent.sh:
   # Extract X from the 'java version "1.X...' line printed by "java -version"
   _JAVA_VERSION=$(java -version 2>&1 | grep "java version" | sed -e 's/java version "1\.\([0-9]\).*/\1/')
   if [ "$_JAVA_VERSION" -lt 7 ]; then
      echo "lower than 7"
   else
      echo "equal or higher than 7"
   fi
# ^ works for OpenJDK, IBM java and HotSpot, all it requires is that "java -version" returns (among other) a line containing 'java version "1.X'

or

2) remove the JVM param completely (until we abandon Java 6 support), with the risk that the perm gen might not be enough and the user would have to increase it (especially for an agent having multiple plugins that monitor various kinds of resources)
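
As a rough sketch of how option 1) could be wired into a start script, the version check can decide whether the parameter is added at all. The helper names and the EXTRA_JAVA_OPTS variable below are hypothetical and are not taken from the actual rhq-server.sh or rhq-agent.sh; the sketch also assumes the 1.X version scheme, so any other version string simply yields no extra option.

```shell
#!/bin/sh
# Extract X from a line such as: java version "1.X.0_24"
# (hypothetical helper; reads the "java -version" output on stdin)
java_minor_version() {
    sed -n 's/.*version "1\.\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the StringTableSize option only for Java 7 or later.
string_table_opt() {
    minor=$1
    if [ -n "$minor" ] && [ "$minor" -ge 7 ]; then
        echo "-XX:StringTableSize=1000003"
    fi
}

minor=$(java -version 2>&1 | java_minor_version)
# EXTRA_JAVA_OPTS is a hypothetical variable name used for illustration.
EXTRA_JAVA_OPTS=$(string_table_opt "$minor")
echo "extra JVM options: ${EXTRA_JAVA_OPTS:-none}"
```

If the version cannot be determined, the option is left out, which matches the behavior of approach 2) as a fallback.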

Comment 8 Jirka Kremser 2014-02-10 16:02:15 UTC
For now, I'll push a commit taking approach 2) from comment 6, because QE is currently not able to run their tests. If necessary, I've got the 1) solution/hack prepared as well.

This has the benefit that the QE test suite still uses Java 6 and can discover any potential OOM errors (caused by insufficient PermGen) before this goes public.

Just for the record, if the param is not specified on Java 7, the table size defaults to a lower number (1009), so in the worst case multiple Strings end up in the same bucket, making the time complexity of a String lookup closer to O(n) [instead of O(1)]. The number is therefore not a hard limit on the number of pooled strings or anything similar.
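
To put numbers on the bucket argument, here is a back-of-the-envelope sketch. The count of 100000 pooled strings is an arbitrary assumption chosen for illustration, not a measured RHQ value, and the helper name avg_chain_length is made up; it assumes strings spread evenly over the buckets.

```shell
#!/bin/sh
# Average number of strings per bucket, rounded up, assuming an even spread.
# avg_chain_length is a hypothetical helper name.
avg_chain_length() {
    strings=$1
    buckets=$2
    echo $(( (strings + buckets - 1) / buckets ))
}

# 100000 is an assumed string count, used only as an example.
avg_chain_length 100000 1009      # prints 100
avg_chain_length 100000 1000003   # prints 1
```

With the smaller table every lookup walks a chain of roughly a hundred entries in this example, but all strings are still stored, which is why the value affects speed and not capacity.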


branch:  master
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=8f0ce053c
time:    2014-02-10 16:46:14 +0100
commit:  8f0ce053c09a02886f670eff6d0295c0c62ced19
author:  Jirka Kremser - jkremser
message: [BZ 1058267] - RHQ installation on JDK1.6 fails with 'Cannot obtain
         client connection to the RHQ app server!! - removing the JVM
         parameter for increasing the size of a hashtable, where Strings
         are pooled (after calling .intern() on them), because this is
         not supported on Java 6. This commit can be reverted later on
         when not supporting Java 6 or when using another solution
         (check for Java version in the bash scripts)



Heiko, is this ^ "solution" ok with you or would you prefer the 1)?

Comment 9 Jirka Kremser 2014-02-11 10:48:32 UTC
master 9000242dd

Comment 10 Heiko W. Rupp 2014-04-23 12:30:01 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

Comment 11 Heiko W. Rupp 2014-09-26 09:42:28 UTC
For future RHQ versions we should only support jdk7+ where this flag is needed.
Jdk8 actually has a better default setting and since a certain version even automatic string-deduplication. 
I guess the current default is ok for now.
