950422 – exceptions in rhq-install.sh log

Bug 950422 - exceptions in rhq-install.sh log

Summary: exceptions in rhq-install.sh log

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Installer
Sub Component:
Version:	JON 3.2
Hardware:	i686
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	ER01
Target Release:	JON 3.2.0
Assignee:	Stefan Negrea
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	971377 951619
TreeView+	depends on / blocked

Reported:	2013-04-10 08:38 UTC by Armine Hovsepyan
Modified:	2015-09-03 00:01 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-04-15 01:47:26 UTC
Type:	Bug
Embargoed:

Attachments	(Terms of Use)
rhq_sh_log.txt (5.08 KB, text/plain) 2013-04-10 08:38 UTC, Armine Hovsepyan	no flags	Details
rhq-storage-installer.png (211.57 KB, image/png) 2013-08-26 10:55 UTC, Armine Hovsepyan	no flags	Details
View All

Description Armine Hovsepyan 2013-04-10 08:38:58 UTC

Created attachment 733544 [details]
rhq_sh_log.txt

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. download latest rhq server installer (having cassandra integrated)
2. from non-root user run rhq.sh 
3.
  
Actual results:
rhq server with storage and agetn is installed, having exceptions in log

Expected results:
rhq server with storage and agetn is installed - without exceptions in log

Additional info:
log is attached

Comment 1 John Sanda 2013-04-15 01:47:26 UTC

I am closing this since out since rhq.sh has been replaced by rhqctl.sh. Please retest with rhqctl.sh. I may have investigated these exceptions at some point, and I want to say that it looked like it may have been an issue specific to the environment. With that said, if this error occurs again, please run the RHQ installer from a master build to see if you get the error there is well. Even if it is not an issue introduced in the cassandra-backend branch and even if it looks like an environment-specific issue, someone ought to determine what is causing the problems because at some point someone else is bound to run into it.

Comment 2 Armine Hovsepyan 2013-04-18 11:13:36 UTC

Sorry for reopening. The exception exists while running rhq-install.sh as non-root user.

Editing the description of bug.

Comment 4 John Sanda 2013-04-23 11:56:17 UTC

Armine, can you please retest this with a master build as well?

Comment 5 Armine Hovsepyan 2013-04-23 12:40:39 UTC

Bug is visible in master build as well.

removing dependency on Cassandra back-end tracker.

Comment 6 Filip Brychta 2013-06-03 08:32:17 UTC

This bz probably causes following exception in rhq server log:

01:58:26,230 ERROR [org.quartz.core.JobRunShell] (RHQScheduler_Worker-5) Job CassandraClusterHeartBeatJobGroup.CassandraClusterHeartBeatJob threw an unhandled Exception: : java.lang.IllegalArgumentException: Expected string of the form, hostname|jmxPort|nativeTransportPort
        at org.rhq.core.domain.cloud.StorageNode.parseNodeInformation(StorageNode.java:272) [rhq-core-domain-ejb3.jar:4.8.0-SNAPSHOT]
        at org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob.execute(CassandraClusterHeartBeatJob.java:72) [rhq-enterprise-server-ejb3.jar:4.8.0-SNAPSHOT]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-1.6.5.jar:1.6.5]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) [quartz-1.6.5.jar:1.6.5]

This exception is thrown each ~5 seconds.
AFAIK it should be caused by incorrect format of 'rhq.cassandra.seeds' property in rhq-server.properties but this property is set correctly. This exception disappears once the rhq server is restarted.

So i think its like this:
The RHQ server is not restarted after the installation because of this bz -> configuration is not reloaded -> exception from this comment is thrown

Comment 7 Filip Brychta 2013-06-03 14:01:04 UTC

This is a bit more serious. A few tests from our cli automation suite failed because of this throwing exception:

Caused by: javax.ejb.EJBException: [Warning] java.lang.IllegalStateException: JBAS011048: Failed to construct component instance
at org.rhq.enterprise.server.remote.RemoteSafeInvocationHandler.invoke(RemoteSafeInvocationHandler.java:116)
at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:967)
at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:416)
at sun.reflect.GeneratedMethodAccessor183.invoke(Unknown Source)
.
.
.
Caused by: java.lang.IllegalArgumentException: [Warning] Expected string of the form, hostname|jmxPort|nativeTransportPort
at org.rhq.core.util.exception.WrappedRemotingException.getCause(WrappedRemotingException.java:121)
at java.lang.Throwable.printEnclosedStackTrace(Throwable.java:706)
at java.lang.Throwable.printEnclosedStackTrace(Throwable.java:708)

When the RHQ server is restarted after the installation (before tests are launched) tests pass.

Comment 8 Jay Shaughnessy 2013-06-03 15:53:23 UTC

This would indicate that rhq.cassandra.seeds was missing or not properly formed.  

This property is typically set in rhq-server.properties. Check to make sure it is correct.

I have pushed a change to include the invalid property setting in the exception so future builds should show a bit more information.

Comment 9 Filip Brychta 2013-06-03 16:15:06 UTC

(In reply to Jay Shaughnessy from comment #8)
> This would indicate that rhq.cassandra.seeds was missing or not properly
> formed.  
> 
> This property is typically set in rhq-server.properties. Check to make sure
> it is correct.
> 
> I have pushed a change to include the invalid property setting in the
> exception so future builds should show a bit more information.

As i noted in comment 6 "AFAIK it should be caused by incorrect format of 'rhq.cassandra.seeds' property in rhq-server.properties but this property is set correctly. This exception disappears once the rhq server is restarted."

So the problem is not in incorrect property but in the fact, that server didn't reload a configuration after the installation and must be restarted manually. Then the issue disappears..

Comment 10 Filip Brychta 2013-06-06 11:41:24 UTC

I created a separate bz 971377 to distinguish used installation script (rhq-install.sh or rhqctl)

Comment 11 John Sanda 2013-06-06 16:25:40 UTC

I am not sure if the cause here had to due with the rhq.cassandra.seeds property, but the behavior you describe looks like an issue which I am currently addressing. If all of the storage nodes go down for whatever reason, the driver basically becomes stale and won't be able to reconnect when storage nodes come back up. The session object needs to be recycled. I have added logic so that when the server goes into maintenance mode as a result of all storage nodes being down, we also shut down what I am now calling the storage client subsystem. This includes shutting down the driver session. When one or more storage nodes come back online, the storage client subsystem will be reinitialized with a new driver session. You should see log statements like the following when the server goes into maintenance mode,

12:15:01,137 WARN  [org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob] (RHQScheduler_Worker-2) Server[id=10001,name=localhost,address=localhost,port=7080,securePort=7443] is unable to connect to any Cassandra node. Server will go into maintenance mode.
12:15:01,137 INFO  [org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob] (RHQScheduler_Worker-2) Moving Server[id=10001,name=localhost,address=localhost,port=7080,securePort=7443] from NORMAL to MAINTENANCE
12:15:01,146 INFO  [org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob] (RHQScheduler_Worker-2) Preparing to shut down storage client subsystem
12:15:01,147 INFO  [org.rhq.enterprise.server.cassandra.StorageClientManagerBean] (RHQScheduler_Worker-2) Shuttting down storage client subsystem

And when the server comes out of maintenance mode, you should see log statements like,

12:25:01,161 INFO  [org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob] (RHQScheduler_Worker-3) Moving Server[id=10001,name=localhost,address=localhost,port=7080,securePort=7443] from MAINTENANCE to NORMAL
12:25:01,168 INFO  [org.rhq.enterprise.server.cassandra.CassandraClusterHeartBeatJob] (RHQScheduler_Worker-3) Restarting storage client subsystem...
12:25:01,168 INFO  [org.rhq.enterprise.server.cassandra.StorageClientManagerBean] (RHQScheduler_Worker-3) Initializing storage client subsystem
12:25:01,546 INFO  [org.rhq.server.metrics.MetricsDAO] (RHQScheduler_Worker-3) Initializing prepared statements
12:25:01,564 INFO  [org.rhq.server.metrics.MetricsDAO] (RHQScheduler_Worker-3) Finished initializing prepared statements in 18 ms
12:25:01,565 INFO  [org.rhq.enterprise.server.cassandra.StorageClientManagerBean] (RHQScheduler_Worker-3) Storage client subsystem is now initialized

Comment 12 John Sanda 2013-06-06 18:31:44 UTC

These changes are in build 2295 or later of the rhq-master Jenkins job. Moving to ON_QA so this can be retested.

Comment 13 Filip Brychta 2013-06-07 09:02:39 UTC

Verified that installer exception is not there any more.
Version: 4.8.0-SNAPSHOT
Build Number: bf44b14

Comment 14 Armine Hovsepyan 2013-06-07 15:48:15 UTC

sorry, have to reopen the issue.
rhq-install still has errors in log (error is attached)

Comment 15 Filip Brychta 2013-06-10 09:31:44 UTC

I tried that again and it works for me. I checked with Armine that we used the same version of RHQ. We use the same architecture and java version so there is some magic going on.
Armine:
uname -a
Linux rhqserver.bc.jonqe.lab.eng.bos.redhat.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
 java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Mine:
uname -a
Linux fbry-test2.bc.jonqe.lab.eng.bos.redhat.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Comment 17 Stefan Negrea 2013-08-22 22:26:24 UTC

Please re-test. The storage node code changed significantly since the bug was reported. Currently the storage cluster availability no longer uses the JMX port but rather events from the CQL driver. The problem should be resolved.

Comment 18 Armine Hovsepyan 2013-08-26 10:55:10 UTC

Created attachment 790421 [details]
rhq-storage-installer.png

Comment 19 Armine Hovsepyan 2013-08-26 10:56:14 UTC

reopening

rhq-storage-installer.sh runs with Exit 1 - storage not being installed - please get screen-shot attached.

Comment 20 Stefan Negrea 2013-08-26 21:17:51 UTC

Comment 18 does not look related to the initial JMX parsing issues. Please open a different BZ if this is an issue. 

Also, please review and attach the logs for both the storage installer (rhq-storage-installer.log) and the storage server (rhq-storage.log) for the cause of the error.

Comment 21 Armine Hovsepyan 2013-08-27 08:18:16 UTC

closing this bug as verified since rhq.sh is removed and rhqctl install works as an equivalent. 

bug related to comment 18 is #972602

Note You need to log in before you can comment on or make changes to this bug.