Bug 1208854 - Unable to stop storage node when running on IBM SDK 8
Summary: Unable to stop storage node when running on IBM SDK 8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Storage Node
Version: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER01
Target Release: JON 3.3.3
Assignee: Libor Zoubek
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On:
Blocks: 1089495
Reported: 2015-04-03 13:14 UTC by Filip Brychta
Modified: 2015-11-02 00:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-30 16:41:43 UTC
Type: Bug
Embargoed:


Attachments
server and storage logs (88.60 KB, application/x-gzip)
2015-04-03 13:14 UTC, Filip Brychta


Links
System: Red Hat Product Errata
ID: RHSA-2015:1525
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat JBoss Operations Network 3.3.3 update
Last Updated: 2015-07-30 20:41:08 UTC

Description Filip Brychta 2015-04-03 13:14:04 UTC
Created attachment 1010623
server and storage logs

Description of problem:
The storage node installs and starts correctly, but it cannot be stopped via rhqctl. The stop operation hangs and the storage node is left in a broken state.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. install IBM SDK 8
2. install JON (./rhqctl install)
3. start JON (./rhqctl start)


Actual results:
After step 2:
Stopping RHQ storage node...
RHQ storage node (pid=25564) is stopping...
08:43:47,861 ERROR [org.rhq.server.control.RHQControl] Process [25564] did not finish yet. Terminate it manually and retry.

After step 3:
All processes (agent, storage node, server) are started, but the storage node is not accessible. From server.log:
08:58:27,275 WARN  [org.rhq.enterprise.server.storage.StorageClientManager] (pool-6-thread-1) Storage client subsystem wasn't initialized because it wasn't possible to connect to the storage cluster. The RHQ server is set to MAINTENANCE mode. Please start the storage cluster as soon as possible.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134 ([fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134] Cannot connect))


Expected results:
Everything starts correctly after step 2, and there are no errors after step 3.

Additional info:
Workaround:
kill -9 <storageNode PID>
./rhqctl start

The next invocation of ./rhqctl stop breaks the storage node again.
Logs are attached.

Comment 1 Filip Brychta 2015-04-09 18:36:48 UTC
The issue is not reproducible on IBM 1.7.0.

Comment 2 Libor Zoubek 2015-04-14 13:18:10 UTC
I am going to implement the workaround in rhqctl, i.e. if we attempt to stop the storage node and detect that it is still running after some time, we'll kill it by sending the SIGKILL signal (kill -9). I'll also increase the time spent waiting for a proper Cassandra shutdown to 1 minute.

Comment 3 Libor Zoubek 2015-04-15 11:19:58 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/efdaa36c7
time:    2015-04-15 13:18:30 +0200
commit:  efdaa36c765582e030f5be652af2394851fe285e
author:  Libor Zoubek - lzoubek
message: Bug 1208854 - Unable to stop storage node when running on IBM SDK 8
         Fix rhqctl to kill with SIGKILL when we do not succeed to stop
         cassandra the safe way
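
As an illustration only, the stop-then-kill fallback described for this fix can be sketched in Python. This is not the code from the commit: `stop_with_fallback` and `grace_seconds` are hypothetical names, the graceful request is modelled as SIGTERM (an assumption about what "the safe way" does), and the sketch manages a child process object rather than a PID looked up by rhqctl.

```python
import subprocess


def stop_with_fallback(proc: subprocess.Popen, grace_seconds: float = 60.0) -> str:
    """Ask a process to stop; force-kill it if it outlives the grace period.

    Returns "graceful" if the process exited after the stop request,
    or "killed" if SIGKILL was needed. (Hypothetical helper, POSIX only.)
    """
    # Graceful request: Popen.terminate() sends SIGTERM on POSIX.
    proc.terminate()
    try:
        # Wait up to the grace period for the process to exit by itself.
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Still running: Popen.kill() sends SIGKILL (signal 9) on POSIX,
        # which the process cannot catch or ignore.
        proc.kill()
        # Reap the process so it does not linger as a zombie.
        proc.wait()
        return "killed"
```

Calling `stop_with_fallback(proc, grace_seconds=60)` would correspond to the one-minute wait mentioned in Comment 2; a return value of "killed" means the graceful stop timed out and the forced kill was used.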

Comment 6 Mike Foley 2015-04-16 14:18:35 UTC
The QE payload verification process should include log files documenting a correct shutdown. Include the log files and attach them to this issue.

Comment 15 Simeon Pinder 2015-07-10 18:55:33 UTC
Available for test with 3.3.3 ER01 build: 
https://brewweb.devel.redhat.com/buildinfo?buildID=446732
 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-03.zip.

Comment 16 Filip Brychta 2015-07-14 10:02:38 UTC
Verified on
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c

Comment 18 errata-xmlrpc 2015-07-30 16:41:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1525.html

