Bug 1208854

Summary: Unable to stop storage node when running on IBM SDK 8
Product: [JBoss] JBoss Operations Network
Reporter: Filip Brychta <fbrychta>
Component: Storage Node
Assignee: Libor Zoubek <lzoubek>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: high
Priority: unspecified
Version: JON 3.3.0
CC: loleary, lzoubek, mfoley, spinder, theute
Target Milestone: ER01
Keywords: Triaged
Target Release: JON 3.3.3
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-07-30 16:41:43 UTC
Type: Bug
Bug Blocks: 1089495
Attachments:
server and storage logs (flags: none)

Description Filip Brychta 2015-04-03 13:14:04 UTC
Created attachment 1010623
server and storage logs

Description of problem:
The storage node installs and starts correctly, but it cannot be stopped via rhqctl. The stop operation hangs and the storage node is left in a broken state.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. install IBM SDK 8
2. install JON (./rhqctl install)
3. start JON (./rhqctl start)


Actual results:
After step 2:
Stopping RHQ storage node...
RHQ storage node (pid=25564) is stopping...
08:43:47,861 ERROR [org.rhq.server.control.RHQControl] Process [25564] did not finish yet. Terminate it manually and retry.

After step 3:
All processes (agent, storage node, server) are started, but the storage node is not accessible. From server.log:
08:58:27,275 WARN  [org.rhq.enterprise.server.storage.StorageClientManager] (pool-6-thread-1) Storage client subsystem wasn't initialized because it wasn't possible to connect to the storage cluster. The RHQ server is set to MAINTENANCE mode. Please start the storage cluster as soon as possible.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134 ([fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134] Cannot connect))


Expected results:
Everything is correctly started after step 2 and no errors after step 3

Additional info:
Workaround:
kill -9 <storageNode PID>
./rhqctl start

The next invocation of ./rhqctl stop breaks the storage node again.
Logs are attached.

Comment 1 Filip Brychta 2015-04-09 18:36:48 UTC
The issue is not visible on IBM 1.7.0

Comment 2 Libor Zoubek 2015-04-14 13:18:10 UTC
I am going to implement the workaround in rhqctl, i.e. if we attempt to stop the storage node and detect that it is still running after some time, we'll kill it by sending the SIGKILL signal (kill -9). I'll also increase the wait time for a proper Cassandra shutdown to 1 minute.
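
For illustration only, the stop-then-force-kill approach described in this comment can be sketched as below. This is not the actual rhqctl change (see the commit in Comment 3 for that). It is a minimal Python sketch that assumes the process was launched by the same script, so a subprocess.Popen handle is available. The function name stop_with_fallback and its timeout parameter are hypothetical; the 60-second default mirrors the 1 minute wait mentioned above. The signal names in the comments apply to POSIX systems.

```python
import subprocess


def stop_with_fallback(proc, timeout=60.0):
    """Ask a child process to exit; force-kill it if it is still alive after `timeout` seconds.

    `proc` is a subprocess.Popen handle (an assumption of this sketch).
    Returns True if the process exited on its own, False if it had to be killed.
    """
    # Polite request first: SIGTERM on POSIX, which the process may handle or ignore.
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Graceful shutdown did not finish in time: SIGKILL (kill -9) cannot be ignored.
        proc.kill()
        # Reap the child so it does not linger as a zombie.
        proc.wait()
        return False
```

A caller could log a warning whenever the function returns False, since a forced kill means the process had no chance to shut down cleanly.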

Comment 3 Libor Zoubek 2015-04-15 11:19:58 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/efdaa36c7
time:    2015-04-15 13:18:30 +0200
commit:  efdaa36c765582e030f5be652af2394851fe285e
author:  Libor Zoubek - lzoubek
message: Bug 1208854 - Unable to stop storage node when running on IBM SDK 8
         Fix rhqctl to kill with SIGKILL when we do not succeed to stop
         cassandra the safe way

Comment 6 Mike Foley 2015-04-16 14:18:35 UTC
The QE payload verification process should include log files documenting a correct shutdown. Attach the log files to this issue.

Comment 15 Simeon Pinder 2015-07-10 18:55:33 UTC
Available for test with 3.3.3 ER01 build: 
https://brewweb.devel.redhat.com/buildinfo?buildID=446732
 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of
 jon-server-3.3.0.GA-update-03.zip.

Comment 16 Filip Brychta 2015-07-14 10:02:38 UTC
Verified on
Version: 3.3.0.GA Update 03
Build Number: e4b348a:2f80c8c

Comment 18 errata-xmlrpc 2015-07-30 16:41:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1525.html