Created attachment 1010623
server and storage logs

Description of problem:
Storage node is correctly installed and started but it's not possible to stop it via rhqctl. Stop operation hangs and storage node remains in broken state.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. install IBM SDK 8
2. install JON (./rhqctl install)
3. start JON (./rhqctl start)

Actual results:
After step 2:
Stopping RHQ storage node...
RHQ storage node (pid=25564) is stopping...
08:43:47,861 ERROR [org.rhq.server.control.RHQControl] Process [25564] did not finish yet. Terminate it manually and retry.

After step 3:
- all processes (agent, storage node, server) are started but storage node is not accessible.
From server.log:
08:58:27,275 WARN [org.rhq.enterprise.server.storage.StorageClientManager] (pool-6-thread-1) Storage client subsystem wasn't initialized because it wasn't possible to connect to the storage cluster. The RHQ server is set to MAINTENANCE mode. Please start the storage cluster as soon as possible.: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134 ([fbr-ibm8.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.134] Cannot connect))

Expected results:
Everything is correctly started after step 2 and no errors after step 3

Additional info:
Workaround:
kill -9 <storageNode PID>
./rhqctl start

Next invocation of ./rhqctl stop breaks storage node again. Logs are attached.
The issue is not visible on IBM 1.7.0
I am going to implement the workaround in rhqctl, i.e. if we attempt to stop the storage node and detect that it is still running after some time, we'll kill it by sending the SIGKILL signal (kill -9). I'll also increase the time we wait for a proper Cassandra shutdown to 1 minute.
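For illustration only, the stop-then-kill fallback described above could look roughly like the Python sketch below. It is not the rhqctl code (rhqctl is Java, and how it signals the storage node is not shown in this report). The function name `stop_gracefully` and the `grace_seconds` parameter are hypothetical, and the sketch assumes a POSIX system and a process we launched ourselves as a child.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=60.0):
    """Ask a child process to exit, force-kill it if it does not.

    Returns "stopped" if the process exited within grace_seconds after
    the polite request, or "killed" if SIGKILL had to be used.
    (Hypothetical helper, not part of rhqctl.)
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, the equivalent of kill -9
        proc.wait()  # reap the child so no zombie is left behind
        return "killed"


if __name__ == "__main__":
    # Simulate a hanging process: a child that ignores SIGTERM.
    hang = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "print('ready', flush=True);"
        "time.sleep(30)"
    )
    child = subprocess.Popen([sys.executable, "-c", hang], stdout=subprocess.PIPE)
    child.stdout.readline()  # wait until the child has installed its handler
    print(stop_gracefully(child, grace_seconds=1.0))
```

The 60 second default mirrors the 1 minute wait proposed above; the demo uses a much shorter grace period so that it finishes quickly.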
branch: master
link: https://github.com/rhq-project/rhq/commit/efdaa36c7
time: 2015-04-15 13:18:30 +0200
commit: efdaa36c765582e030f5be652af2394851fe285e
author: Libor Zoubek - lzoubek
message: Bug 1208854 - Unable to stop storage node when running on IBM SDK 8
Fix rhqctl to kill with SIGKILL when we do not succeed to stop cassandra the safe way
The QE payload verification process should include log files documenting a correct shutdown. Please include the log files and attach them to this issue.
Available for test with 3.3.3 ER01 build: https://brewweb.devel.redhat.com/buildinfo?buildID=446732 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of jon-server-3.3.0.GA-update-03.zip.
Verified on Version : 3.3.0.GA Update 03 Build Number : e4b348a:2f80c8c
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1525.html