Bug 958087

Summary: RHQ Control - rhqctl stop --agent removes agent.pid but doesn't stop the process
Product: [JBoss] JBoss Operations Network
Reporter: Armine Hovsepyan <ahovsepy>
Component: Installer
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED WORKSFORME
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: high
Version: JON 3.2
CC: jsanda, loleary, mfoley, tsegismo
Target Milestone: ---
Target Release: JON 3.2.0
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-13 14:27:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description  Flags
rhqctl_start_stop_hangs.png  none
rhqctl_stop_agent.png  none
rhqctl_stop_agent_agent.log  none
rhqctl_stop_agent_server.log  none

Description Armine Hovsepyan 2013-04-30 11:12:12 UTC
Description of problem:
RHQ Control - rhqctl stop --agent removes agent.pid but doesn't stop the process

Version-Release number of selected component (if applicable):
jenkins build 177 

How reproducible:
always

Steps to Reproduce:
1. run ./rhqctl install --storage 
2. run ./rhqctl stop --agent
3. run ./rhqctl start --agent
  
Actual results:
After step 1 both agent and storage are installed and running
After step 2 agent process is running while the pid file is removed
After step 3 no new agent is started  - log is "INFO  [org.jboss.modules] JBoss Modules version 1.1.1.GA" 

Expected results:
After step 1 both agent and storage are installed and running
After step 2 agent process is stopped and the pid file is removed
After step 3 agent is started  -  "RHQ Agent (pid {number} running" message.

Additional info:
http://jenkins.jonqe.lab.eng.bos.redhat.com:9080/job/RHQ_Control_Run/44/consoleFull  --  please search for rhqctl start --agent

Comment 1 John Sanda 2013-05-17 14:49:37 UTC
I am seeing this issue consistently. I tested a master build, and did not see the issue there. I think the problem is in the Cassandra plugin. It uses the Cassandra CQL driver, which uses its own internal thread pool. I think that the agent was hanging because the driver threads were still running. I updated the CassandraNodeComponent class to implement the ResourceComponent.shutdown method, where it shuts down the driver's thread pool. With this change, my agent exited gracefully. This change will be available in Jenkins build 225 and later.
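
To illustrate the kind of fix described above, here is a minimal Java sketch. It is not the actual RHQ plugin code: DriverPoolComponent and its methods are hypothetical names, and the real change lives in CassandraNodeComponent's ResourceComponent.shutdown implementation, which this report does not show. The sketch assumes the driver's pool behaves like a standard ExecutorService with non-daemon worker threads, which would keep a JVM alive until the pool is shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a plugin component that owns a driver-style thread pool.
public class DriverPoolComponent {
    // Executors.newFixedThreadPool creates non-daemon threads by default,
    // so an idle worker keeps the JVM alive until the pool is shut down.
    private final ExecutorService driverPool = Executors.newFixedThreadPool(2);

    // Simulates the component starting up and handing work to the pool.
    public void start() {
        driverPool.submit(() -> { /* simulated driver work */ });
    }

    // Shuts the pool down; returns true if all workers terminated in time.
    public boolean shutdown(long timeoutMillis) throws InterruptedException {
        driverPool.shutdown(); // stop accepting tasks, let queued ones finish
        if (!driverPool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            driverPool.shutdownNow(); // interrupt workers that are still running
            return driverPool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        return true;
    }

    public boolean isTerminated() {
        return driverPool.isTerminated();
    }

    public static void main(String[] args) throws InterruptedException {
        DriverPoolComponent component = new DriverPoolComponent();
        component.start();
        System.out.println("terminated before shutdown: " + component.isTerminated());
        System.out.println("clean shutdown: " + component.shutdown(1000));
    }
}
```

Without the call to shutdown, the main method above would return but the JVM would not exit, which matches the symptom of an agent process that stays alive after a stop request.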

Comment 2 Armine Hovsepyan 2013-07-02 11:36:54 UTC
Hi,

I am not sure whether I ran the steps too quickly, but rhqctl stop either throws an exception at the end of the process or hangs while stopping the agent.

Please find attached the log of all actions and the logs taken.

Moving back to ON_Dev

Comment 3 Armine Hovsepyan 2013-07-02 11:37:37 UTC
Created attachment 767692
rhqctl_start_stop_hangs.png

Comment 4 John Sanda 2013-07-11 13:47:20 UTC
Armine, I am not able to reproduce this issue. It may be due to the plugins (or rather the lack of plugins) that I am running. I typically test with a minimal set of plugins. When I previously commented on this issue, the Cassandra plugin was not shutting down the DataStax driver. That was a plugin-specific issue. There could be another plugin that is causing problems with the shutdown. If you see the issue again, can you provide the list of plugins you are running?

Comment 5 Armine Hovsepyan 2013-07-16 09:13:12 UTC
Hi John,

I installed RHQ using rhqctl install, imported the agent into inventory through the server GUI, and called rhqctl stop --agent. The agent had still not shut down after ~15 minutes.

Please find attached a screenshot of rhqctl stop and fragments from the server and agent logs.

Comment 6 Armine Hovsepyan 2013-07-16 09:13:55 UTC
Created attachment 774124
rhqctl_stop_agent.png

Comment 7 Armine Hovsepyan 2013-07-16 09:14:26 UTC
Created attachment 774125
rhqctl_stop_agent_agent.log

Comment 8 Armine Hovsepyan 2013-07-16 09:15:04 UTC
Created attachment 774126
rhqctl_stop_agent_server.log

Comment 10 John Sanda 2013-08-06 19:31:50 UTC
I am removing this from the Cassandra tracker and removing myself as the assignee since the issue is not specific to rhqctl or the Cassandra feature work.

Comment 11 Thomas Segismont 2013-09-13 14:27:16 UTC
Armine,

I'm closing the issue because we don't have enough information to analyze it.

If you manage to reproduce the problem, then please take a thread dump of the agent and reopen the BZ.

Thanks
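
For reference, a thread dump of a running JVM is commonly taken with jstack <pid> or by sending SIGQUIT (kill -3 <pid>). When reading such a dump for a process that will not exit, the threads of interest are usually the live non-daemon ones. The Java sketch below shows that filter in-process; NonDaemonThreads is a hypothetical helper for illustration and is not part of RHQ or the agent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists the live non-daemon threads of the current JVM.
// These are the threads that prevent the JVM from exiting on its own.
public class NonDaemonThreads {

    public static List<String> list() {
        List<String> names = new ArrayList<>();
        // Thread.getAllStackTraces returns a snapshot of all live threads.
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.isAlive() && !thread.isDaemon()) {
                names.add(thread.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : list()) {
            System.out.println("non-daemon thread: " + name);
        }
    }
}
```

A thread from a plugin or driver pool that appears in this list after shutdown was requested would be a candidate for what keeps the process alive.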