Bug 1009658

Summary: Agent will not shutdown gracefully and is being forcefully killed
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Agent
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Priority: unspecified
Version: JON 3.2
CC: ahovsepy, hrupp, tsegismo
Target Milestone: ER02
Target Release: JON 3.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug

Description Larry O'Leary 2013-09-18 19:43:45 UTC
Description of problem:
When attempting to shut down a JBoss ON agent, the agent takes a long time to stop. When it finally exits, it is because the process was forcefully killed.

Version-Release number of selected component (if applicable):
4.7.0.JON [adad71f]

How reproducible:
Always

Steps to Reproduce:
1.  Start the JBoss ON system:

        ./rhqctl start

2.  After the system is up and running, shut it down:

        ./rhqctl stop

Actual results:
The agent takes a couple of minutes to stop, and the log contains the following message repeated every few seconds:

2013-09-18 14:35:47,766 DEBUG [Thread-6] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.thread-is-still-active}Thread [pool-3-thread-1] is still alive - its stack trace follows:
java.lang.Throwable: Thread [pool-3-thread-1]
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
	at java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)

2013-09-18 14:35:47,767 INFO  [Thread-6] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [1] threads to die

Finally followed by:

2013-09-18 14:36:17,774 INFO  [Thread-6] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.no-more-wait}[1] threads are not dying - agent will not wait anymore
2013-09-18 14:36:17,774 INFO  [Thread-6] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.threads-still-alive}There are still [1] threads left - the kill thread will exit the VM shortly if these threads do not die
2013-09-18 14:36:17,774 INFO  [Thread-6] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.exit.shutdown-complete}Shutdown complete - agent will now exit.
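
The log sequence above reads like a bounded wait: the hook waits for the remaining threads, then gives up after a timeout. The following is only a minimal Java sketch of that pattern, not the RHQ AgentShutdownHook code. The class name BoundedThreadWait, the method waitForThreads, the thread names, and the 200 ms timeout are all hypothetical, and the printed message only loosely imitates the log text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BoundedThreadWait {

    /**
     * Waits until every given thread has died or the overall timeout elapses.
     * Returns the threads that are still alive when waiting stops.
     */
    static List<Thread> waitForThreads(List<Thread> threads, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Thread t : threads) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                break; // out of time; note that join(0) would wait forever
            }
            t.join(remainingMillis);
        }
        List<Thread> stillAlive = new ArrayList<>();
        for (Thread t : threads) {
            if (t.isAlive()) {
                stillAlive.add(t);
            }
        }
        return stillAlive;
    }

    public static void main(String[] args) throws InterruptedException {
        // One thread that finishes at once and one that blocks until interrupted.
        Thread quick = new Thread(() -> { }, "quick");
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: let the thread end
            }
        }, "stuck");
        stuck.setDaemon(true); // so this demo itself can always exit
        quick.start();
        stuck.start();

        List<Thread> stillAlive = waitForThreads(Arrays.asList(quick, stuck), 200);
        System.out.println("[" + stillAlive.size() + "] threads are not dying - will not wait anymore");
        stuck.interrupt();
    }
}
```

A single deadline is shared across all joins, so the total wait stays bounded no matter how many threads are passed in.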


Expected results:
The agent should shut down right away and not be forcefully killed.

Comment 1 Thomas Segismont 2013-09-19 09:20:13 UTC
This looks like the same error reported in BZ1008570, although the commit that introduced the problem described there shipped in 4.9, not in 4.7.

Master has been patched:

commit a9feadf7e8b1fbc5215dd7ba7e6cc4f1a4e78cc8
Author: Thomas Segismont <tsegismo>
Date:   Tue Sep 17 12:51:30 2013 +0200

The agent now stops without blocking.
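
The contents of that commit are not reproduced in this report, so the following is not the patch. It is a minimal Java sketch that assumes the lingering pool-3-thread-1 was an idle worker of a ScheduledThreadPoolExecutor, as its stack trace suggests. Such a worker parks in the delayed work queue and stays alive until the executor is shut down. The class IdleSchedulerDemo, its Started holder, and the thread name are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class IdleSchedulerDemo {

    /** Pairs an executor with the worker thread it created (illustrative helper). */
    static final class Started {
        final ScheduledExecutorService executor;
        final Thread worker;

        Started(ScheduledExecutorService executor, Thread worker) {
            this.executor = executor;
            this.worker = worker;
        }
    }

    /** Starts a single-thread scheduler, runs one no-op task, and leaves the worker idle. */
    static Started startScheduler() throws InterruptedException, ExecutionException {
        AtomicReference<Thread> workerRef = new AtomicReference<>();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "demo-pool-thread-1");
            t.setDaemon(false); // non-daemon: keeps the JVM alive while it runs
            workerRef.set(t);
            return t;
        };
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(factory);
        Runnable noop = () -> { };
        // Waiting for the task guarantees the worker thread has been created.
        executor.schedule(noop, 0, TimeUnit.MILLISECONDS).get();
        return new Started(executor, workerRef.get());
    }

    /** Stops the executor and reports whether it terminated within the timeout. */
    static boolean stopScheduler(Started started, long timeoutMillis) throws InterruptedException {
        started.executor.shutdownNow(); // interrupts the parked worker
        return started.executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        Started started = startScheduler();
        System.out.println("worker alive while idle: " + started.worker.isAlive());

        boolean terminated = stopScheduler(started, 5_000);
        started.worker.join(5_000); // give the thread itself time to finish
        System.out.println("executor terminated: " + terminated
                + ", worker alive: " + started.worker.isAlive());
    }
}
```

Without the call to shutdownNow() (or shutdown()), the non-daemon worker would stay parked indefinitely, which is consistent with the "still alive" stack trace in the description.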

Comment 2 Armine Hovsepyan 2013-10-02 13:32:09 UTC
Verified.

I no longer see "kill thread" in the logs when stopping the agent.
Fragment from agent.log -> http://pastebin.test.redhat.com/167611