Bug 1008570 - Agent shutdown hangs because one thread is still alive
Summary: Agent shutdown hangs because one thread is still alive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHQ 4.10
Assignee: Thomas Segismont
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-16 15:51 UTC by Thomas Segismont
Modified: 2017-02-02 07:19 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-04-23 12:31:22 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1009658 | 0 | unspecified | CLOSED | Agent will not shutdown gracefully and is being forcefully killed | 2021-02-22 00:41:40 UTC

Internal Links: 1009658

Description Thomas Segismont 2013-09-16 15:51:08 UTC
Description of problem:
Agent shutdown hangs because one thread is still alive.

Version-Release number of selected component (if applicable):
4.9

Additional info:

Reported by community user
https://community.jboss.org/message/837538

The agent seems to be waiting for a worker thread that belongs to a scheduled executor:
"pool-3-thread-1" prio=10 tid=0x00007fe78c4c0800 nid=0x192 waiting on condition [0x00007fe788126000]  
   java.lang.Thread.State: TIMED_WAITING (parking)  
        at sun.misc.Unsafe.park(Native Method)  
        - parking to wait for  <0x00000000e1309b98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)  
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)  
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)  
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)  
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)  
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)  
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)  
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)  
        at java.lang.Thread.run(Thread.java:724)  

The scheduled executor is shut down in org.rhq.core.system.SigarAccessHandler#close:

    void close() {
        if (sharedSigar != null) {
            sharedSigarLock.lock();
            try {
                sharedSigar.close();
                sharedSigar = null;
            } finally {
                sharedSigarLock.unlock();
            }
        }
        scheduledExecutorService.shutdownNow();
    }

The problem is that org.rhq.core.system.SigarAccessHandler#close is never called when the agent goes down, so the scheduled executor is never shut down and its worker thread stays alive.
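
To illustrate the mechanism, here is a minimal sketch of an object that owns a scheduled executor and has to be closed on the shutdown path. ExecutorOwner and ShutdownSketch are hypothetical names used only for this example; they are not RHQ classes, and this is not the fix that was committed. The executor created by Executors.newSingleThreadScheduledExecutor() uses non-daemon threads by default, so the worker stays parked until shutdownNow() is called.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for SigarAccessHandler; only the executor lifecycle is modeled.
class ExecutorOwner {
    private final ScheduledExecutorService scheduledExecutorService =
        Executors.newSingleThreadScheduledExecutor();

    ExecutorOwner() {
        // Same schedule shape as in the report: a periodic task keeps a worker thread parked.
        scheduledExecutorService.scheduleWithFixedDelay(() -> { }, 1, 5, TimeUnit.MINUTES);
    }

    // Has to be invoked when the owner goes down, otherwise the worker thread stays alive.
    void close() {
        scheduledExecutorService.shutdownNow();
    }

    boolean isShutdown() {
        return scheduledExecutorService.isShutdown();
    }

    // Returns true if the executor's threads finished within the timeout.
    boolean awaitTermination(long timeoutMillis) {
        try {
            return scheduledExecutorService.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class ShutdownSketch {
    public static void main(String[] args) {
        ExecutorOwner owner = new ExecutorOwner();
        // Without this call the JVM would not exit once main returns.
        owner.close();
        System.out.println("terminated=" + owner.awaitTermination(5000));
    }
}
```

Removing the owner.close() call from main is a quick way to reproduce the hang in isolation: the process keeps running with a single pool thread in TIMED_WAITING, as in the thread dump above.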

Comment 1 Elias Ross 2013-09-16 18:28:33 UTC
It would be good to set the thread name.

diff --git a/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java b/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandle
index a781641..ea8f018 100644
--- a/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java
+++ b/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java
@@ -29,6 +29,8 @@
 import java.lang.reflect.Method;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.locks.ReentrantLock;
 
 import org.apache.commons.logging.Log;
@@ -70,6 +72,17 @@
     private static final boolean THREAD_DUMP_ON_SIGAR_INSTANCES_THRESHOLD = Boolean
         .getBoolean("threadDumpOnlocalSigarInstancesWarningThreshold");
 
+    private static final ThreadFactory threadFactory = new ThreadFactory() {
+        final AtomicInteger threadNumber = new AtomicInteger(1);
+
+        @Override
+        public Thread newThread(Runnable r) {
+            Thread t = new Thread(r);
+            t.setName("sigar-" + threadNumber.getAndIncrement());
+            return t;
+        }
+    };
+
     private final SigarFactory sigarFactory;
     private final ReentrantLock sharedSigarLock;
     private final ReentrantLock localSigarLock;
@@ -85,7 +98,7 @@
         this.sigarFactory = sigarFactory;
         sharedSigarLock = new ReentrantLock();
         localSigarLock = new ReentrantLock();
-        scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+        scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
         scheduledExecutorService.scheduleWithFixedDelay(new ThresholdChecker(), 1, 5, MINUTES);
         localSigarInstancesCount = 0;
     }
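
As a side note to the patch above, a thread factory can also mark its threads as daemon threads, so that a forgotten shutdown call does not keep the JVM alive. The sketch below is an assumption-laden variant, not part of the proposed diff: NamedDaemonThreadFactory and its prefix parameter are hypothetical, and whether daemon threads are appropriate for the Sigar threshold checker is not something this report establishes.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of the factory in the diff: named threads that are also daemon threads.
class NamedDaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger threadNumber = new AtomicInteger(1);

    NamedDaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // The name makes the thread easy to spot in a thread dump.
        Thread t = new Thread(r, prefix + "-" + threadNumber.getAndIncrement());
        // Daemon threads do not prevent the JVM from exiting.
        t.setDaemon(true);
        return t;
    }
}
```

It could be passed to Executors.newSingleThreadScheduledExecutor(new NamedDaemonThreadFactory("sigar")). An explicit shutdownNow() on the shutdown path is still the cleaner option, since a daemon thread can be stopped abruptly in the middle of a task when the JVM exits.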

Comment 2 Thomas Segismont 2013-09-17 10:53:43 UTC
Fixed in master

commit a9feadf7e8b1fbc5215dd7ba7e6cc4f1a4e78cc8
Author: Thomas Segismont <tsegismo>
Date:   Tue Sep 17 12:51:30 2013 +0200

Comment 3 Heiko W. Rupp 2014-04-23 12:31:22 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

Comment 4 XMEN_WAR 2017-02-02 07:19:17 UTC
We are seeing the same issue in 4.13.

2017-02-01 14:55:53,868 INFO  [RHQ Server Polling Thread] (enterprise.communications.command.client.ServerPollingThread)- {ServerPollingThread.server-online}The server has come back online; client has been told to start sending commands again
2017-02-01 14:56:02,816 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:56:22,823 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:56:42,826 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:02,828 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:22,833 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:42,839 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:02,842 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:22,845 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:42,851 INFO  [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
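
The log above does not say which threads the agent is waiting for. A thread dump (as in the description) answers that; the same information can also be collected from inside a JVM with a few lines of standard Java. The sketch below is only an illustration: LiveThreadReport is a hypothetical helper, not an RHQ class, and it uses nothing beyond Thread.getAllStackTraces().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper: lists the live non-daemon threads of the current JVM.
class LiveThreadReport {
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // Only live non-daemon threads can keep a JVM from exiting.
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : nonDaemonThreadNames()) {
            System.out.println(name);
        }
    }
}
```

Threads with default names such as pool-3-thread-1 are hard to attribute to a component, which is why naming executor threads, as suggested in comment 1, helps with this kind of diagnosis.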

