Description of problem:
Agent shutdown hangs because of one thread still alive

Version-Release number of selected component (if applicable):
4.9

Additional info:
Reported by community user https://community.jboss.org/message/837538

The agent seems to wait for a thread that's a scheduled executor:

"pool-3-thread-1" prio=10 tid=0x00007fe78c4c0800 nid=0x192 waiting on condition [0x00007fe788126000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000e1309b98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

The scheduled executor is shut down in org.rhq.core.system.SigarAccessHandler#close:

    void close() {
        if (sharedSigar != null) {
            sharedSigarLock.lock();
            try {
                sharedSigar.close();
                sharedSigar = null;
            } finally {
                sharedSigarLock.unlock();
            }
        }
        scheduledExecutorService.shutdownNow();
    }

The problem is org.rhq.core.system.SigarAccessHandler#close never gets called when the agent goes down.
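To illustrate the behavior described above, here is a minimal standalone sketch (not RHQ code). It assumes a single-thread scheduled executor whose worker is a non-daemon thread, as is the case with the default thread factory. The class name LingeringExecutorDemo and the thread name "demo-checker" are made up for the example. The worker stays alive while idle and only exits once shutdownNow() is called.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of RHQ.
class LingeringExecutorDemo {

    /**
     * Returns { aliveWhileIdle, isDaemon, aliveAfterShutdownNow } for the
     * executor's worker thread.
     */
    static boolean[] run() {
        final AtomicReference<Thread> worker = new AtomicReference<>();
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "demo-checker"); // hypothetical name
            t.setDaemon(false); // same setting as Executors.defaultThreadFactory()
            worker.set(t);
            return t;
        });
        // Same schedule shape as in the report: first run after 1 minute.
        executor.scheduleWithFixedDelay(() -> { }, 1, 5, TimeUnit.MINUTES);

        Thread t = worker.get();
        boolean aliveWhileIdle = t.isAlive(); // worker is parked waiting for the first run
        boolean daemon = t.isDaemon();

        executor.shutdownNow(); // interrupts the parked worker
        try {
            t.join(5000); // give the worker time to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { aliveWhileIdle, daemon, t.isAlive() };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("alive while idle:        " + result[0]);
        System.out.println("daemon:                  " + result[1]);
        System.out.println("alive after shutdownNow: " + result[2]);
    }
}
```

If shutdownNow() is never reached, as with SigarAccessHandler#close in this report, the worker remains parked in DelayedWorkQueue.take, which matches the thread dump.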
Would be good to set the thread name.

diff --git a/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java b/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandle
index a781641..ea8f018 100644
--- a/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java
+++ b/modules/core/native-system/src/main/java/org/rhq/core/system/SigarAccessHandler.java
@@ -29,6 +29,8 @@
 import java.lang.reflect.Method;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.locks.ReentrantLock;
 
 import org.apache.commons.logging.Log;
@@ -70,6 +72,17 @@
     private static final boolean THREAD_DUMP_ON_SIGAR_INSTANCES_THRESHOLD = Boolean
         .getBoolean("threadDumpOnlocalSigarInstancesWarningThreshold");
 
+    private static final ThreadFactory threadFactory = new ThreadFactory() {
+        final AtomicInteger threadNumber = new AtomicInteger(1);
+
+        @Override
+        public Thread newThread(Runnable r) {
+            Thread t = new Thread(r);
+            t.setName("sigar-" + threadNumber.getAndIncrement());
+            return t;
+        }
+    };
+
     private final SigarFactory sigarFactory;
     private final ReentrantLock sharedSigarLock;
     private final ReentrantLock localSigarLock;
@@ -85,7 +98,7 @@
         this.sigarFactory = sigarFactory;
         sharedSigarLock = new ReentrantLock();
         localSigarLock = new ReentrantLock();
-        scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
+        scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
         scheduledExecutorService.scheduleWithFixedDelay(new ThresholdChecker(), 1, 5, MINUTES);
         localSigarInstancesCount = 0;
     }
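For reference, the naming idea from the patch can be sketched as a small standalone factory. The class name NamedThreadFactory and its daemon flag are assumptions added for illustration; the patch above only sets the name and does not touch the daemon setting.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of RHQ: names threads "<prefix><n>".
class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final boolean daemon; // assumption: not present in the patch
    private final AtomicInteger threadNumber = new AtomicInteger(1);

    NamedThreadFactory(String prefix, boolean daemon) {
        this.prefix = prefix;
        this.daemon = daemon;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + threadNumber.getAndIncrement());
        t.setDaemon(daemon);
        return t;
    }

    public static void main(String[] args) {
        ThreadFactory factory = new NamedThreadFactory("sigar-", false);
        System.out.println(factory.newThread(() -> { }).getName()); // prints "sigar-1"
    }
}
```

A readable name makes the thread easy to spot in a thread dump. A daemon thread would not keep the JVM alive on its own, but this report does not say whether AgentShutdownHook ignores daemon threads, so calling close() on shutdown remains the direct fix.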
Fixed in master

commit a9feadf7e8b1fbc5215dd7ba7e6cc4f1a4e78cc8
Author: Thomas Segismont <tsegismo>
Date:   Tue Sep 17 12:51:30 2013 +0200
Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
We are seeing the same issue in 4.13

2017-02-01 14:55:53,868 INFO [RHQ Server Polling Thread] (enterprise.communications.command.client.ServerPollingThread)- {ServerPollingThread.server-online}The server has come back online; client has been told to start sending commands again
2017-02-01 14:56:02,816 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:56:22,823 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:56:42,826 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:02,828 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:22,833 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:57:42,839 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:02,842 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:22,845 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
2017-02-01 14:58:42,851 INFO [RHQ Agent Update Thread] (org.rhq.enterprise.agent.AgentShutdownHook)- {AgentShutdownHook.wait}The agent will wait for [2] threads to die
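The log does not name the two threads being waited on. A thread dump (for example with jstack) would show them. As a rough in-process alternative, the sketch below lists live non-daemon threads other than the caller. The class NonDaemonThreadLister is hypothetical and is not how AgentShutdownHook itself selects the threads it waits for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of RHQ.
class NonDaemonThreadLister {

    /** Names of all live non-daemon threads except the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveNonDaemonThreadNames()) {
            System.out.println(name);
        }
    }
}
```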