Bug 923400
Summary: | Sigar creates high number of blocked threads (unbounded) if mount is gone | | |
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> |
Component: | Plugins | Assignee: | Thomas Segismont <tsegismo> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.5 | CC: | hrupp |
Target Milestone: | --- | | |
Target Release: | RHQ 4.9 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-03-26 08:32:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Elias Ross
2013-03-19 18:33:25 UTC
Thomas, could you please have a look at this?

Can you confirm the root cause of the never-ending FileSystemUsage#gather call? What do you mean by "mount is gone"?

Basically the filesystem failed on the machine, e.g. ls /mntX would hang on this volume. I'm guessing something similar might happen with an NFS mount that is hanging as well.

From rhq-devel ML:

Hi,

I've worked on a fix for Bug 923400 - Sigar creates high number of blocked threads (unbounded) if mount is gone.

The problem is that we use a shared instance of Sigar, and threads may be blocked when trying to use it if a call lasts too long or never returns.

The fix is in a bug branch:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=bug/923400

It consists of a behavior change in SigarAccessHandler. Let me paste here the new Javadoc of the class:

****
An InvocationHandler for an org.hyperic.sigar.SigarProxy. A single instance of this class will be created by the SigarAccess class.

This class holds a shared Sigar instance and serializes calls. If a thread waits more than 'sharedSigarLockMaxWait' seconds, it will be given a new Sigar instance, which will be destroyed at the end of the call.

Every 5 minutes, a background task checks that 'sigarInstancesThreshold' has not been exceeded. If it has, a warning message will be logged, optionally with a thread dump.

This class is configurable with System properties:
* sharedSigarLockMaxWait: maximum time a thread will wait for the shared Sigar lock acquisition; defaults to 5
* sigarInstancesThreshold: threshold of currently living Sigar instances at which the background task will print warning messages; defaults to 50
* threadDumpOnSigarInstancesThreshold: if set to true (case insensitive), the background task will also log a thread dump when sigarInstancesThreshold is met
****

This change will not prevent problems like the one Elias reported (a call to #getMountedFileSystemUsage never returning because of a bad FS mount). But it will put the agent in some sort of degraded mode where other calls to Sigar will succeed and warnings will be logged. It should even be possible to fire an alert on this log event if the agent is inventoried.

What's your opinion?

Thanks and regards,
Thomas

From rhq-devel ML:

One additional change may be to set a limit for the number of on-demand Sigar instances and reject calls when this limit is reached.

Fixed in master

commit fe38a28bd0ba7df967ec8b6a7f5f2b4a6bb839d6
Author: Thomas Segismont <tsegismo>
Date: Wed Jul 3 12:27:55 2013 +0200

Bulk closing now that 4.10 is out. If you think an issue is not resolved, please open a new BZ and link to the existing one.
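
For readers who want to see the locking behavior from the SigarAccessHandler Javadoc in code form, here is a minimal sketch. It is not the code from the commit: the class name SharedInstanceGuard, its methods, and the generic factory/destroyer parameters are all hypothetical, and a plain generic resource stands in for the Sigar instance. It only illustrates the idea of waiting a bounded time for the shared instance and falling back to a short-lived on-demand instance. Whether the real threshold check uses "greater than" or "greater than or equal" is not clear from the Javadoc, so the comparison below is an assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual SigarAccessHandler implementation.
class SharedInstanceGuard<T> {
    private final ReentrantLock sharedLock = new ReentrantLock();
    private final AtomicInteger liveInstances = new AtomicInteger();
    private final Supplier<T> factory;     // creates a new instance
    private final Consumer<T> destroyer;   // releases an on-demand instance
    private final long maxWaitSeconds;     // plays the role of sharedSigarLockMaxWait
    private final T shared;

    SharedInstanceGuard(Supplier<T> factory, Consumer<T> destroyer, long maxWaitSeconds) {
        this.factory = factory;
        this.destroyer = destroyer;
        this.maxWaitSeconds = maxWaitSeconds;
        this.shared = factory.get();
        liveInstances.incrementAndGet();
    }

    // Runs 'work' on the shared instance if the lock is acquired in time,
    // otherwise on a fresh instance that is destroyed when the call ends.
    <R> R call(Function<T, R> work) {
        boolean locked;
        try {
            locked = sharedLock.tryLock(maxWaitSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for the shared lock", e);
        }
        if (locked) {
            try {
                return work.apply(shared);
            } finally {
                sharedLock.unlock();
            }
        }
        // Timed out: another thread is still using (or stuck in) the shared instance.
        T onDemand = factory.get();
        liveInstances.incrementAndGet();
        try {
            return work.apply(onDemand);
        } finally {
            liveInstances.decrementAndGet();
            destroyer.accept(onDemand);
        }
    }

    // Number of instances currently alive (shared one included).
    int liveInstances() {
        return liveInstances.get();
    }

    // What a periodic background check could test before logging a warning.
    // Assumption: "exceeded" is read as strictly greater than the threshold.
    boolean exceeds(int threshold) {
        return liveInstances.get() > threshold;
    }
}
```

In this sketch the wait time could be read with Long.getLong("sharedSigarLockMaxWait", 5L) and passed to the constructor. A thread stuck forever inside call (for example on a hung mount) keeps holding the shared lock, so every later caller pays the wait and then runs on its own instance. That matches the "degraded mode" described above, and it is also why a cap on on-demand instances was suggested as a follow-up.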