Red Hat Bugzilla – Bug 879583
Monitoring : Platform plugin "Process" service reports wrong availability (SIGAR)
Last modified: 2013-09-03 10:41:46 EDT
Description of problem:
When monitoring a process, the reported availability may be wrong for an arbitrary number of availability checks.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a test process on a monitored machine (e.g. LibreOffice)
2. For the monitored machine, import a new child resource in RHQ with type process (e.g. with PIQL process|basename|match=^soffice.*)
3. When the process availability shown is "UP", close/kill the test process
Actual results:
The availability status is still "UP" for a time longer than the availability check interval.
Expected results:
The availability status should be "DOWN" as soon as the availability check interval has elapsed.
In the ProcessInfo class, the method isRunning uses the SIGAR class ProcState. If the process has been killed or shut down, the ProcState instance still contains stale data.
In the ProcessComponent class, the ProcessInfo instance is refreshed only when a metric collection is made.
So after a metric collection, the next availability check has fresh data to process.
This could explain why, after some time, the closed/killed process is eventually reported "DOWN".
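To illustrate the analysis above, here is a minimal, self-contained sketch of the stale-snapshot behaviour. It does not use RHQ or SIGAR: FakeProcessTable and CachedProcessInfo are hypothetical stand-ins for the OS process table and for a ProcessInfo holding a cached ProcState, and the explicit refresh() call stands in for the refresh done at metric collection.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the reported behaviour; none of these names are RHQ or SIGAR API.
final class StaleSnapshotDemo {

    /** Simulated OS process table. */
    static final class FakeProcessTable {
        private final Set<Long> livePids = new HashSet<>();

        void start(long pid) { livePids.add(pid); }
        void kill(long pid) { livePids.remove(pid); }
        boolean isAlive(long pid) { return livePids.contains(pid); }
    }

    /** Caches the process state, like a ProcessInfo holding a ProcState snapshot. */
    static final class CachedProcessInfo {
        private final FakeProcessTable table;
        private final long pid;
        private boolean running; // value captured at the last refresh

        CachedProcessInfo(FakeProcessTable table, long pid) {
            this.table = table;
            this.pid = pid;
            refresh();
        }

        /** Re-reads the state from the (simulated) OS. */
        void refresh() { running = table.isAlive(pid); }

        /** Answers from the cached snapshot only; never queries the OS. */
        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        FakeProcessTable table = new FakeProcessTable();
        table.start(42L);
        CachedProcessInfo info = new CachedProcessInfo(table, 42L);

        table.kill(42L);
        // Availability check without a refresh: still answers from the old snapshot.
        System.out.println("before refresh: " + (info.isRunning() ? "UP" : "DOWN"));

        info.refresh(); // stands in for the refresh triggered by a metric collection
        System.out.println("after refresh: " + (info.isRunning() ? "UP" : "DOWN"));
    }
}
```

In this model the first check prints UP and the second prints DOWN, which matches the delay seen in the steps to reproduce.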
The ProcessInfo instance is now refreshed on every availability check.
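For readers without the source at hand, the idea of this change can be sketched as follows. This is not the committed code: ProcessSnapshot, checkAvailability and the LongPredicate used to query the "OS" are hypothetical names chosen for illustration, and the real component works with ProcessInfo and SIGAR instead.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch of "refresh on every availability check"; not the actual RHQ change.
final class RefreshOnCheckDemo {

    enum Availability { UP, DOWN }

    /** Cached view of one process; osIsAlive stands in for the real OS query. */
    static final class ProcessSnapshot {
        private final LongPredicate osIsAlive;
        private final long pid;
        private boolean running;

        ProcessSnapshot(LongPredicate osIsAlive, long pid) {
            this.osIsAlive = osIsAlive;
            this.pid = pid;
            refresh();
        }

        void refresh() { running = osIsAlive.test(pid); }
        boolean isRunning() { return running; }
    }

    /** Refreshes the snapshot first, so the answer never relies on stale data. */
    static Availability checkAvailability(ProcessSnapshot snapshot) {
        snapshot.refresh();
        return snapshot.isRunning() ? Availability.UP : Availability.DOWN;
    }

    public static void main(String[] args) {
        boolean[] alive = { true }; // mutable flag simulating the process state
        ProcessSnapshot snapshot = new ProcessSnapshot(pid -> alive[0], 42L);

        System.out.println(checkAvailability(snapshot)); // process is running
        alive[0] = false;                                // simulate kill
        System.out.println(checkAvailability(snapshot)); // next check sees the kill
    }
}
```

With the refresh inside the check, the first availability check after the kill already reports DOWN, without waiting for a metric collection.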
I think the fix is not enough for pid file based processes, because the pid file is never re-read until the agent / plugin container is restarted and the component re-started.
As discussed with you on IRC, the problem is not the discovery type of the process component.
The problem is that with the first fix, after the refresh it's too late to see if the process has not yet been restarted.
So I remixed the fix.
This looks correct to me.
Bulk closing of issues in old RHQ releases that are in production for a while now.
Please open a new issue if you still run into this problem.