Bug 879583 - Monitoring : Platform plugin "Process" service reports wrong availability (SIGAR)
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Agent
Version: unspecified
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHQ 4.6
Assigned To: Thomas Segismont
QA Contact: Mike Foley
Depends On:
Blocks: 879639
Reported: 2012-11-23 06:38 EST by Thomas Segismont
Modified: 2013-09-03 10:41 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 879639
Environment:
Last Closed: 2013-09-03 10:41:46 EDT
Type: Bug
Description Thomas Segismont 2012-11-23 06:38:37 EST
Description of problem:
When monitoring a process, the reported availability may be wrong for an arbitrary number of consecutive availability checks.


Version-Release number of selected component (if applicable):
4.6.0-SNAPSHOT

How reproducible:
Always

Steps to Reproduce:
1. Start a test process on a monitored machine (e.g. LibreOffice)
2. For the monitored machine, import a new child resource of type process in RHQ (e.g. with the PIQL query process|basename|match=^soffice.*)
3. When the process availability shown is "UP", close/kill the test process
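
For readers unfamiliar with the query in step 2, the sketch below illustrates what a basename regex match such as ^soffice.* selects. It is plain java.util.regex code, not RHQ's PIQL implementation; the class name, the method name, and the sample process names are assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BasenameMatchSketch {

    /**
     * Returns the basenames matched by the given regex.
     * Assumption: the match is anchored by the regex itself (here via ^),
     * which is how the query in step 2 is written.
     */
    static List<String> matching(List<String> basenames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return basenames.stream()
                .filter(name -> pattern.matcher(name).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample names are made up; only "soffice.bin" starts with "soffice".
        List<String> names = List.of("soffice.bin", "java", "oosplash");
        System.out.println(matching(names, "^soffice.*")); // prints [soffice.bin]
    }
}
```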

  
Actual results:
The availability status remains "UP" for longer than the availability check interval.

Expected results:
The availability status should be "DOWN" as soon as the availability check interval has elapsed.


Additional info:
In the ProcessInfo class, the method isRunning uses the SIGAR class ProcState. If the process has been killed or shut down, the instance of ProcState contains stale data.
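
The stale-data effect can be reproduced in isolation. The sketch below does not use SIGAR or the real ProcessInfo class: CachedProcessState is a hypothetical stand-in that caches the state it read, and a Map plays the role of the OS process table. It only illustrates why a cached snapshot keeps answering "running" after the process is gone.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleSnapshotSketch {

    /** Hypothetical stand-in for a state holder that caches what it last read. */
    static class CachedProcessState {
        private final Map<Long, Character> processTable; // simulated OS view: pid -> state
        private final long pid;
        private Character cached; // null means "no such process" at the last read

        CachedProcessState(Map<Long, Character> processTable, long pid) {
            this.processTable = processTable;
            this.pid = pid;
            refresh();
        }

        /** Re-reads the state from the simulated process table. */
        void refresh() {
            cached = processTable.get(pid);
        }

        /** Answers from the cached value only; never touches the process table. */
        boolean isRunning() {
            return cached != null && cached != 'Z';
        }
    }

    public static void main(String[] args) {
        Map<Long, Character> table = new HashMap<>();
        table.put(4242L, 'S');
        CachedProcessState state = new CachedProcessState(table, 4242L);

        table.remove(4242L);                   // the process is killed
        System.out.println(state.isRunning()); // prints true (stale snapshot)

        state.refresh();
        System.out.println(state.isRunning()); // prints false
    }
}
```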
Comment 1 Thomas Segismont 2012-11-23 08:30:39 EST
In the ProcessComponent class, the ProcessInfo instance is refreshed each time a metric collection runs.

So after a metric collection, the next availability check has fresh data to process.

This could explain why, after some time, the closed/killed process is eventually reported "DOWN".
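
A small worked example of that delay, under simplifying assumptions that are not taken from the RHQ scheduler: availability checks run at multiples of one interval, metric collections at multiples of another, only a metric collection refreshes the data, and a refresh that coincides with a check runs first. The interval values in main are illustrative, not RHQ defaults.

```java
public class StaleWindowSketch {

    /**
     * Returns the time (in seconds) of the first availability check that can
     * report DOWN for a process killed at killTime. All arguments are
     * non-negative seconds; both intervals must be positive.
     */
    static long firstDownReport(long killTime, long availInterval, long metricInterval) {
        // The first metric collection strictly after the kill refreshes the snapshot.
        long nextRefresh = (killTime / metricInterval + 1) * metricInterval;
        // The first availability check at or after that refresh sees fresh data.
        long checkIndex = (nextRefresh + availInterval - 1) / availInterval;
        return checkIndex * availInterval;
    }

    public static void main(String[] args) {
        // Killed at t=70s, checks every 60s, metrics every 600s:
        // DOWN is expected at t=120s but only becomes visible at t=600s.
        System.out.println(firstDownReport(70, 60, 600)); // prints 600
        // With metrics every 100s the refresh at t=100s makes the t=120s check correct.
        System.out.println(firstDownReport(70, 60, 100)); // prints 120
    }
}
```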
Comment 2 Thomas Segismont 2012-11-26 05:06:46 EST
The ProcessInfo instance is now refreshed on every availability check.
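
The idea of this change, refreshing before answering, can be sketched as follows. This is not the code from the commit: ProcessHandleSketch, checkAvailability, and the aliveProbe callback are hypothetical names standing in for the plugin's classes and for the real OS query.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

public class RefreshOnAvailabilitySketch {

    enum Availability { UP, DOWN }

    /** Hypothetical process handle; aliveProbe stands in for a real OS query. */
    static class ProcessHandleSketch {
        private final long pid;
        private final LongPredicate aliveProbe;
        private boolean running;

        ProcessHandleSketch(long pid, LongPredicate aliveProbe) {
            this.pid = pid;
            this.aliveProbe = aliveProbe;
            refresh();
        }

        void refresh() {
            running = aliveProbe.test(pid);
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Refreshes the handle first, so the answer never comes from an old snapshot. */
    static Availability checkAvailability(ProcessHandleSketch process) {
        process.refresh();
        return process.isRunning() ? Availability.UP : Availability.DOWN;
    }

    public static void main(String[] args) {
        Set<Long> alive = new HashSet<>();
        alive.add(4242L);
        ProcessHandleSketch process = new ProcessHandleSketch(4242L, pid -> alive.contains(pid));

        System.out.println(checkAvailability(process)); // prints UP
        alive.remove(4242L);                            // the process is killed
        System.out.println(checkAvailability(process)); // prints DOWN
    }
}
```

The cost of this design is one extra process lookup per availability check, in exchange for a status that is at most one check interval old.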

master 5c4217e
Comment 3 Lukas Krejci 2012-11-26 10:17:19 EST
I think the fix is not enough for pid file based processes, because the pid file is never re-read until the agent / plugin container is restarted and the component re-started.
Comment 4 Thomas Segismont 2012-11-27 11:30:52 EST
Lukas,

As discussed with you on IRC, the problem is not the discovery type of the process component.

The problem is that with the first fix, after the refresh it's too late to see if the process has not yet been restarted.

So I remixed the fix.

master 2ec8d54
Comment 5 Lukas Krejci 2012-11-27 12:01:33 EST
This looks correct to me.
Comment 6 Heiko W. Rupp 2013-09-03 10:41:46 EDT
Bulk closing of issues in old RHQ releases that are in production for a while now.

Please open a new issue when running into an issue.
