Bug 879583

Summary: Monitoring : Platform plugin "Process" service reports wrong availability (SIGAR)
Product: [Other] RHQ Project
Reporter: Thomas Segismont <tsegismo>
Component: Agent
Assignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: hrupp, jshaughn, lkrejci
Target Release: RHQ 4.6
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clone Of:
: 879639
Last Closed: 2013-09-03 14:41:46 UTC
Type: Bug
Bug Blocks: 879639

Description Thomas Segismont 2012-11-23 11:38:37 UTC
Description of problem:
When monitoring a process, the reported availability may be wrong for an arbitrary number of availability checks.


Version-Release number of selected component (if applicable):
4.6.0-SNAPSHOT

How reproducible:
Always

Steps to Reproduce:
1. Start a test process on a monitored machine (e.g. LibreOffice)
2. For the monitored machine, import a new child resource in RHQ with type process (e.g. with PIQL process|basename|match=^soffice.*)
3. When the process availability shown is "UP", close/kill the test process
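
The PIQL query in step 2 selects processes whose base name matches the regular expression ^soffice.*. As a rough illustration of that matching only (this is not RHQ's query engine), the sketch below filters a made-up list of process base names with java.util.regex. The class name, the helper method and the sample names are assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BasenameMatchDemo {
    /** Returns the base names matched by the regular expression (hypothetical helper). */
    static List<String> matching(List<String> basenames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String name : basenames) {
            // find() is sufficient here because the expression is anchored with '^'
            if (pattern.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample names are made up for the illustration
        List<String> names = Arrays.asList("soffice.bin", "soffice", "java", "libreoffice-soffice");
        System.out.println(matching(names, "^soffice.*"));
    }
}
```

Only names that start with "soffice" are kept, so a name that merely contains it further along is not selected.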

  
Actual results:
The availability status is still "UP" for a time longer than the availability check interval.

Expected results:
The availability status should be "DOWN" as soon as the availability check interval has elapsed.


Additional info:
In the ProcessInfo class, the method isRunning uses the SIGAR class ProcState. If the process has been killed or shut down, the instance of ProcState contains stale data.
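
The effect of a stale snapshot can be shown without SIGAR. The sketch below is a simulation under stated assumptions: FakeProcessTable and CachedProcessInfo are hypothetical stand-ins invented for this example, not the RHQ or SIGAR classes. It only demonstrates that an object which caches process state keeps answering "running" after the process is gone, until it is refreshed.

```java
import java.util.HashSet;
import java.util.Set;

class StaleSnapshotDemo {
    /** Stand-in for the operating system's process table (hypothetical). */
    static class FakeProcessTable {
        private final Set<Long> livePids = new HashSet<>();
        void start(long pid) { livePids.add(pid); }
        void kill(long pid) { livePids.remove(pid); }
        boolean isAlive(long pid) { return livePids.contains(pid); }
    }

    /** Caches the process state when created; only refresh() updates it (hypothetical). */
    static class CachedProcessInfo {
        private final FakeProcessTable table;
        private final long pid;
        private boolean running; // cached state

        CachedProcessInfo(FakeProcessTable table, long pid) {
            this.table = table;
            this.pid = pid;
            refresh();
        }

        /** Re-reads the state from the process table. */
        void refresh() { running = table.isAlive(pid); }

        /** Answers from the cache and never asks the process table. */
        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        FakeProcessTable table = new FakeProcessTable();
        table.start(4242L);
        CachedProcessInfo info = new CachedProcessInfo(table, 4242L);

        table.kill(4242L);
        System.out.println("before refresh: " + info.isRunning()); // stale answer
        info.refresh();
        System.out.println("after refresh:  " + info.isRunning()); // fresh answer
    }
}
```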

Comment 1 Thomas Segismont 2012-11-23 13:30:39 UTC
In the ProcessComponent class, the ProcessInfo instance is refreshed each time a metric collection is made.

So after a metric collection, the next availability check has fresh data to process.

This could explain why, after some time, the closed/killed process is eventually reported "DOWN".
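
To make the timing concrete, here is a small simulation of how many availability checks can report a stale "UP" when only metric collections refresh the data. All numbers are illustrative assumptions, not RHQ defaults. The sketch also assumes that both schedules start at time 0 and that a collection falling on the same instant as a check runs first.

```java
class StaleWindowDemo {
    /**
     * Counts availability checks that still see pre-kill data.
     * Times are in seconds; checks run at availInterval, 2 * availInterval, ...
     * and refreshes happen at 0, metricInterval, 2 * metricInterval, ...
     */
    static int staleChecks(long killTime, long availInterval, long metricInterval, long horizon) {
        int stale = 0;
        for (long t = availInterval; t <= horizon; t += availInterval) {
            if (t <= killTime) {
                continue; // process still alive, so "UP" is correct
            }
            // Most recent refresh at or before this check
            long lastRefresh = (t / metricInterval) * metricInterval;
            if (lastRefresh < killTime) {
                stale++; // the data predates the kill
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Hypothetical schedule: check every 60 s, collect metrics every 600 s, kill at 90 s
        System.out.println(staleChecks(90, 60, 600, 1200));
        // Refreshing as often as the checks run removes the stale window
        System.out.println(staleChecks(90, 60, 60, 1200));
    }
}
```

The size of the stale window depends on where the kill falls between two metric collections, which fits the "arbitrary number of availability checks" in the description.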

Comment 2 Thomas Segismont 2012-11-26 10:06:46 UTC
The ProcessInfo instance is now refreshed on every availability check.

master 5c4217e

Comment 3 Lukas Krejci 2012-11-26 15:17:19 UTC
I think the fix is not enough for pid file based processes, because the pid file is never re-read until the agent / plugin container is restarted and the component re-started.

Comment 4 Thomas Segismont 2012-11-27 16:30:52 UTC
Lukas,

As discussed with you on IRC, the problem is not the discovery type of the process component.

The problem is that with the first fix, after the refresh it's too late to see if the process has not yet been restarted.

So I remixed the fix.

master 2ec8d54
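
The thread does not show what commit 2ec8d54 changed, so the following is not that fix. It is only a sketch of one way an availability check can cope with a process that was restarted under a new PID, the concern raised in comments 3 and 4: check the known PID with fresh data first, and search for the process again only if that PID is gone. FakeProcessTable and TrackedProcess are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class RediscoverOnDownDemo {
    /** Stand-in for the process table: maps a base name to its current PID (hypothetical). */
    static class FakeProcessTable {
        private final Map<String, Long> pidByName = new HashMap<>();
        void start(String name, long pid) { pidByName.put(name, pid); }
        void kill(String name) { pidByName.remove(name); }
        Long lookup(String name) { return pidByName.get(name); }
        boolean isAlive(long pid) { return pidByName.containsValue(pid); }
    }

    /** Hypothetical component that tracks one process by name. */
    static class TrackedProcess {
        private final FakeProcessTable table;
        private final String name;
        private Long pid; // PID found by the last lookup, or null if none

        TrackedProcess(FakeProcessTable table, String name) {
            this.table = table;
            this.name = name;
            this.pid = table.lookup(name);
        }

        boolean isUp() {
            // First ask about the PID already known, using fresh data
            if (pid != null && table.isAlive(pid)) {
                return true;
            }
            // Known PID is gone: search again in case the process came back with a new PID
            pid = table.lookup(name);
            return pid != null;
        }

        Long currentPid() { return pid; }
    }

    public static void main(String[] args) {
        FakeProcessTable table = new FakeProcessTable();
        table.start("soffice.bin", 100L);
        TrackedProcess tracked = new TrackedProcess(table, "soffice.bin");

        table.kill("soffice.bin");
        System.out.println("after kill:    " + tracked.isUp());
        table.start("soffice.bin", 200L); // restarted with a different PID
        System.out.println("after restart: " + tracked.isUp() + " pid=" + tracked.currentPid());
    }
}
```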

Comment 5 Lukas Krejci 2012-11-27 17:01:33 UTC
This looks correct to me.

Comment 6 Heiko W. Rupp 2013-09-03 14:41:46 UTC
Bulk closing of issues in old RHQ releases that have been in production for a while now.

Please open a new issue when running into an issue.