Bug 841399

Summary: Failing NativeProcessRetrievalTest
Product: [Other] RHQ Project
Reporter: Heiko W. Rupp <hrupp>
Component: Plugin Container
Assignee: RHQ Project Maintainer <rhq-maint>
Status: NEW
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: unspecified
Version: 4.4
CC: hrupp, lkrejci
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Type: Bug

Description Heiko W. Rupp 2012-07-18 17:17:08 EDT
I see these tests failing on JDK 6 and 7 on OS X.

Failed tests:   testProcessInfoAccurateAfterProcessRestart(org.rhq.core.pc.inventory.getnativeprocess.NativeProcessRetrievalTest): The process info should have refreshed, before= 30790, after= 30790
  testProcessInfoAccurateAfterProcessStarted(org.rhq.core.pc.inventory.getnativeprocess.NativeProcessRetrievalTest): The process info should have been nulled out expected:<0> but was:<30790>
  testProcessInfoAccurateWhenProcessStopped(org.rhq.core.pc.inventory.getnativeprocess.NativeProcessRetrievalTest): The process info should have refreshed expected:<0> but was:<30790>


It looks to me like ProcessInfo, when it detects that a process is dead, still returns a pid for it and only writes lots of warnings to the log.

Fixing that yields

Failed tests:   testProcessInfoAccurateAfterProcessRestart(org.rhq.core.pc.inventory.getnativeprocess.NativeProcessRetrievalTest): Only a single discovery call should have been made to refresh the process info expected:<3> but was:<2>
  testProcessInfoAccurateAfterProcessStarted(org.rhq.core.pc.inventory.getnativeprocess.NativeProcessRetrievalTest): Exactly 1 discovery call should have been made to refresh the process info after the process started again. expected:<4> but was:<3>

From stepping through it, I only see the discovery component being called at the start of the method, but not after the stopProcess() / startProcess() combo.
Comment 1 Lukas Krejci 2012-07-19 12:16:34 EDT
These tests check the behavior of ResourceContext.getNativeProcess().

When that method finds that the original process is no longer running, it should do a fresh process scan and run discovery against that scan. If the discovery finds a resource with the same resource key as the current resource, the corresponding process info is cached in the resource context and returned to the caller.

The tests take advantage of that behavior to track if the method actually found the original process dead and refreshed.
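
To make the expected behavior concrete, here is a minimal sketch of the caching logic. It is not the actual ResourceContext code: ProcInfo, FakeProc, and NativeProcessCache are hypothetical, simplified stand-ins, and the Supplier stands for "do a fresh process scan and run discovery for this resource key".

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical, simplified stand-in for the real ProcessInfo. */
interface ProcInfo {
    long pid();
    boolean isRunning();
}

/** Hypothetical fake process whose liveness can be toggled from a test. */
final class FakeProc implements ProcInfo {
    private final long pid;
    boolean running = true;

    FakeProc(long pid) { this.pid = pid; }

    @Override public long pid() { return pid; }
    @Override public boolean isRunning() { return running; }
}

/** Hypothetical stand-in for the caching done by the resource context. */
final class NativeProcessCache {
    private ProcInfo cached;                                // last known process, may be null
    private final Supplier<Optional<ProcInfo>> discovery;   // fresh scan + discovery (assumed)
    private int discoveryCalls;                             // the kind of counter the tests assert on

    NativeProcessCache(ProcInfo initial, Supplier<Optional<ProcInfo>> discovery) {
        this.cached = initial;
        this.discovery = discovery;
    }

    /** Returns the cached process while it is alive; otherwise rediscovers it. */
    ProcInfo getNativeProcess() {
        if (cached == null || !cached.isRunning()) {
            discoveryCalls++;
            cached = discovery.get().orElse(null);
        }
        return cached;
    }

    int discoveryCalls() { return discoveryCalls; }
}
```

In this sketch the discovery counter only moves when the cached process is reported as not running, which is why a wrong liveness report shows up in the tests as a wrong discovery count.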

So if you're not seeing the correct number of discoveries, I'd assume that either Sigar is not reporting the process as expected, or your fix (what is it, actually?) changed the behavior of ProcessInfo.isRunning() and ProcessInfo.refresh(). See the ResourceContext.isRediscoveryRequired() method for how we determine that a rediscovery is needed.
Comment 2 Lukas Krejci 2012-07-30 10:37:13 EDT
The tests seem to work with both Java 6 and 7 on Linux. I assume this is a problem with Sigar on OS X.

Note that NativeProcessRetrievalTest#stopTestProcess() has a hardcoded wait of 2s to give Sigar enough time to detect the process info changes. Maybe this value needs to be greater on OS X? But again, this is a platform-specific issue that I can't help you with, since I have no access to that platform.
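
One possible alternative to tuning a fixed sleep is to poll until the change becomes visible, with an upper bound on the wait. This is only a sketch: Waiter and waitFor are hypothetical names, not part of the test class, and the condition would have to be whatever check the test uses to see that the process is gone.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: bounded polling instead of a fixed sleep. */
final class Waiter {
    private Waiter() {}

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false on
     * timeout or if the waiting thread was interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

A call such as Waiter.waitFor(() -> !process.isRunning(), 10000, 100) would then return as soon as the change is detected, and a generous timeout would not slow down the platforms where detection is fast.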
Comment 3 Lukas Krejci 2012-07-30 10:38:26 EDT
Btw, what did you mean by "Fixing that yields"?
Comment 4 Heiko W. Rupp 2012-07-30 10:52:55 EDT
The timeout change did not change behavior.

If this is an OS X specific issue (which is possible), then we need to make sure this test only runs on non-OS X platforms.
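
A minimal sketch of such a platform check, assuming we key off the standard os.name system property (which reports "Mac OS X" on that platform). PlatformCheck is a hypothetical helper; how the result is used to skip the test depends on the test framework's own skip mechanism.

```java
import java.util.Locale;

/** Hypothetical helper for excluding OS X from a platform-sensitive test. */
final class PlatformCheck {
    private PlatformCheck() {}

    /** Returns true if the given os.name value denotes Mac OS X. */
    static boolean isMacOsX(String osName) {
        if (osName == null) {
            return false;
        }
        String name = osName.toLowerCase(Locale.ROOT);
        return name.startsWith("mac") || name.contains("darwin");
    }

    /** Checks the JVM we are currently running on. */
    static boolean currentPlatformIsMacOsX() {
        return isMacOsX(System.getProperty("os.name"));
    }
}
```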

Have a look at the log file when the test runs: some of the discovery code just does not retry running a scan, because the pid is > 0 and is therefore assumed to be valid.

E.g. ProcessInfo.update() does

            try {
                procExe = sigar.getProcExe(pid);
            } catch (Exception e) {
                handleSigarCallException(e, "getProcExe");
            }
and then just continues.
If we end up in the catch block, the underlying process is gone.
Still, we continue as if nothing had happened, which is wrong, and the log afterwards is full of such error messages.


What I meant by "yields" is that once I fixed the above, so that the native code recognizes when a process no longer exists and sets the pid to 0 (which triggers a scan on the next call), I get the issues with the discovery count, because I do not see discovery being called between the stop() and start() calls.
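
Roughly, the idea of the change is sketched below. This is not the actual patch: TrackedProcess and ExeLookup are hypothetical names, and ExeLookup stands in for a native call such as sigar.getProcExe(pid). The sketch also assumes that any failure of the lookup means the process is gone, which may be too broad in practice.

```java
/** Hypothetical stand-in for a native lookup such as sigar.getProcExe(pid). */
interface ExeLookup {
    String exeOf(long pid) throws Exception;
}

/** Hypothetical, simplified process record that invalidates itself. */
final class TrackedProcess {
    private long pid;
    private String exeName;

    TrackedProcess(long pid) {
        this.pid = pid;
    }

    /** Refreshes the executable name; resets the pid if the lookup fails. */
    void update(ExeLookup lookup) {
        try {
            exeName = lookup.exeOf(pid);
        } catch (Exception e) {
            // Assumption: a failing native call means the process no longer exists.
            // Resetting the pid to 0 lets callers notice and trigger a new scan.
            pid = 0;
            exeName = null;
        }
    }

    long getPid() { return pid; }

    String getExeName() { return exeName; }

    /** A pid of 0 signals that the next caller should rescan. */
    boolean needsRescan() { return pid <= 0; }
}
```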
Comment 5 Lukas Krejci 2012-07-30 11:12:01 EDT
Hmm... the code in ProcessInfo is non-trivial, and without access to your changes or the ability to reproduce the problem, I can't say exactly what's going on.

As I said already, you need to look at ResourceContext.isRediscoveryRequired() - this is where the code decides whether to reinvoke the discovery to align with the latest process scan or not.