This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 871606 - rhq-script-plugin does not always capture process output; causes spurious failures
rhq-script-plugin does not always capture process output; causes spurious fai...
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Plugin -- Other (Show other bugs)
JON 3.1.1
Unspecified Unspecified
high Severity high
: ---
: JON 3.1.2
Assigned To: John Mazzitelli
Mike Foley
:
Depends On: 849394 875120
Blocks:
  Show dependency treegraph
 
Reported: 2012-10-30 16:38 EDT by Charles Crouch
Modified: 2015-02-01 18:28 EST (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 849394
Environment:
Last Closed: 2013-09-13 10:30:02 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Charles Crouch 2012-10-30 16:38:55 EDT
+++ This bug was initially created as a clone of Bug #849394 +++

Description of problem:

I configured the 'script' plugin to watch for output when checking for availability of my resource.

I check the results using a regex check. In some cases, the output is empty. Then RHQ assumes the resource is down, when in fact it is up.

For example, /sbin/service is called with the name of my service. In some cases no output appears.

2012-08-18 21:09:38,052 DEBUG [ResourceContainer.invoker.daemon-115] (org.rhq.plugins.script.ScriptServerComponent)- [/sbin/service]: CLI results: exitcode=[0]; error=[null]; output=
2012-08-18 21:09:38,052 DEBUG [ResourceContainer.invoker.daemon-115] (org.rhq.plugins.script.ScriptServerComponent)- [/sbin/service]: CLI output [] did not match regex [(?s).*is running.*], resource is considered DOWN

Version-Release number of selected component (if applicable):

RHQ 4.2 plugin

How reproducible:

Eventually is.

Additional info:

modules/core/util/src/main/java/org/rhq/core/util/exec/ProcessExecutor.java contains this unfortunate note:

        // WARNING:
        // It seems there is no way to get around a possible race condition - what if the process
        // was so fast that it exited already?  We didn't get a chance to capture its output.
        // I see a unit test that periodically fails because it doesn't get any captured output when
        // it should - I think it is because of this race condition.  But there is no Java API that
        // let's me redirect a process' streams before the process is told to start.

It's unclear if this is really still the case or not...

http://stackoverflow.com/questions/1670798/using-a-thread-to-capture-process-output

There is a note that I/O needs to be consumed in a thread. It could be the process completes but the I/O was never consumed.

--- Additional comment from genman@noderunner.net on 2012-08-19 02:15:08 EDT ---


The problem is that the I/O threads are never run to completion: The main process waits for the exit status, but the I/O threads may not have starts.

I will provide a patch, but the solution is simply to return a handle containing both stdout/stderr Thread instances and wait for these threads...

modules/core/util/src/main/java/org/rhq/core/util/exec/ProcessExecutor.java

    /**
     * Wrapper for threads used for capturing output.
     * Call {@link #join} to wait for output to be fully captured.
     */
    protected static class RedirectThreads {

        private final StreamRedirector stdout;
        private final StreamRedirector stderr;

...

    protected RedirectThreads redirectStreams(ProcessToStart process, Process childProcess) throws IOException {
...
        return new RedirectThreads(stdoutThread, stderrThread);

then ... 

+        final RedirectThreads redirect = redirectStreams(process, childProcess);
...
-                        int exitCode = childProcess.waitFor();
-                        retExitCode[0] = new Integer(exitCode);
+                    } finally {
+                        redirect.join();
                     }

Simple as this.

--- Additional comment from genman@noderunner.net on 2012-09-13 18:02:49 EDT ---

Created attachment 612648 [details]
fixes process leak

--- Additional comment from mazz@redhat.com on 2012-09-18 15:19:42 EDT ---

(In reply to comment #2)
> Created attachment 612648 [details]
> fixes process leak

I think the wrong patch was attached - the attached patch deals with PIQL queries. Can you remove that patch and attach the one for this issue?

--- Additional comment from genman@noderunner.net on 2012-09-18 17:07:02 EDT ---

Created attachment 614123 [details]
Ensure output thread completes before exiting startProgram

Sorry I had the wrong patch.

One thing you might want to adjust (for sake of the build) is adjust the number of times we check for correct output:

modules/core/util/src/test/java/org/rhq/core/util/exec/ProcessExecTest.java

diff --git a/modules/core/util/src/test/java/org/rhq/core/util/exec/ProcessExecTest.java b/modules/core/util/src/test/java/org/rhq/core/util/exec/ProcessExecTest.java
index 49ab3f5..60b7aed 100644
--- a/modules/core/util/src/test/java/org/rhq/core/util/exec/ProcessExecTest.java
+++ b/modules/core/util/src/test/java/org/rhq/core/util/exec/ProcessExecTest.java
@@ -35,7 +35,7 @@
 public class ProcessExecTest {
     public void testProcessExecOutputStream() {
         // run multiple times to ensure race condition fixed
-        for (int i = 0; i < 100; i++) {
+        for (int i = 0; i < 10; i++) {
             ProcessToStart start = new ProcessToStart();

             setupProgram(start);

I also have a version with thread pools being attached. I don't think a thread pool is worth it, but might be good idea.

--- Additional comment from genman@noderunner.net on 2012-09-18 17:08:50 EDT ---

Created attachment 614124 [details]
optional use of a thread pool

Additional possible change to use a thread pool to handle I/O redirect. Might be good if there are a lot of script executions, though I don't know if it is worth it for the average case.

--- Additional comment from mazz@redhat.com on 2012-09-18 17:20:41 EDT ---

git commit to master: bf4b25f421769909e886eb42f7b99c66b207b200
Comment 1 Charles Crouch 2012-10-30 16:39:30 EDT
Ready to be cherry picked to the 31x release branch
Comment 2 John Mazzitelli 2012-10-31 14:16:08 EDT
git cherry pick to release/jon3.1.x branch: 401c7a219876e5af9075ff09ee1bbb83cca6b884
Comment 3 John Mazzitelli 2012-11-12 13:41:10 EST
need to fix the api change
Comment 4 John Mazzitelli 2012-11-12 13:52:26 EST
cherry picked master git commit e4b77aefd05bd093816e46792a928c49d2f3b402 to deprecate old API: ad54ee06aabfa72fafaf1bb7936ac4d7175756e8
Comment 5 Simeon Pinder 2012-11-21 16:56:07 EST
Moving to ON_QA as available for test with build : https://brewweb.devel.redhat.com//buildinfo?buildID=244662.
Comment 6 Simeon Pinder 2013-09-13 10:30:02 EDT
Closing. In current release.

Note You need to log in before you can comment on or make changes to this bug.