Bug 1044986

Summary: rhqctl script hangs when starting the agent in DEBUG mode
Product: [JBoss] JBoss Operations Network Reporter: bkramer <bkramer>
Component: Launch Scripts, AgentAssignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE QA Contact: Jan Bednarik <jbednari>
Severity: high Docs Contact:
Priority: unspecified    
Version: JON 3.2CC: hrupp, jbednari, loleary, mazz, mfoley, miburman, mkoci, myarboro, spinder
Target Milestone: CR01Keywords: Triaged, Upstream
Target Release: JON 3.2.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-05 15:40:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1044989    
Bug Blocks:    

Description bkramer 2013-12-19 11:37:36 UTC
Description of problem:

Starting JBoss ON components using rhqctl script when JBoss ON Agent is in DEBUG mode will properly start all components (storage, server and agent) but it will not return the prompt.

Version-Release number of selected component (if applicable):
JBoss ON 3.2 GA

How reproducible:
Always

Steps to Reproduce:
1. Set the agent into DEBUG mode by editing $RHQ-AGENT/bin/rhq-agent-env.sh file and uncommenting out the line: #RHQ_AGENT_DEBUG=true
2. Save the change;
3. Start all three components by running: ./rhqctl start

Actual results:
All three components are properly started but the script is hanging and is not returning the prompt.

Expected results:
All three components are properly started and the script properly exits;

Additional info:

Adding "set -x" to the rhqctl and rhq-agent-wrapper.sh scripts showed that all commands in the script were executed, including the "exit 0" for a successful agent startup. It seems that the actual rhqctl script hangs in:

"$RHQ_SERVER_JAVA_EXE_FILE_PATH" ${RHQ_CONTROL_JAVA_OPTS} ${RHQ_CONTROL_ADDITIONAL_JAVA_OPTS} -jar "${RHQ_SERVER_JBOSS_HOME}/jboss-modules.jar" -mp "$_JBOSS_MODULEPATH" org.rhq.rhq-server-control "$@"


When running "kill -3" the threadump showed:


Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.1-b02 mixed mode):

"Thread-7" daemon prio=10 tid=0x00007fb1a852e800 nid=0xfd3 runnable [0x00007fb1ad3c5000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:177)
    at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
    at java.lang.Thread.run(Thread.java:619)

"Thread-6" daemon prio=10 tid=0x00007fb1a852e000 nid=0xfc8 runnable [0x00007fb1ad4c6000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:199)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0x00007fb19a6e56c8> (a java.io.BufferedInputStream)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)
    at java.lang.Thread.run(Thread.java:619)

"Reference Reaper" daemon prio=10 tid=0x00007fb1a81eb000 nid=0xe17 in Object.wait() [0x00007fb1ad840000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007fb19ab70958> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x00007fb19ab70958> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at org.jboss.modules.ref.References$ReaperThread.run(References.java:68)

"Low Memory Detector" daemon prio=10 tid=0x00007fb1a8157800 nid=0xe15 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00007fb1a8154800 nid=0xe14 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00007fb1a8152800 nid=0xe13 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fb1a8150800 nid=0xe12 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fb1a8133000 nid=0xe11 in Object.wait() [0x00007fb1ade4c000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007fb185086ae8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x00007fb185086ae8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007fb1a8131000 nid=0xe10 in Object.wait() [0x00007fb1adf4d000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007fb185083c38> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x00007fb185083c38> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007fb1a8007800 nid=0xe0a in Object.wait() [0x00007fb1aef93000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007fb19a615530> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1143)
    - locked <0x00007fb19a615530> (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1196)
    at org.apache.commons.exec.PumpStreamHandler.stop(PumpStreamHandler.java:171)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:363)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
    at org.rhq.server.control.command.Start.startAgent(Start.java:192)
    at org.rhq.server.control.command.Start.exec(Start.java:86)
    at org.rhq.server.control.ControlCommand.exec(ControlCommand.java:126)
    at org.rhq.server.control.RHQControl.exec(RHQControl.java:83)
    at org.rhq.server.control.RHQControl.main(RHQControl.java:242)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.jboss.modules.Module.run(Module.java:270)
    at org.jboss.modules.Main.main(Main.java:411)


That also means that we were in the startAgent method ( at org.rhq.server.control.command.Start.startAgent(Start.java:192)) trying to execute line 192:

   private void startAgent() throws Exception {
        log.debug("Starting RHQ agent");

        File agentBinDir = new File(getAgentBasedir(), "bin");
        Executor executor = new DefaultExecutor();
        executor.setWorkingDirectory(agentBinDir);
        executor.setStreamHandler(new PumpStreamHandler());
        org.apache.commons.exec.CommandLine commandLine = getCommandLine("rhq-agent-wrapper", "start");

        if (isWindows()) {
            try {
                executor.execute(commandLine);
            } catch (Exception e) {
                // Ignore, service may not exist or be running, , script returns 1
                log.debug("Failed to start agent service", e);
            }
        } else {
            executor.execute(commandLine);  < line 192
        }
    }


However, the StreamPump (at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)) is waiting on the line 105 for some input:

104     try {
105     while ((length = is.read(buf)) > 0) {
106     os.write(buf, 0, length);
107     }

So, when debug is enabled, somehow the input from the rhq-agent-wrapper is still open although we can see the script getting to "exit 0".

Comment 1 John Mazzitelli 2014-02-27 19:58:42 UTC
This is due to the fact that when debug is enabled, we do not redirect stdout/stderr to the log file. This causes stdout of the start command to be lost and not get slurped in by rhqctl (I think this is related to the fact that we can't pass stdout/stderr/stdin file descriptors down to child processes).

If I change the "eval" below (in the if clause) so it redirects stdout/stderr to /dev/null, it no longer hangs. It looks like we don't redirect to the log file on purpose, but I'm not sure why. It would seem the logical thing to do would be to redirect the start command always to the log file, but again, not sure if there was a reason we did it like this before. I'll have to check the history to see what else I can find:

rhq-agent-wrapper.sh code:

        # start the agent now!
        if [ -n "$RHQ_AGENT_DEBUG" ] && [ "$RHQ_AGENT_DEBUG" != "false" ]; then
           debug_wrapper_msg "Executing agent with command: ${RHQ_AGENT_START_COMMAND}"
           eval "$RHQ_AGENT_START_COMMAND > /dev/null 2>&1" #<----this redirection fixes the problem
        else
           # since the start command may run as non-root user, ensure the old
           # wrapper (which may be owned by root) is removed
           wrapper="rhq-agent-wrapper.log"
           rm -f $RHQ_AGENT_HOME/logs/$wrapper
           eval "$RHQ_AGENT_START_COMMAND > \"${RHQ_AGENT_HOME}/logs/${wrapper}\" 2>&1"
        fi

Comment 2 John Mazzitelli 2014-02-27 22:33:53 UTC
git commit to master b3681f2

Comment 3 Heiko W. Rupp 2014-02-28 11:40:52 UTC
resetting state, as this was fixed upstream, but not yet added to the branch.

Comment 5 John Mazzitelli 2014-08-08 14:32:44 UTC
cherry picked to release/jon3.2.x branch:

commit 3ed1b1001fa18b28add9459b02896a29675daaed
Author: John Mazzitelli <mazz>
Date:   Thu Feb 27 16:38:00 2014 -0500

    BZ 1044986 - fix rhqctl script so it doesn't hang when starting the agent in DEBUG mode
    (cherry picked from commit b3681f22dd3d7d893f57af56a9be6602111e3d7b)

Comment 6 Simeon Pinder 2014-08-15 03:19:14 UTC
Moving to ON_QA as this is available for test in JON 3.2.3 ER01 build:

http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.3.GA/8-14-14/

Comment 7 Jan Bednarik 2014-08-18 13:18:20 UTC
Moving to ASSIGNED

If the DEBUG mode of agent is enabled, the agent fails to start. I think the problem can be found in rhq-agent-wrapper.sh file on line 242:

eval "$RHQ_AGENT_START_COMMAND > \"${RHQ_AGENT_HOME}/logs/${wrapper}\" 2>&1"

Variable $wrapper is undefined and the path is evaluated as "${RHQ_AGENT_HOME}/logs/" so the script fails when it tries to redirect stdout to a directory.

Version: 3.2.0.GA Update 03
Build Number: bca1bc8:e19c43d

Comment 9 Michael Burman 2014-08-21 14:31:44 UTC
Cherry-picked a fix to release/jon3.2.x

commit 97c4f6e937d67b3bb133dfda21e2c41d88325e1a
Author: John Mazzitelli <mazz>
Date:   Fri Apr 11 10:38:23 2014 -0400

    [BZ 1044986] Remove ${wrapper} which does not match anything, to fix debug part
    
    (cherry picked from commit b9ce78527e71a7b558b6a1f3ed71fd7d8f9712d8)
    
    Conflicts:
        modules/enterprise/agent/src/etc/rhq-agent-wrapper.sh

Comment 10 Simeon Pinder 2014-08-26 23:42:17 UTC
Moving to ON_QA as ready for testing with the latest brew build:
http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.3.GA/8-26-14/

Comment 11 Jan Bednarik 2014-08-27 12:27:19 UTC
Moving to VERIFIED.

Checked that variable ${wrapper} was removed from rhq-agent-wrapper.sh and tried to start all components using rhqctl start command -> it works and the script returns the prompt as expected.

Version:3.2.0.GA Update 03
Build Number:9766504:97c4f6e