Description of problem: Starting JBoss ON components using rhqctl script when JBoss ON Agent is in DEBUG mode will properly start all components (storage, server and agent) but it will not return the prompt. Version-Release number of selected component (if applicable): JBoss ON 3.2 GA How reproducible: Always Steps to Reproduce: 1. Set the agent into DEBUG mode by editing $RHQ-AGENT/bin/rhq-agent-env.sh file and uncommenting out the line: #RHQ_AGENT_DEBUG=true 2. Save the change; 3. Start all three components by running: ./rhqctl start Actual results: All three components are properly started but the script is hanging and is not returning the prompt. Expected results: All three components are properly started and the script properly exits; Additional info: Adding "set -x" to the rhqctl and rhq-agent-wrapper.sh scripts showed that all commands in the script were executed, including the "exit 0" for a successful agent startup. It seems that the actual rhqctl script hangs in: "$RHQ_SERVER_JAVA_EXE_FILE_PATH" ${RHQ_CONTROL_JAVA_OPTS} ${RHQ_CONTROL_ADDITIONAL_JAVA_OPTS} -jar "${RHQ_SERVER_JBOSS_HOME}/jboss-modules.jar" -mp "$_JBOSS_MODULEPATH" org.rhq.rhq-server-control "$@" When running "kill -3" the threadump showed: Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.1-b02 mixed mode): "Thread-7" daemon prio=10 tid=0x00007fb1a852e800 nid=0xfd3 runnable [0x00007fb1ad3c5000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:177) at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105) at java.lang.Thread.run(Thread.java:619) "Thread-6" daemon prio=10 tid=0x00007fb1a852e000 nid=0xfc8 runnable [0x00007fb1ad4c6000] java.lang.Thread.State: RUNNABLE at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) - locked <0x00007fb19a6e56c8> (a java.io.BufferedInputStream) at java.io.FilterInputStream.read(FilterInputStream.java:90) at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105) at java.lang.Thread.run(Thread.java:619) "Reference Reaper" daemon prio=10 tid=0x00007fb1a81eb000 nid=0xe17 in Object.wait() [0x00007fb1ad840000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007fb19ab70958> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00007fb19ab70958> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at org.jboss.modules.ref.References$ReaperThread.run(References.java:68) "Low Memory Detector" daemon prio=10 tid=0x00007fb1a8157800 nid=0xe15 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x00007fb1a8154800 nid=0xe14 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x00007fb1a8152800 nid=0xe13 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x00007fb1a8150800 nid=0xe12 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x00007fb1a8133000 nid=0xe11 in Object.wait() [0x00007fb1ade4c000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007fb185086ae8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00007fb185086ae8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x00007fb1a8131000 nid=0xe10 in Object.wait() [0x00007fb1adf4d000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007fb185083c38> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00007fb185083c38> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x00007fb1a8007800 nid=0xe0a in Object.wait() [0x00007fb1aef93000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007fb19a615530> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1143) - locked <0x00007fb19a615530> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1196) at org.apache.commons.exec.PumpStreamHandler.stop(PumpStreamHandler.java:171) at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:363) at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) at org.rhq.server.control.command.Start.startAgent(Start.java:192) at org.rhq.server.control.command.Start.exec(Start.java:86) at org.rhq.server.control.ControlCommand.exec(ControlCommand.java:126) at org.rhq.server.control.RHQControl.exec(RHQControl.java:83) at org.rhq.server.control.RHQControl.main(RHQControl.java:242) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jboss.modules.Module.run(Module.java:270) at org.jboss.modules.Main.main(Main.java:411) That also means that we were in the startAgent method ( at org.rhq.server.control.command.Start.startAgent(Start.java:192)) trying to execute line 192: private void startAgent() throws Exception { log.debug("Starting RHQ agent"); File agentBinDir = new File(getAgentBasedir(), "bin"); Executor executor = new DefaultExecutor(); executor.setWorkingDirectory(agentBinDir); executor.setStreamHandler(new PumpStreamHandler()); org.apache.commons.exec.CommandLine commandLine = getCommandLine("rhq-agent-wrapper", "start"); if (isWindows()) { try { executor.execute(commandLine); } catch (Exception e) { // Ignore, service may not exist or be running, , script returns 1 log.debug("Failed to start agent service", e); } } else { executor.execute(commandLine); < line 192 } } However, the StreamPump (at org.apache.commons.exec.StreamPumper.run(StreamPumper.java:105)) is waiting on the line 105 for some input: 104 try { 105 while ((length = is.read(buf)) > 0) { 106 os.write(buf, 0, length); 107 } So, when debug is enabled, somehow the input from the rhq-agent-wrapper is still open although we can see the script getting to "exit 0".
This is due to the fact that when debug is enabled, we do not redirect stdout/stderr to the log file. This causes stdout of the start command to be lost and not get slurped in by rhqctl (I think this is related to the fact that we can't pass stdout/stderr/stdin file descriptors down to child processes). If I change the "eval" below (in the if clause) so it redirects stdout/stderr to /dev/null, it no longer hangs. It looks like we don't redirect to the log file on purpose, but I'm not sure why. It would seem the logical thing to do would be to redirect the start command always to the log file, but again, not sure if there was a reason we did it like this before. I'll have to check the history to see what else I can find: rhq-agent-wrapper.sh code: # start the agent now! if [ -n "$RHQ_AGENT_DEBUG" ] && [ "$RHQ_AGENT_DEBUG" != "false" ]; then debug_wrapper_msg "Executing agent with command: ${RHQ_AGENT_START_COMMAND}" eval "$RHQ_AGENT_START_COMMAND > /dev/null 2>&1" #<----this redirection fixes the problem else # since the start command may run as non-root user, ensure the old # wrapper (which may be owned by root) is removed wrapper="rhq-agent-wrapper.log" rm -f $RHQ_AGENT_HOME/logs/$wrapper eval "$RHQ_AGENT_START_COMMAND > \"${RHQ_AGENT_HOME}/logs/${wrapper}\" 2>&1" fi
git commit to master b3681f2
resetting state, as this was fixed upstream, but not yet added to the branch.
cherry picked to release/jon3.2.x branch: commit 3ed1b1001fa18b28add9459b02896a29675daaed Author: John Mazzitelli <mazz> Date: Thu Feb 27 16:38:00 2014 -0500 BZ 1044986 - fix rhqctl script so it doesn't hang when starting the agent in DEBUG mode (cherry picked from commit b3681f22dd3d7d893f57af56a9be6602111e3d7b)
Moving to ON_QA as this is available for test in JON 3.2.3 ER01 build: http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.3.GA/8-14-14/
Moving to ASSIGNED If the DEBUG mode of agent is enabled, the agent fails to start. I think the problem can be found in rhq-agent-wrapper.sh file on line 242: eval "$RHQ_AGENT_START_COMMAND > \"${RHQ_AGENT_HOME}/logs/${wrapper}\" 2>&1" Variable $wrapper is undefined and the path is evaluated as "${RHQ_AGENT_HOME}/logs/" so the script fails when it tries to redirect stdout to a directory. Version: 3.2.0.GA Update 03 Build Number: bca1bc8:e19c43d
Cherry-picked a fix to release/jon3.2.x commit 97c4f6e937d67b3bb133dfda21e2c41d88325e1a Author: John Mazzitelli <mazz> Date: Fri Apr 11 10:38:23 2014 -0400 [BZ 1044986] Remove ${wrapper} which does not match anything, to fix debug part (cherry picked from commit b9ce78527e71a7b558b6a1f3ed71fd7d8f9712d8) Conflicts: modules/enterprise/agent/src/etc/rhq-agent-wrapper.sh
Moving to ON_QA as ready for testing with the latest brew build: http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.3.GA/8-26-14/
Moving to VERIFIED. Checked that variable ${wrapper} was removed from rhq-agent-wrapper.sh and tried to start all components using rhqctl start command -> it works and the script returns the prompt as expected. Version:3.2.0.GA Update 03 Build Number:9766504:97c4f6e