Bug 894493

Summary: [as7] Start operation returns failure when start script returns exit code 0 (success)
Product: [JBoss] JBoss Operations Network
Reporter: Larry O'Leary <loleary>
Component: Operations, Plugin -- JBoss EAP 6
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: urgent
Version: JON 3.1.1
CC: jkremser, jstefl, myarboro
Target Milestone: ER01
Target Release: JON 3.2.0
Hardware: All
OS: All
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 961437

Description Larry O'Leary 2013-01-11 21:25:09 UTC
Description of problem:
After invoking the start operation on an AS7/EAP6 resource, the operation status is reported as failure even though the resource is started and the script output indicates that the resource was started successfully and the script exited cleanly with return code 0.

The only time the start operation reports success is when the start script blocks (i.e. does not return).

Version-Release number of selected component (if applicable):
4.4.0.JON311GA

How reproducible:
Always

Steps to Reproduce:
1.  Install and configure EAP 6 standalone server
2.  Create custom standalone.sh start script wrapper which returns an exit code:

cat > "${JBOSS_HOME}/bin/standalone-wrapper.sh" << EOF
#!/bin/sh

DIRNAME=\$(dirname "\$0")

eval \"\${DIRNAME}/standalone.sh\" "\$@" \&
exit \$?
EOF
chmod +x "${JBOSS_HOME}/bin/standalone-wrapper.sh"

3.  Using the newly created start script wrapper, start the EAP 6 standalone server
4.  Start JBoss ON system
5.  Import EAP 6 standalone server into inventory
6.  Configure the EAP 6 resource's connection settings to use the custom start script wrapper standalone-wrapper.sh
7.  After EAP 6 resource shows availability of UP, invoke its shutdown operation
8.  Wait until EAP 6 resource is reported as DOWN
9.  Invoke the EAP 6 resource's start operation

Actual results:
EAP server is started, but the operation status in the UI shows Failure with the following error message:

    java.lang.Exception: Start failed with error code 0:

	    at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
	    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	    at java.lang.Thread.run(Thread.java:636)

Expected results:
EAP server is started and the operation status in the UI shows Success.

Additional info:
This is a direct result of incorrectly handling the process exit code from ProcessExecutionResults.getExitCode() in BaseServerComponent.startServer(). We treat <null> as success, when really 0 is success and <null> simply means that the process has not yet returned, is blocking, or timed out.
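
The three cases can be illustrated with a small standalone sketch. This is not the plugin's code: the class StartResultCheck, the Outcome enum, and the classify() method are hypothetical names made up for illustration, and the sketch assumes only what is described above, namely that the exit code is a nullable Integer and that an execution error may be reported separately.

```java
public class StartResultCheck {

    /** Hypothetical labels for how a start script result could be interpreted. */
    enum Outcome { EXECUTION_ERROR, SCRIPT_FAILED, NEEDS_CONNECTION_CHECK }

    /**
     * Interprets the result of running a start script.
     *
     * @param error    error raised while executing the script, or null
     * @param exitCode exit code of the script, or null if it has not
     *                 returned (still blocking, or the wait timed out)
     */
    static Outcome classify(Throwable error, Integer exitCode) {
        if (error != null) {
            return Outcome.EXECUTION_ERROR;
        }
        // Only a non-zero exit code is a failure. Both 0 (script returned
        // cleanly) and null (script still running) mean the server may be
        // up, so the caller should go on to check connectivity.
        if (exitCode != null && exitCode != 0) {
            return Outcome.SCRIPT_FAILED;
        }
        return Outcome.NEEDS_CONNECTION_CHECK;
    }

    public static void main(String[] args) {
        System.out.println(classify(null, 0));    // NEEDS_CONNECTION_CHECK
        System.out.println(classify(null, null)); // NEEDS_CONNECTION_CHECK
        System.out.println(classify(null, 1));    // SCRIPT_FAILED
    }
}
```

With the original condition (exitCode != null alone), the first call would have been classified as a failure, which matches the "Start failed with error code 0" message reported above.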

To fix this:

diff --git a/modules/plugins/jboss-as-7/src/main/java/org/rhq/modules/plugins/jbossas7/BaseServerComponent.java b/modules/plugins/jboss-as-7/src/main/java/org/rhq/modules/plugins/jbossas7/Ba
index 98a46d2..3dc12de 100644
--- a/modules/plugins/jboss-as-7/src/main/java/org/rhq/modules/plugins/jbossas7/BaseServerComponent.java
+++ b/modules/plugins/jboss-as-7/src/main/java/org/rhq/modules/plugins/jbossas7/BaseServerComponent.java
@@ -330,7 +330,7 @@ public abstract class BaseServerComponent<T extends ResourceComponent<?>> extend
         logExecutionResults(results);
         if (results.getError() != null) {
             operationResult.setErrorMessage(results.getError().getMessage());
-        } else if (results.getExitCode() != null) {
+        } else if (results.getExitCode() != null && results.getExitCode() != 0) {
             operationResult.setErrorMessage("Start failed with error code " + results.getExitCode() + ":\n" + results.getCapturedOutput());
         } else {
             // Try to connect to the server - ping once per second, timing out after 20s.

Comment 1 Jirka Kremser 2013-02-04 10:52:10 UTC
I did something similar for as6 (feb01f468)

Comment 2 Jirka Kremser 2013-03-06 11:41:08 UTC
master
http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=c0fe931b7

time:    Wed Mar 6 12:24:27 2013 +0100
commit:  c0fe931b74ac8eccfc5be8af2275bea5d8c0c917
author:  Jirka Kremser - jkremser
message: [BZ 894493] - [as7] Start operation returns failure when start script returns exit code 0 (success). Added the test for the exit code.

Comment 3 Larry O'Leary 2013-09-06 14:31:30 UTC
As this is MODIFIED or ON_QA, setting milestone to ER1.

Comment 4 Jan Stefl 2013-10-11 14:39:12 UTC
Verified with JON 3.2.ER3 + EAP 6.1.1 - PASSED