584008 – EWS: Restart op appears to do nothing, but says success.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Plugins
Sub Component:
Version:	1.4
Hardware:	All
OS:	Other
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	John Sanda
QA Contact:	Corey Welton
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	jon24-ews

Reported:	2010-04-20 13:50 UTC by Corey Welton
Modified:	2010-06-02 13:38 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:	Solaris / EWS 1.0.1
Last Closed:	2010-06-02 13:38:29 UTC
Embargoed:

Description Corey Welton 2010-04-20 13:50:16 UTC

Description of problem:
Attempting a restart op for EWS eventually returns a success message, even though nothing appears to have actually happened: the tomcat process is never restarted.

Version-Release number of selected component (if applicable):


How reproducible:
Every time

Steps to Reproduce:
1.  Install EWS and ensure it has gone green in JON
2.  ps -ef|grep tomcat; note the process id.
3.  Operations > Restart; submit
4.  Wait some time (probably five+ minutes) for the operation to apparently complete successfully
5. ps -ef|grep tomcat and tail your tomcat logfile.
  
Actual results:

* Note that the process id for tomcat remains the same
* In catalina.out we see


Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9003; nested exception is: 
        java.net.BindException: Address already in use

...where the port # (9003 in this case) is the rmi port you have configured.

Expected results:
* Restart op works
* A failed op should not return a success message.


Additional info:

Comment 1 John Sanda 2010-05-21 19:58:17 UTC

I performed several tests with tomcat6 using EWS 1.0.1 running on Fedora 12. I verified that the restart operation did in fact start a new tomcat process. I was also able to generate an exception in catalina.out that is similar to the one in the description. I created the port conflict by first starting an instance of EAP 5.0, then restarting tomcat via the restart resource operation. The operation reported having completed successfully. Further down in catalina.out, past the exception, the log said that tomcat had started up successfully despite the port conflict.

I have spent some time reviewing the plugin code, and the restart operation, as you might expect, is implemented as a stop followed by a start. If the process execution associated with the stop operation reports any errors, an exception is thrown that eventually gets propagated up the call stack and back to the server in the form of a PluginContainerException. In the event of an exception, the result of the restart operation should be reported as a failure. For the start operation, if the associated process execution reports any errors, they are logged on the agent. As a final step, we check the resource availability. If the availability is down, an exception is thrown, which is propagated up the call stack.
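
The control flow described above can be sketched roughly as follows. This is an illustration only, not the plugin's code: ServerControl, FakeServer and RestartSketch are hypothetical names, a plain IllegalStateException stands in for the exception that reaches the server as a PluginContainerException, and a list stands in for the agent log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the plugin's process control; not an RHQ interface.
interface ServerControl {
    List<String> stop();    // error lines reported by the stop execution, empty if none
    List<String> start();   // error lines reported by the start execution, empty if none
    boolean isAvailable();  // result of the final availability check
}

// Canned implementation used only to exercise the sketch.
class FakeServer implements ServerControl {
    private final List<String> stopErrors;
    private final List<String> startErrors;
    private final boolean available;

    FakeServer(List<String> stopErrors, List<String> startErrors, boolean available) {
        this.stopErrors = stopErrors;
        this.startErrors = startErrors;
        this.available = available;
    }

    public List<String> stop() { return stopErrors; }
    public List<String> start() { return startErrors; }
    public boolean isAvailable() { return available; }
}

class RestartSketch {
    // Stands in for the agent log (assumption for illustration).
    final List<String> agentLog = new ArrayList<>();

    String restart(ServerControl server) {
        // Stop: any reported error aborts the operation with an exception.
        List<String> stopErrors = server.stop();
        if (!stopErrors.isEmpty()) {
            throw new IllegalStateException("Stop failed: " + stopErrors);
        }
        // Start: reported errors are only logged, they do not fail the operation.
        for (String error : server.start()) {
            agentLog.add("start error: " + error);
        }
        // Final step: only a DOWN availability turns the operation into a failure.
        if (!server.isAvailable()) {
            throw new IllegalStateException("Server is down after restart");
        }
        return "SUCCESS";
    }

    public static void main(String[] args) {
        RestartSketch sketch = new RestartSketch();
        // A start that reports a port conflict while the resource still shows as available.
        ServerControl conflicted = new FakeServer(
                Collections.<String>emptyList(),
                Collections.singletonList("Port already in use: 9003"),
                true);
        System.out.println(sketch.restart(conflicted));
        System.out.println(sketch.agentLog);
    }
}
```

Under these assumptions, a start that reports an error still yields a success result whenever the availability check passes, because start errors are only logged.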

Corey, can you provide additional info, including:

* EWS version
* which OS
* Other apps that you had running
* agent log
* catalina.out
* rhq server log

Thanks

Comment 2 Corey Welton 2010-05-27 12:41:33 UTC

The platform/server combination was Solaris / EWS 1.0.1.  I am unsure what other apps might have been running on the system at this point.  I will try to repro and see if this still occurs.

Comment 3 Charles Crouch 2010-05-27 13:24:50 UTC

This is with Corey, so moving to ON_QA.

Comment 4 Corey Welton 2010-06-01 14:01:13 UTC

Confirmed this is still occurring on Solaris.

* Got tomcat process id before sending restart op:

    root 20822     1  0   Apr 20 ?        8:00 /opt/java/jre1.6.0_14//bin/java -Dcom.sun.management.jmxremote.port=9003 -Dcom.

* sent restart op and waited
* Eventually op says it successfully completed

* Got tomcat process id 

    root 20822     1  0   Apr 20 ?        8:00 /opt/java/jre1.6.0_14//bin/java -Dcom.sun.management.jmxremote.port=9003 -Dcom.

Comment 5 John Sanda 2010-06-01 21:06:17 UTC

It looks like there was a port conflict for the com.sun.management.jmxremote.port property defined in /opt/redhat/ews/etc/sysconfig/tomcat6. I changed that port and updated the connection properties in the RHQ server as well. When I then tried a restart operation, I saw different pids.
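
A conflict like this can be checked for before a restart by trying to bind the configured port. The following is a minimal sketch, not something taken from the plugin: PortProbe and isPortFree are hypothetical names, and 9003 is simply the port from the description. The probe only shows whether the port can be bound right now; it does not identify which process holds it.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortProbe {
    // Returns true if a listening socket can be bound to the port right now.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bind succeeded; the socket is closed on exit
        } catch (IOException e) {
            return false;  // includes java.net.BindException: Address already in use
        }
    }

    public static void main(String[] args) {
        int jmxPort = 9003; // port from the description; adjust to the configured value
        System.out.println("Port " + jmxPort + " free: " + isPortFree(jmxPort));
    }
}
```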

Comment 6 Corey Welton 2010-06-02 13:38:29 UTC

QA Closing.  I am not sure whether we should be seeing a "success" message for a failed op, but this may be percolating too far down in the system for it to really know that it has failed.
