Description of problem:
Attempting a restart operation for EWS eventually returns a success message, even though nothing appears to have actually occurred.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Install EWS and ensure it has gone green in JON.
2. ps -ef|grep tomcat; note the process id.
3. Operations > Restart; submit.
4. Wait some time (probably five+ minutes) for the operation to apparently complete successfully.
5. ps -ef|grep tomcat and tail your tomcat logfile.

Actual results:
* Note that the process id for tomcat remains the same
* In catalina.out we see
  Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9003; nested exception is: java.net.BindException: Address already in use
  ...where the port # (9003 in this case) is the rmi port you have configured.

Expected results:
* Restart op works
* A failed op should not return a success message.

Additional info:
I performed several tests with tomcat6 using EWS 1.0.1 running on Fedora 12. I verified that the restart operation did in fact start a new tomcat process. I was also able to generate an exception in catalina.out that is similar to the one in the description. I created the port conflict by first starting an instance of EAP 5.0 and then restarting tomcat via the restart resource operation. The operation reported having completed successfully. Further down in catalina.out, past the exception, the log said that tomcat had started up successfully despite the port conflict.

I have spent some time reviewing the plugin code. The restart operation, as you might expect, is implemented as a stop followed by a start. If the process execution associated with the stop operation reports any errors, an exception is thrown that eventually gets propagated up the call stack and back to the server in the form of a PluginContainerException. In the event of an exception, the result of the restart operation should be reported as a failure. For the start operation, if the associated process execution reports any errors, they are logged on the agent. As a final step we check the resource availability. If the availability is down, an exception is thrown which is propagated up the call stack.

Corey, can you provide additional info, including:
* EWS version
* which OS
* Other apps that you had running
* agent log
* catalina.out
* rhq server log

Thanks
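For reference, the stop/start flow described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual plugin code: `RestartSketch`, `Server`, `restartVerifyingPid` and `stuckServer` are hypothetical names, and `stuckServer` simulates only one possible explanation of the reported symptoms (the stop script exits cleanly but the old process keeps running and still answers the availability check).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the real RHQ plugin API.
final class RestartSketch {

    /** Minimal stand-in for the managed Tomcat instance. */
    interface Server {
        int stop();            // exit code of the shutdown process
        int start();           // exit code of the startup process
        boolean isAvailable(); // availability check performed after start
        long pid();            // current process id
    }

    /** Restart as described in the comment: stop, then start, then check availability. */
    static String restart(Server server, List<String> agentLog) {
        int stopExit = server.stop();
        if (stopExit != 0) {
            // Errors from stop are thrown, so the operation is reported as failed.
            throw new IllegalStateException("stop failed with exit code " + stopExit);
        }
        int startExit = server.start();
        if (startExit != 0) {
            // Errors from start are only logged on the agent.
            agentLog.add("start reported exit code " + startExit);
        }
        if (!server.isAvailable()) {
            throw new IllegalStateException("resource is down after restart");
        }
        return "SUCCESS";
    }

    /** Stricter variant (an assumption, not existing behavior): also require a new pid. */
    static String restartVerifyingPid(Server server, List<String> agentLog) {
        long before = server.pid();
        String result = restart(server, agentLog);
        if (server.pid() == before) {
            throw new IllegalStateException("pid " + before + " unchanged; process was not restarted");
        }
        return result;
    }

    /** Simulated case: stop returns 0, the old process survives, the new one cannot start. */
    static Server stuckServer(long pid) {
        return new Server() {
            public int stop() { return 0; }
            public int start() { return 1; }              // e.g. port already in use
            public boolean isAvailable() { return true; } // old process still answers
            public long pid() { return pid; }             // pid never changes
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(restart(stuckServer(20822L), log));
        System.out.println(log);
        try {
            restartVerifyingPid(stuckServer(20822L), new ArrayList<>());
        } catch (IllegalStateException e) {
            System.out.println("FAILED: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the plain flow reports success because the availability check passes against the old process, while the variant that compares process ids reports a failure.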
The platform/server combination was Solaris / EWS 1.0.1. I am unsure what other apps might have been running on the system at this point. I will try to repro and see if this still occurs.
This is with Corey, so moving to ON_QA.
Confirmed this is still occurring on Solaris.
* Got tomcat process id before sending restart op:
  root 20822 1 0 Apr 20 ? 8:00 /opt/java/jre1.6.0_14//bin/java -Dcom.sun.management.jmxremote.port=9003 -Dcom.
* Sent restart op and waited
* Eventually op says it successfully completed
* Got tomcat process id:
  root 20822 1 0 Apr 20 ? 8:00 /opt/java/jre1.6.0_14//bin/java -Dcom.sun.management.jmxremote.port=9003 -Dcom.
It looks like there was a port conflict for the com.sun.management.jmxremote.port property defined in /opt/redhat/ews/etc/sysconfig/tomcat6. I changed that port and updated the connection properties in the RHQ server as well. When I then tried a restart operation, I saw different pids before and after.
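A quick way to check for this kind of conflict before restarting is to try binding the port. The sketch below is only an illustration: `PortCheck` and `isPortFree` are hypothetical names, and 9003 is simply the port from this report. Run it on the affected host while tomcat is stopped.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper; not part of EWS, Tomcat, or RHQ.
final class PortCheck {

    /** Returns true if this process can bind the given local TCP port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // bind succeeded; the socket is closed again on exit
        } catch (IOException e) {
            // Typically java.net.BindException: Address already in use,
            // but also raised if the port cannot be bound for other reasons.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the JMX remote port seen in this report.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9003;
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " cannot be bound"));
    }
}
```

If the port cannot be bound while tomcat is stopped, something else on the host is holding it, and the value of com.sun.management.jmxremote.port needs to change.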
QA Closing. I am not sure whether we should be seeing a "success" message for a failed op, but this may be percolating too far down in the system for it to really know that it has failed.