608148 – Stop operations on a Tomcat Connector fails after successful Restart operation of the Tomcat Server


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Operations
Sub Component:
Version:	3.0.0
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jay Shaughnessy
QA Contact:	Corey Welton
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	jon-sprint11-bugs

Reported:	2010-06-25 19:53 UTC by John Sefler
Modified:	2010-08-12 16:48 UTC
CC List:	1 user (show)
Fixed In Version:	2.4
Clone Of:
Environment:
Last Closed:	2010-08-12 16:48:27 UTC
Embargoed:


Description John Sefler 2010-06-25 19:53:00 UTC

Description of problem:
After successful inventory of a Tomcat Server, you can repeatedly Stop and Start its Tomcat Connectors (e.g. http-8080) without error.  However, if you then successfully run a Restart operation on the Tomcat Server (8080) itself and try to run a new Stop or Start operation on the same Tomcat Connector (e.g. http-8080), the operation fails.  The stack trace is shown below.

The only way to get the operations on the Tomcat Connectors to work again is to Uninventory, re-discover and Import the Tomcat Server again. 


Version-Release number of selected component (if applicable):
 RHQ
version: 3.0.0-SNAPSHOT
build number: b9ca90d 

 JBoss Operations Network
version: 2.4.0.GA_QA
build number: 10745:647a602 


Note that this failure is easier to reproduce on a Tomcat5 server than on a Tomcat6 server. On Tomcat6, the Stop operation on the Tomcat Connector appears successful, yet the port it was bound to is not actually released, as netstat shows.  That is a separate defect and should be opened against Apache.
# netstat -lpn | grep 8080
Proto Recv-Q Send-Q Local Address   Foreign Address  State   PID/Program name 
tcp        0      0 :::8080         :::*             LISTEN  -

Note the PID is - and therefore the port remains bound to an unknown process.




Actual RHQ/JON results:
org.mc4j.ems.connection.EmsInvocationException: Exception on invocation of [stop]java.lang.reflect.UndeclaredThrowableException
	at org.mc4j.ems.impl.jmx.connection.bean.operation.DOperation.invoke(DOperation.java:127)
	at org.rhq.plugins.jmx.MBeanResourceComponent.invokeOperation(MBeanResourceComponent.java:541)
	at org.rhq.plugins.jmx.MBeanResourceComponent.invokeOperation(MBeanResourceComponent.java:511)
	at sun.reflect.GeneratedMethodAccessor228.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocationThread.call(ResourceContainer.java:525)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy60.invoke(Unknown Source)
	at org.mc4j.ems.impl.jmx.connection.bean.operation.DOperation.invoke(DOperation.java:111)
	... 11 more
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
	java.net.ConnectException: Connection refused
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1001)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:61)
	... 13 more
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
	at java.net.Socket.connect(Socket.java:546)
	at java.net.Socket.connect(Socket.java:495)
	at java.net.Socket.<init>(Socket.java:392)
	at java.net.Socket.<init>(Socket.java:206)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
	... 24 more



Expected results:


Additional info:

Comment 1 Jay Shaughnessy 2010-07-01 16:12:24 UTC

So this is an interesting bug that touches on an area that's been a problem for me in the past, and I've worked around it. This time I think the fix is in place for the root cause.

The issue is in the way the JMX plugin caches the component mbean. The caching is fine most of the time, but the bean's validity is only verified in the MBeanResourceComponent impl of getAvailability(). This is problematic in that there can be a significant window of time (minutes) between the bean becoming invalid and a call to getAvailability(). And if, as in the case of this issue report, getAvailability() is overridden without calling the super, the bean will never get refreshed short of an agent shutdown.

When the server restart operation happens, the TC server is shut down and restarted. The mbean connections are all lost at that point and the cached beans become invalid. They stay that way at least until the next availability check. Note that this check is scheduled by the plugin container; it is unrelated to the fact that the server has been restarted via the operation.

So, metric collection, operations, etc are all going to be in trouble until the avail check. And TC connectors, due to the override, will not perform correctly after the restart.
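To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the RHQ or EMS code: Connection, Bean, BaseComponent and ConnectorComponent are hypothetical stand-ins, and "validity" is modelled simply as the source connection still being open. It shows a base component whose cached bean is only refreshed in getAvailability(), and a subclass that overrides getAvailability() without calling super.

```java
/** Hypothetical illustration only; these are not the real RHQ or EMS types. */
public class StaleBeanSketch {

    /** Stand-in for a JMX connection to the managed server. */
    static final class Connection {
        private boolean open = true;
        void close() { open = false; }
        boolean isOpen() { return open; }
    }

    /** Stand-in for a cached mbean handle tied to the connection that created it. */
    static final class Bean {
        final Connection source;
        Bean(Connection source) { this.source = source; }

        String invoke(String operation) {
            // A bean whose connection is gone cannot reach the server.
            if (!source.isOpen()) {
                throw new IllegalStateException("Connection refused");
            }
            return operation + " ok";
        }
    }

    /** Base component: the cached bean is only validated during the availability check. */
    static class BaseComponent {
        protected Connection connection = new Connection();
        protected Bean cachedBean;

        /** Simulates a server restart: the old connection dies, a new one is established. */
        void serverRestarted() {
            connection.close();
            connection = new Connection();
        }

        boolean getAvailability() {
            // The only place the stale bean is discarded.
            if (cachedBean != null && !cachedBean.source.isOpen()) {
                cachedBean = null;
            }
            return connection.isOpen();
        }

        Bean getBean() {
            if (cachedBean == null) {
                cachedBean = new Bean(connection);
            }
            return cachedBean;
        }

        String invokeOperation(String operation) {
            return getBean().invoke(operation);
        }
    }

    /** Subclass that overrides the availability check without calling super. */
    static final class ConnectorComponent extends BaseComponent {
        @Override
        boolean getAvailability() {
            return connection.isOpen(); // stale bean is never cleared
        }
    }

    public static void main(String[] args) {
        ConnectorComponent connector = new ConnectorComponent();
        System.out.println(connector.invokeOperation("stop"));
        connector.serverRestarted();
        connector.getAvailability();
        try {
            connector.invokeOperation("stop");
        } catch (IllegalStateException e) {
            System.out.println("stop failed after restart: " + e.getMessage());
        }
    }
}
```

In this sketch the base component recovers once getAvailability() runs, while the overriding subclass keeps failing, which mirrors the connector behaviour reported in this BZ.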

The solution is to change the implementation of MBeanResourceComponent.getEmsBean(). This method typically returns the cached bean. I'm adding a (fast) check to ensure that the cached bean's connection matches the current emsConnection. If not, the bean is reset.
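A rough sketch of that idea follows. It is not the actual commit: Connection, Bean, Component, reconnect() and getBean() are hypothetical names, and the object name string is only an illustrative value. The point is that the accessor compares the connection that produced the cached bean with the current connection, and drops the cache on a mismatch.

```java
/** Hypothetical illustration only; these are not the real RHQ or EMS types. */
public class CachedBeanSketch {

    /** Stand-in for a JMX connection to the managed server. */
    static final class Connection {
        final String id;
        Connection(String id) { this.id = id; }
    }

    /** Stand-in for a cached mbean handle that remembers its source connection. */
    static final class Bean {
        final String objectName;
        final Connection source;
        Bean(String objectName, Connection source) {
            this.objectName = objectName;
            this.source = source;
        }
    }

    /** Stand-in for a resource component that caches its bean. */
    static final class Component {
        private final String objectName;
        private Connection currentConnection;
        private Bean cachedBean;
        int loads = 0; // counts how often the bean had to be (re)loaded

        Component(String objectName, Connection initial) {
            this.objectName = objectName;
            this.currentConnection = initial;
        }

        /** Simulates the connection being replaced, e.g. after a server restart. */
        void reconnect(Connection fresh) {
            currentConnection = fresh;
        }

        Bean getBean() {
            // Cheap identity check: was the cached bean created by the current connection?
            if (cachedBean != null && cachedBean.source != currentConnection) {
                cachedBean = null; // stale, force a reload below
            }
            if (cachedBean == null) {
                cachedBean = new Bean(objectName, currentConnection);
                loads++;
            }
            return cachedBean;
        }
    }

    public static void main(String[] args) {
        Connection before = new Connection("before-restart");
        Component connector = new Component("Catalina:type=Connector,port=8080", before);
        Bean first = connector.getBean();

        connector.reconnect(new Connection("after-restart"));
        Bean second = connector.getBean();

        System.out.println("bean reloaded after reconnect: " + (first != second));
        System.out.println("loads: " + connector.loads);
    }
}
```

Because the check lives in the accessor rather than in the availability path, every caller gets a fresh bean on first use after a reconnect, regardless of whether a subclass overrides the availability check.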

This has possible benefit to all JMX based plugins. I am sure there must be other code paths, especially for plugins offering stop/start/restart capability, where this could have been a problem.

note - reviewed with mazz.

Comment 2 Jay Shaughnessy 2010-07-01 16:13:00 UTC

fix commit: c6a959a6fd636f15c76493bf20a9ad779e441175

Comment 3 Jay Shaughnessy 2010-07-01 19:32:13 UTC

In addition to verifying the scenario written up in this BZ I would recommend that QA also attempt a similar test with an AS4 restart.  After the restart try an operation on some child service and not the server itself. Analogous to the connector operation used for TC.

Comment 4 Corey Welton 2010-07-06 20:23:06 UTC

Tested against Tomcat5 and it seems to be fine.  Will test against AS4.

Comment 5 Corey Welton 2010-07-06 20:30:57 UTC

QA Verified against Tomcat5 and EAP4.3.  After performing a restart op on the server and then stop/start against some child resource, things look fine.

Comment 6 Corey Welton 2010-08-12 16:48:27 UTC

Mass-closure of verified bugs against JON.
