Bug 1017961 - JMX Plugin can fail testing for registration
Summary: JMX Plugin can fail testing for registration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugins
Version: 4.5
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: RHQ 4.10
Assignee: Jean-frederic Clere
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1026423
TreeView+ depends on / blocked
 
Reported: 2013-10-10 20:48 UTC by Elias Ross
Modified: 2014-04-23 12:30 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
: 1026423 (view as bug list)
Environment:
Last Closed: 2014-04-23 12:30:34 UTC
Embargoed:


Attachments (Terms of Use)
Example of mixed states (Storage Node) (142.07 KB, image/png)
2013-11-03 23:56 UTC, Elias Ross
no flags Details
Fixes the issue (6.28 KB, patch)
2013-11-04 22:58 UTC, Elias Ross
no flags Details | Diff
Create a new EmsConnection again in TomcatServerComponent if previous one was stale (1.13 KB, patch)
2013-11-15 17:01 UTC, Thomas Segismont
no flags Details | Diff

Description Elias Ross 2013-10-10 20:48:56 UTC
Description of problem:

After restarting a monitored JVM, Apache Flume in this case, the availability of some (not all) MBeans appears down. Availability checks do not fix themselves and this error persists. Restarting the agent fixes the issue.

Also, oddly enough, in this state metrics do appear.

2013-10-10 20:03:12,686 DEBUG [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Failed to collect availability on Resource[id=167303, uuid=c8c5347f-1b98-4cf9-8099-419eca2f5414, type={JMX}Threading, key=java.
lang:type=Threading, name=Threading, parent=JMX Server (8013)]
java.lang.RuntimeException: Availability check failed
        at org.rhq.core.pc.inventory.AvailabilityProxy.getAvailability(AvailabilityProxy.java:123)
        at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:268)
        at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:319)
        at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:319)
        at org.rhq.core.pc.inventory.AvailabilityExecutor.call(AvailabilityExecutor.java:131)
        at org.rhq.core.pc.inventory.AvailabilityExecutor.run(AvailabilityExecutor.java:80)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at $Proxy51.isRegistered(Unknown Source)
        at org.mc4j.ems.impl.jmx.connection.bean.DMBean.isRegistered(DMBean.java:188)
        at org.rhq.plugins.jmx.MBeanResourceComponent.isMBeanAvailable(MBeanResourceComponent.java:240)
        at org.rhq.plugins.jmx.MBeanResourceComponent.getAvailability(MBeanResourceComponent.java:227)
        at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:72)
        at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:22)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more
Caused by: java.rmi.ConnectException: Connection refused to host: 17.172.9.135; nested exception is: 
        java.net.ConnectException: Connection refused
        at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
        at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl_Stub.isRegistered(Unknown Source)
        at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.isRegistered(RMIConnector.java:839)
        at sun.reflect.GeneratedMethodAccessor9555.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:59)
        ... 11 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:525)
        at java.net.Socket.connect(Socket.java:475)
        at java.net.Socket.<init>(Socket.java:372)
        at java.net.Socket.<init>(Socket.java:186)
        at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
        at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
        at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
        ... 21 more


Version-Release number of selected component (if applicable):

4.5.1

How reproducible: Sometimes

Steps to Reproduce:
1. Monitor a plain JVM with a large number of MBeans
2. Restart the JVM
3. Observe the state of resources is DOWN


Analysis:

It appears the exception is phony, as the host (port is not logged but should be correct). Could be a JVM bug.

Comment 1 Elias Ross 2013-11-03 23:56:17 UTC
Created attachment 818835 [details]
Example of mixed states (Storage Node)

Comment 2 Elias Ross 2013-11-04 00:02:43 UTC
Patch is forthcoming, assuming it passes my testing. The fix is to simply request the Bean from from the MBeanServer every time you request availability and not try to cache it. 

There is another exception I've seen.

This happens for pretty much any JMX server.

Comment 3 Elias Ross 2013-11-04 22:58:51 UTC
Created attachment 819405 [details]
Fixes the issue

Patch, this seems to fix the issue.

Note that refreshing the EmsConnection when you can't find the MBean (and the MBean really does not exist) than simply doing the right thing and calling get for each call. I'm not a big fan of premature optimization when doing monitoring anyway.

Comment 4 Elias Ross 2013-11-04 23:01:07 UTC
Insert 'is worse' before 'than simply' . The forum talks about this approach:

https://community.jboss.org/thread/234131

Comment 5 Thomas Segismont 2013-11-15 16:52:56 UTC
I first though it was a problem with the Java version (see https://bugzilla.redhat.com/show_bug.cgi?id=1026423#c1). But it is not (see https://bugzilla.redhat.com/show_bug.cgi?id=1026423#c2).

Here are the steps to get the same stack trace:
1. Inventory a Tomcat instance
2. Change avail collection schedule of the Tomcat Server resource to something higher, like 3 minutes
3. Change avail collection of a Tomcat Connector resource to the minimum (30 seconds)
4. Restart the Tomcat server externally between two avail checks (watch the agent logs)

Results:
You will find the Tomcat Connector resource in state DOWN (and possibly some other child resources of the Tomcat Server, depending on the schedules).

On next Tomcat Server resource avail check, the Tomcat Server resource will go DOWN as well as all its child resources.

Another Tomcat Server resource avail check and all Tomcat resources will come back UP.

Why?
The problem is in the way the top level JMXComponent (the TomcatServerComponent in our case) implements #getEmsConnection.
TomcatServerComponent class caches the emsConnection and refreshes it only when:
1. it checks avail and previous avail was DOWN
2. it collects its unique metric ("Server Identifier" trait, every 24 hours by default)

Storage node plugin:
I was not able to reproduce the issue with a Storage Node resource. Actually, the StorageNodeComponent extends JMXServerComponent and inherits #getEmsConnection from it. JMXServerComponent#getEmsConnection always checks the connection state:

==
    public EmsConnection getEmsConnection() {
        this.reconnectIfDisconnected();

        return this.connection;
    }
==

Apache Flume plugin:
Elias, how did you implement the top level JMXComponent in your Apache Flume plugin?

Comment 6 Thomas Segismont 2013-11-15 17:01:01 UTC
Created attachment 824630 [details]
Create a new EmsConnection again in TomcatServerComponent if previous one was stale

Comment 7 Thomas Segismont 2013-11-15 17:40:14 UTC
Adding Jean-Frederic in CC list (patch involves Tomcat plugin).

Comment 8 Elias Ross 2013-11-16 01:37:41 UTC
Flume is basically JMXServerComponent plus MBeanResourceComponent.

I know you obsoleted my patch/screenshot, but might be worth a look.

Were you able to reproduce the java.rmi.ConnectException like I had?

Comment 9 Thomas Segismont 2013-11-18 09:27:29 UTC
(In reply to Elias Ross from comment #8)
> Flume is basically JMXServerComponent plus MBeanResourceComponent.
> 
> I know you obsoleted my patch/screenshot, but might be worth a look.

I did look at it and tried. But it did not solve the problem.

> 
> Were you able to reproduce the java.rmi.ConnectException like I had?

Yes, with the reproduction steps described in Comment #5

I just reproduced it to show you:

2013-11-18 10:23:03,738 WARN  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Availability collection failed with exception on Resource[id=10146, uuid=35a62d36-15f9-4c37-8217-a25b195ec616, type={Tomcat}VM Memory System, key=java.lang:type=Memory, name=Memory Subsystem, parent=Tomcat Server JVM], availability will be reported as DOWN
java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy81.isRegistered(Unknown Source)
	at org.mc4j.ems.impl.jmx.connection.bean.DMBean.isRegistered(DMBean.java:188)
	at org.rhq.plugins.jmx.MBeanResourceComponent.isMBeanAvailable(MBeanResourceComponent.java:242)
	at org.rhq.plugins.jmx.MBeanResourceComponent.getAvailability(MBeanResourceComponent.java:229)
	at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
	java.net.ConnectException: Connexion refusée
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.isRegistered(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.isRegistered(RMIConnector.java:874)
	at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:59)
	... 12 more
Caused by: java.net.ConnectException: Connexion refusée
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at java.net.Socket.<init>(Socket.java:425)
	at java.net.Socket.<init>(Socket.java:208)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
	... 22 more

Comment 10 Jean-frederic Clere 2013-11-28 12:38:07 UTC
merged in master by 49d276b51a5a1a849a38fc1f75250096d141c813

Comment 11 Thomas Segismont 2013-11-28 14:26:29 UTC
Reviewed and merged by Jean Frederic (see Comment 10)

Comment 12 Heiko W. Rupp 2014-04-23 12:30:34 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.


Note You need to log in before you can comment on or make changes to this bug.