Description of problem:
After restarting a monitored JVM (Apache Flume in this case), the availability of some (not all) MBeans appears DOWN. The availability checks do not recover on their own and the error persists; restarting the agent fixes the issue. Oddly enough, metrics do still appear in this state.

2013-10-10 20:03:12,686 DEBUG [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Failed to collect availability on Resource[id=167303, uuid=c8c5347f-1b98-4cf9-8099-419eca2f5414, type={JMX}Threading, key=java.lang:type=Threading, name=Threading, parent=JMX Server (8013)]
java.lang.RuntimeException: Availability check failed
	at org.rhq.core.pc.inventory.AvailabilityProxy.getAvailability(AvailabilityProxy.java:123)
	at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:268)
	at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:319)
	at org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(AvailabilityExecutor.java:319)
	at org.rhq.core.pc.inventory.AvailabilityExecutor.call(AvailabilityExecutor.java:131)
	at org.rhq.core.pc.inventory.AvailabilityExecutor.run(AvailabilityExecutor.java:80)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy51.isRegistered(Unknown Source)
	at org.mc4j.ems.impl.jmx.connection.bean.DMBean.isRegistered(DMBean.java:188)
	at org.rhq.plugins.jmx.MBeanResourceComponent.isMBeanAvailable(MBeanResourceComponent.java:240)
	at org.rhq.plugins.jmx.MBeanResourceComponent.getAvailability(MBeanResourceComponent.java:227)
	at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:72)
	at org.rhq.core.pc.inventory.AvailabilityProxy.call(AvailabilityProxy.java:22)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	... 3 more
Caused by: java.rmi.ConnectException: Connection refused to host: 17.172.9.135; nested exception is:
	java.net.ConnectException: Connection refused
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.isRegistered(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.isRegistered(RMIConnector.java:839)
	at sun.reflect.GeneratedMethodAccessor9555.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:59)
	... 11 more
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
	at java.net.Socket.connect(Socket.java:525)
	at java.net.Socket.connect(Socket.java:475)
	at java.net.Socket.<init>(Socket.java:372)
	at java.net.Socket.<init>(Socket.java:186)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
	... 21 more

Version-Release number of selected component (if applicable):
4.5.1

How reproducible:
Sometimes

Steps to Reproduce:
1. Monitor a plain JVM with a large number of MBeans
2. Restart the JVM
3. Observe that the state of some resources is DOWN

Analysis:
The exception appears to be spurious: the host is correct (the port is not logged, but it should be correct as well). Could be a JVM bug.
Created attachment 818835 [details] Example of mixed states (Storage Node)
Patch is forthcoming, assuming it passes my testing. The fix is to simply request the bean from the MBeanServer every time availability is requested, rather than trying to cache it. There is another exception I've seen; it happens for pretty much any JMX server.
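To illustrate the idea, here is a minimal sketch (not the actual patch or the real MBeanResourceComponent code) of a non-caching availability check. The EMS calls EmsConnection#getBean and EmsBean#isRegistered are the ones visible in the stack traces above, but the surrounding class, field names, and exact signatures are assumptions for the example.

==
import org.mc4j.ems.connection.EmsConnection;
import org.mc4j.ems.connection.bean.EmsBean;
import org.rhq.core.domain.measurement.AvailabilityType;

/**
 * Sketch only: re-query the MBean from the connection on every availability
 * check instead of holding a cached EmsBean that may still reference the
 * pre-restart connection.
 */
public class NonCachingAvailabilityCheck {

    private final EmsConnection connection; // obtained from the parent JMX server component
    private final String objectName;        // e.g. "java.lang:type=Threading"

    public NonCachingAvailabilityCheck(EmsConnection connection, String objectName) {
        this.connection = connection;
        this.objectName = objectName;
    }

    public AvailabilityType getAvailability() {
        try {
            // Fresh lookup on every call; nothing is cached between checks.
            EmsBean bean = connection.getBean(objectName);
            return (bean != null && bean.isRegistered()) ? AvailabilityType.UP : AvailabilityType.DOWN;
        } catch (Exception e) {
            // Connection-level failures (such as the RMI ConnectException above) map to DOWN.
            return AvailabilityType.DOWN;
        }
    }
}
==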
Created attachment 819405 [details] Fixes the issue

Attached is a patch that seems to fix the issue. Note that refreshing the EmsConnection when you can't find the MBean (and the MBean really does not exist) is worse than simply doing the right thing and calling get for each call. I'm not a big fan of premature optimization when doing monitoring anyway.
The forum talks about this approach: https://community.jboss.org/thread/234131
I first thought it was a problem with the Java version (see https://bugzilla.redhat.com/show_bug.cgi?id=1026423#c1), but it is not (see https://bugzilla.redhat.com/show_bug.cgi?id=1026423#c2).

Here are the steps to get the same stack trace:
1. Inventory a Tomcat instance
2. Change the avail collection schedule of the Tomcat Server resource to something higher, like 3 minutes
3. Change the avail collection schedule of a Tomcat Connector resource to the minimum (30 seconds)
4. Restart the Tomcat server externally between two avail checks (watch the agent logs)

Results:
You will find the Tomcat Connector resource in state DOWN (and possibly some other child resources of the Tomcat Server, depending on the schedules). On the next Tomcat Server resource avail check, the Tomcat Server resource will go DOWN as well as all its child resources. After another Tomcat Server resource avail check, all Tomcat resources will come back UP.

Why? The problem is in the way the top level JMXComponent (the TomcatServerComponent in our case) implements #getEmsConnection. The TomcatServerComponent class caches the emsConnection and refreshes it only when:
1. it checks avail and the previous avail was DOWN
2. it collects its unique metric (the "Server Identifier" trait, every 24 hours by default)

Storage node plugin: I was not able to reproduce the issue with a Storage Node resource. Actually, StorageNodeComponent extends JMXServerComponent and inherits #getEmsConnection from it. JMXServerComponent#getEmsConnection always checks the connection state:

==
public EmsConnection getEmsConnection() {
    this.reconnectIfDisconnected();
    return this.connection;
}
==

Apache Flume plugin: Elias, how did you implement the top level JMXComponent in your Apache Flume plugin?
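For contrast, a simplified illustration of the caching pattern described above (this is not the actual Tomcat plugin source, only a sketch of the behavior): the cached connection is handed out unchecked, so after the monitored JVM restarts, child resources keep using a connection that points at the dead RMI endpoint.

==
// Simplified illustration of the problematic pattern; not the real TomcatServerComponent.
private EmsConnection connection;

public EmsConnection getEmsConnection() {
    // No liveness check here: after a Tomcat restart this still references the old,
    // now-refused RMI endpoint, so the child resources' isRegistered() calls fail as above.
    return this.connection;
}
==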
Created attachment 824630 [details] Create a new EmsConnection again in TomcatServerComponent if previous one was stale
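The attachment is the authoritative change; what follows is only a minimal sketch of the idea it describes, under the assumption that EMS exposes a connectivity check via the connection provider. loadConnection() is a hypothetical helper standing in for however the plugin actually establishes its EmsConnection.

==
public EmsConnection getEmsConnection() {
    EmsConnection emsConnection = this.connection;
    try {
        // Assumption: the EMS connection provider can report whether it is still connected.
        if (emsConnection != null && !emsConnection.getConnectionProvider().isConnected()) {
            emsConnection = null; // stale, e.g. the monitored Tomcat was restarted
        }
    } catch (Exception e) {
        emsConnection = null; // treat any failure while probing as a stale connection
    }
    if (emsConnection == null) {
        emsConnection = loadConnection(); // hypothetical helper that (re)creates the EmsConnection
        this.connection = emsConnection;
    }
    return emsConnection;
}
==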
Adding Jean-Frederic to the CC list (the patch involves the Tomcat plugin).
Flume is basically JMXServerComponent plus MBeanResourceComponent. I know you obsoleted my patch/screenshot, but might be worth a look. Were you able to reproduce the java.rmi.ConnectException like I had?
(In reply to Elias Ross from comment #8)
> Flume is basically JMXServerComponent plus MBeanResourceComponent.
>
> I know you obsoleted my patch/screenshot, but might be worth a look.

I did look at it and tried it, but it did not solve the problem.

> Were you able to reproduce the java.rmi.ConnectException like I had?

Yes, with the reproduction steps described in Comment #5. I just reproduced it to show you:

2013-11-18 10:23:03,738 WARN [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Availability collection failed with exception on Resource[id=10146, uuid=35a62d36-15f9-4c37-8217-a25b195ec616, type={Tomcat}VM Memory System, key=java.lang:type=Memory, name=Memory Subsystem, parent=Tomcat Server JVM], availability will be reported as DOWN
java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy81.isRegistered(Unknown Source)
	at org.mc4j.ems.impl.jmx.connection.bean.DMBean.isRegistered(DMBean.java:188)
	at org.rhq.plugins.jmx.MBeanResourceComponent.isMBeanAvailable(MBeanResourceComponent.java:242)
	at org.rhq.plugins.jmx.MBeanResourceComponent.getAvailability(MBeanResourceComponent.java:229)
	at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
	java.net.ConnectException: Connexion refusée
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.isRegistered(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.isRegistered(RMIConnector.java:874)
	at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:59)
	... 12 more
Caused by: java.net.ConnectException: Connexion refusée
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at java.net.Socket.<init>(Socket.java:425)
	at java.net.Socket.<init>(Socket.java:208)
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
	at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
	... 22 more
Merged in master: 49d276b51a5a1a849a38fc1f75250096d141c813
Reviewed and merged by Jean Frederic (see Comment 10)
Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.