Bug 1003797 - Failed to start component for RHQ Server Subsystems
Status: ON_QA
Product: RHQ Project
Classification: Other
Component: Plugins
Hardware: x86_64 Linux
Severity: low
Assigned To: RHQ Project Maintainer
QA Contact: Mike Foley
Duplicates: 997601
Depends On:
Blocks: 1005708
Reported: 2013-09-03 05:06 EDT by Ilya Maleev
Modified: 2013-10-07 17:20 EDT
CC List: 3 users

Doc Type: Bug Fix
Type: Bug

Attachments
agent.log (3.81 MB, text/plain), 2013-09-03 05:06 EDT, Ilya Maleev
Description Ilya Maleev 2013-09-03 05:06:43 EDT
Created attachment 793073: agent.log

Sometimes resources under the RHQ Server Subsystem become unavailable:
Alert, Communications, Group Definition, Measurement, Remote API

In the agent.log, the following exception keeps recurring:

2013-08-29 09:49:27,311 ERROR [InventoryManager.discovery-1]
(rhq.core.pc.inventory.InventoryManager)- Failed to start component for Resource[id=11660,
uuid=4c3dedf1-8311-48dc-8484-3903b90dcef1, type={RHQServer}RHQ Server Remote API Subsystem,
key=rhq.remoting:type=RemoteApiMetrics, name=Remote API Subsystem, parent=RHQ Server Subsystems]
from synchronized merge.
org.rhq.core.clientapi.agent.PluginContainerException: Failed to start component for resource
Resource[id=11660, uuid=4c3dedf1-8311-48dc-8484-3903b90dcef1, type={RHQServer}RHQ Server Remote API
Subsystem, key=rhq.remoting:type=RemoteApiMetrics, name=Remote API Subsystem, parent=RHQ Server
Subsystems] from synchronized merge.
at org.rhq.core.pc.inventory.InventoryManager.activateResource(InventoryManager.java:1831)
at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2815)
at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1145)
at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1115)
at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1097)
at org.rhq.core.pc.inventory.RuntimeDiscoveryExecutor.call(RuntimeDiscoveryExecutor.java:64)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.rhq.core.pc.inventory.TimeoutException: [Warning] Call to
[org.rhq.plugins.server.RemoteAPIResourceComponent.start()] with args
[[org.rhq.core.pluginapi.inventory.ResourceContext@71903dd9]] timed out after 60000 milliseconds -
invocation thread will be interrupted.
... 18 more
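The 60-second limit in the TimeoutException above comes from the plugin container running component lifecycle calls such as start() on a separate thread and interrupting that thread when the deadline passes. A minimal sketch of that pattern (class and method names here are illustrative, not RHQ's actual internals):

```java
import java.util.concurrent.*;

public class TimedInvoker {

    // Run a task on a worker thread and interrupt it if the deadline passes,
    // mirroring the "timed out after 60000 milliseconds - invocation thread
    // will be interrupted" behavior in the log above (illustrative only).
    public static <T> T invokeWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the invocation thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast start() completes normally.
        System.out.println(invokeWithTimeout(() -> "started", 1000));
        // A hung start() is cut off and its thread interrupted.
        try {
            invokeWithTimeout(() -> { Thread.sleep(5000); return "never"; }, 100);
        } catch (TimeoutException expected) {
            System.out.println("timed out");
        }
    }
}
```

This is why a hung component start() shows up as a TimeoutException in the agent log rather than blocking the discovery thread indefinitely.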
Comment 1 Heiko W. Rupp 2013-09-03 06:15:56 EDT
Probably a dup / related to Bug 997601
Comment 2 John Mazzitelli 2013-10-07 14:31:16 EDT
I tried to fix this by re-building the connection on error. However, now I'm seeing this, as reported in BZ against JBoss EAP - bug #900595
Comment 3 John Mazzitelli 2013-10-07 14:36:53 EDT
In MBeanResourceComponent in the JMX plugin:

    private boolean isMBeanAvailable() {
        EmsBean emsBean = getEmsBean();
        boolean isAvailable = emsBean.isRegistered();

that isRegistered() call is what throws the exception.

We need to get the connection to re-establish itself. However, there appear to be bugs in JBossAS/EAP in this area that make that non-trivial.

I tried this in JBossAS7JMXComponent, but I then hit bug #900595 :

     public AvailabilityType getAvailability() {
-        // TODO (jshaughn): Figure out why this hangs, it seems so innocuous :)
-        //EmsConnection conn = getEmsConnection();
-        //ConnectionProvider connectionProvider = (null != conn) ? conn.getConnectionProvider() : null;
-        //return (null != connectionProvider && connectionProvider.isConnected()) ? AvailabilityType.UP
-        //    : AvailabilityType.DOWN;
-        return AvailabilityType.UP;
+        try {
+            EmsConnection conn = getEmsConnection();
+            if (conn == null) {
+                return AvailabilityType.DOWN;
+            }
+            conn.queryBeans("java.lang:*"); // just make a request over the connection to make sure it's valid
+            return AvailabilityType.UP;
+        } catch (Throwable t) {
+            try {
+                this.connection.close(); // try to clean up
+            } catch (Throwable ignore) {
+            }
+            this.connection = null;
+            return AvailabilityType.DOWN;
+        }
     }
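The patch above follows a probe/close/discard pattern: make a cheap request over the connection, and on any failure close it and null it out so the next availability check rebuilds it instead of reusing a dead connection. A self-contained sketch of that pattern, where Connection is a hypothetical stand-in for the EMS connection classes, not the real EMS API:

```java
public class ConnectionAvailability {
    enum AvailabilityType { UP, DOWN }

    // Hypothetical stand-in for the EMS connection; not the real EMS API.
    interface Connection {
        void probe() throws Exception; // e.g. a cheap query like queryBeans("java.lang:*")
        void close();
    }

    private Connection connection;

    public ConnectionAvailability(Connection connection) {
        this.connection = connection;
    }

    // Probe the connection; on any failure, close and discard it so the
    // next check starts fresh instead of reusing a dead connection.
    public AvailabilityType getAvailability() {
        if (connection == null) {
            return AvailabilityType.DOWN;
        }
        try {
            connection.probe();
            return AvailabilityType.UP;
        } catch (Throwable t) {
            try {
                connection.close(); // best-effort cleanup
            } catch (Throwable ignore) {
            }
            connection = null; // force a rebuild on the next check
            return AvailabilityType.DOWN;
        }
    }

    public static void main(String[] args) {
        ConnectionAvailability healthy = new ConnectionAvailability(new Connection() {
            public void probe() { /* succeeds */ }
            public void close() { }
        });
        System.out.println(healthy.getAvailability()); // UP

        ConnectionAvailability broken = new ConnectionAvailability(new Connection() {
            public void probe() throws Exception { throw new Exception("connection dead"); }
            public void close() { }
        });
        System.out.println(broken.getAvailability()); // DOWN, connection discarded
        System.out.println(broken.getAvailability()); // still DOWN, no stale reuse
    }
}
```

Catching Throwable (not just Exception) matters here because the broken client code can surface errors outside the normal exception hierarchy; the check must report DOWN rather than propagate.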
Comment 4 John Mazzitelli 2013-10-07 14:38:26 EDT
*** Bug 997601 has been marked as a duplicate of this bug. ***
Comment 5 John Mazzitelli 2013-10-07 15:18:43 EDT
Interesting. I let the server and agent just run for a while and eventually the resources went green! I don't know if there is some sort of timeout in the JBossAS client code that eventually cleans up bad connections, but it did seem to self-correct, just not in a timely fashion.

I'm gonna run another test and watch this carefully to see when exactly it self-corrects.
Comment 6 John Mazzitelli 2013-10-07 16:10:59 EDT
OK, this fix looks like it's working. But because the availability checks for these resources (since they are services) occur only every 10 minutes, it takes a while for the resource to go green again.

So it appears we do have this workaround for the JBossAS client bugs - I'll commit this code to master soon.
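The delay described above follows directly from the scan interval: even once the broken connection is discarded, the resource stays red until the next periodic availability check runs, so worst-case recovery latency equals the interval (10 minutes for services). A sketch of a fixed-interval scan loop, with the interval shortened for illustration; the names here are illustrative, not the plugin container's actual scheduler:

```java
import java.util.concurrent.*;

public class AvailabilityScanner {

    // Run a fixed number of periodic availability checks, like the plugin
    // container's service-level scan (10 minutes in RHQ; shortened here).
    public static int runScans(Runnable check, long intervalMillis, int count)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(count);
        scheduler.scheduleAtFixedRate(() -> {
            check.run();       // e.g. getAvailability() on each service resource
            done.countDown();
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        done.await();          // a state change is only observed at the next tick
        scheduler.shutdownNow();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        int scans = runScans(() -> System.out.println("availability check"), 50, 3);
        System.out.println("completed " + scans + " scans");
    }
}
```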
Comment 7 John Mazzitelli 2013-10-07 17:20:58 EDT
git commit to master: 1558f53
