Bug 1003797
Summary: | Failed to start component for RHQ Server Subsystems
---|---
Product: | [Other] RHQ Project
Reporter: | Ilya Maleev <imaleev>
Component: | Plugins
Assignee: | Nobody <nobody>
Status: | ON_QA
Severity: | low
Priority: | unspecified
Version: | 4.9
CC: | genman, hrupp, mazz
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Type: | Bug
Bug Blocks: | 1005708
Probably a dup of / related to Bug 997601.

I tried to fix this by re-building the connection on error. However, now I'm seeing this, as reported in a BZ against JBoss EAP - bug #900595.

In MBeanResourceComponent in the JMX plugin:

    private boolean isMBeanAvailable() {
        EmsBean emsBean = getEmsBean();
        boolean isAvailable = emsBean.isRegistered();

That isRegistered() call is what throws the exception. We need to get the connection to re-establish itself. However, there appear to be bugs in JBossAS/EAP in this area that make that non-trivial. I tried this in JBossAS7JMXComponent, but I then hit bug #900595:

     @Override
     public AvailabilityType getAvailability() {
    -    // TODO (jshaughn): Figure out why this hangs, it seems so innocuous :)
    -    //EmsConnection conn = getEmsConnection();
    -    //ConnectionProvider connectionProvider = (null != conn) ? conn.getConnectionProvider() : null;
    -    //return (null != connectionProvider && connectionProvider.isConnected()) ? AvailabilityType.UP
    -    //    : AvailabilityType.DOWN;
    -
    -    return AvailabilityType.UP;
    +    try {
    +        EmsConnection conn = getEmsConnection();
    +        if (conn == null) {
    +            return AvailabilityType.DOWN;
    +        }
    +        conn.queryBeans("java.lang:*"); // just make a request over the connection to make sure its valid
    +        return AvailabilityType.UP;
    +    } catch (Throwable t) {
    +        try {
    +            this.connection.close(); // try to clean up
    +        } catch (Throwable ignore) {
    +        }
    +        this.connection = null;
    +        return AvailabilityType.DOWN;
    +    }

*** Bug 997601 has been marked as a duplicate of this bug. ***

Interesting. I let the server and agent just run for a while and eventually the resources went green! I don't know whether there is some sort of timeout in the JBossAS client code that eventually cleans up bad connections, but it did self-correct, just not in a timely fashion. I'm going to run another test and watch it carefully to see exactly when it self-corrects.

OK, this fix looks like it's working. Because these resources are services, their availability checks run only every 10 minutes, which is why it takes a while for a resource to go green again. So it appears we do have a workaround for the JBossAS client bugs. I'll commit this code to master soon.

git commit to master: 1558f53
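The probe-and-reset idea behind the change above can be sketched in isolation. This is only a minimal illustration, not the RHQ code: `ProbingAvailability`, `Connection`, and `probe()` are hypothetical stand-ins for the plugin component and the EMS connection types, and the sketch catches `Exception` where the committed change catches `Throwable`.

```java
import java.util.function.Supplier;

/** Sketch: report DOWN and discard the cached connection whenever a cheap probe fails. */
public class ProbingAvailability {
    public enum Availability { UP, DOWN }

    /** Hypothetical stand-in for a remote management connection. */
    public interface Connection {
        void probe() throws Exception;  // any cheap round trip over the connection
        void close() throws Exception;
    }

    private final Supplier<Connection> factory;
    private Connection connection;      // cached between availability checks

    public ProbingAvailability(Supplier<Connection> factory) {
        this.factory = factory;
    }

    // Lazily (re)create the connection; after a failed check the cache is empty again.
    private Connection getConnection() {
        if (connection == null) {
            connection = factory.get();
        }
        return connection;
    }

    public Availability getAvailability() {
        try {
            Connection conn = getConnection();
            if (conn == null) {
                return Availability.DOWN;
            }
            conn.probe();               // throws if the connection has gone stale
            return Availability.UP;
        } catch (Exception e) {
            if (connection != null) {
                try {
                    connection.close(); // best-effort cleanup
                } catch (Exception ignore) {
                    // nothing more can be done with a broken connection
                }
            }
            connection = null;          // force a fresh connection on the next check
            return Availability.DOWN;
        }
    }
}
```

With this shape, a stale connection costs one DOWN report; the next check builds a new connection and can report UP again, which matches the delayed recovery described above when checks run only every 10 minutes.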
Created attachment 793073
agent.log

Sometimes resources under the RHQ Server Subsystems become unavailable: Alert, Communications, Group Definition, Measurement, Remote API.

In agent.log the following exception keeps recurring:

2013-08-29 09:49:27,311 ERROR [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Failed to start component for Resource[id=11660, uuid=4c3dedf1-8311-48dc-8484-3903b90dcef1, type={RHQServer}RHQ Server Remote API Subsystem, key=rhq.remoting:type=RemoteApiMetrics, name=Remote API Subsystem, parent=RHQ Server Subsystems] from synchronized merge.
org.rhq.core.clientapi.agent.PluginContainerException: Failed to start component for resource Resource[id=11660, uuid=4c3dedf1-8311-48dc-8484-3903b90dcef1, type={RHQServer}RHQ Server Remote API Subsystem, key=rhq.remoting:type=RemoteApiMetrics, name=Remote API Subsystem, parent=RHQ Server Subsystems].
    at org.rhq.core.pc.inventory.InventoryManager.activateResource(InventoryManager.java:1831)
    at org.rhq.core.pc.inventory.InventoryManager.refreshResourceComponentState(InventoryManager.java:3226)
    at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2815)
    at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
    at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
    at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2821)
    at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1145)
    at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1115)
    at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1097)
    at org.rhq.core.pc.inventory.RuntimeDiscoveryExecutor.call(RuntimeDiscoveryExecutor.java:129)
    at org.rhq.core.pc.inventory.RuntimeDiscoveryExecutor.call(RuntimeDiscoveryExecutor.java:64)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.rhq.core.pc.inventory.TimeoutException: [Warning] Call to [org.rhq.plugins.server.RemoteAPIResourceComponent.start()] with args [[org.rhq.core.pluginapi.inventory.ResourceContext@71903dd9]] timed out after 60000 milliseconds - invocation thread will be interrupted.
    at org.rhq.core.clientapi.agent.PluginContainerException.wrapIfNecessary(PluginContainerException.java:69)
    at org.rhq.core.clientapi.agent.PluginContainerException.<init>(PluginContainerException.java:96)
    ... 18 more