| Summary: | "Resource classLoader not found" error seen on upgraded agents, race condition with plugin download | ||
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> |
| Component: | Agent | Assignee: | Nobody <nobody> |
| Status: | NEW --- | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.9 | CC: | gerhard.dreschler, hrupp, jshaughn |
| Target Milestone: | --- | ||
| Target Release: | RHQ 4.13 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Note: this is with patch, Bug 991149 I'm guessing the issue is the agent waited a bit for the download to complete, saw that it started, but timed out, then the plugin container started with the plugins in the process of being downloaded. With a very busy server (maxed out on threads, for example), this can happen I suppose.
AgentMain.java:
boolean updating = PluginUpdate.waitForUpdateToComplete(pc_config, 30000L); // false
int after = plugin_dir.list().length;
if ((after == 0) && !updating) { // true && !false -> true
// still nothing and it doesn't look like we are downloading - try to update them again right now
// (doing this because I saw a case where the startup update somehow happened just prior to the
// registration finishing, so the original update was rejected by the server as "unauthorized")
updatePluginsNow(m_clientSender);
after = plugin_dir.list().length;
}
keep_waiting = ((after == 0) || (updating)); // evals false
I think it's pretty fragile to check for the size of a directory. It should be checking if the download process completed or not, not the state of the filesystem. Why can't the PluginUpdate object return the number of plugins to download and the number of plugins downloaded?
I'm actually thinking that it's not necessarily a plugin download race condition, but the initialization mess that I've addressed in Bug 991149 . ClassLoader resourceClassloader = resourceContainer.getResourceClassLoader(); if (resourceClassloader == null) { throw new PluginContainerException("Resource classLoader not found for " + resource + " - cannot create ResourceComponent."); } I need to trace this better, obviously. Seems like it has to do with the inventory manager having a sync issue with the server. I could reproduce the problem every time, except after uninv. the agent, the problem went away. Pushing to 4.11 Bump the target version now that 4.11 is out. Bumping again. Is this still an issue? |
Description of problem: Example log: 2013-10-31 06:35:37,795 ERROR [InventoryManager.discovery-1] (InventoryManager)- Failed to start component for Resource[id=172928, uuid=2cfa801a-b2a6-41a2-9e1 b-5ad4afacbeb8, type={RHQAgent}Launcher Script, key=launcherscript, name=RHQ Agent Launcher Script, parent=RHQ Agent, version=4.5.1] from synchronized merge. org.rhq.core.clientapi.agent.PluginContainerException: Could not build component for Resource [Resource[id=172928, uuid=2cfa801a-b2a6-41a2-9e1b-5ad4afacbeb8, type={RHQAgent}Launcher Script, key=launcherscript, name=RHQ Agent Launcher Script, parent=RHQ Agent, version=4.5.1]] at org.rhq.core.pc.inventory.InventoryManager.prepareResourceForActivation(InventoryManager.java:1815) at org.rhq.core.pc.inventory.InventoryManager.activateResource(InventoryManager.java:1892) at org.rhq.core.pc.inventory.InventoryManager.refreshResourceComponentState(InventoryManager.java:3360) at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2928) at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2935) at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2935) at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1231) at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1200) at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1182) at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.call(AutoDiscoveryExecutor.java:130) at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.run(AutoDiscoveryExecutor.java:92) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.rhq.core.clientapi.agent.PluginContainerException: [Warning] Resource classLoader not found for Resource[id=172928, uuid=2cfa801a-b2a6-41a2-9e1b-5ad4afacbeb8, type={RHQAgent}Launcher Script, key=launcherscript, name=RHQ Agent Launcher Script, parent=RHQ Agent, version=4.5.1] - cannot create ResourceComponent. at org.rhq.core.clientapi.agent.PluginContainerException.wrapIfNecessary(PluginContainerException.java:69) at org.rhq.core.clientapi.agent.PluginContainerException.<init>(PluginContainerException.java:96) ... 20 more Seems like the container is starting resources before the plugins are updated: 2013-10-31 00:02:01,384 INFO [RHQ Server Polling Thread] (PluginUpdate)- {PluginUpdate.downloading}Downloading the plugin [rhq-snmp-plugin-5.0.2-SNAPSHOT.jar ]... 2013-10-31 00:02:01,774 ERROR [main] (InventoryManager)- Exception thrown while upgrading [Resource[id=172193, uuid=90aa3130-9cfe-453b-a24c-aab315af5d29, type ={RHQAgent}RHQ Agent, key=... RHQ Agent, name=RHQ Agent, parent=...., version=4.5.1]]. org.rhq.core.clientapi.agent.PluginContainerException: Could not build component for Resource [Resource[id=172193, uuid=90aa3130-9cfe-453b-a24c-aab315af5d29, type={RHQAgent}RHQ Agent, key=-... RHQ Agent, name=RHQ Agent, parent=..., version=4.5.1]] at org.rhq.core.pc.inventory.InventoryManager.prepareResourceForActivation(InventoryManager.java:1815) at org.rhq.core.pc.inventory.InventoryManager.activateAndUpgradeResourceRecursively(InventoryManager.java:3420) at org.rhq.core.pc.inventory.InventoryManager.activateAndUpgradeResourceRecursively(InventoryManager.java:3432) at org.rhq.core.pc.inventory.InventoryManager.activateAndUpgradeResources(InventoryManager.java:3391) at org.rhq.core.pc.inventory.InventoryManager.initialize(InventoryManager.java:252) at org.rhq.core.pc.PluginContainer.startContainerService(PluginContainer.java:515) at org.rhq.core.pc.PluginContainer.initialize(PluginContainer.java:328) at org.rhq.enterprise.agent.AgentMain.startPluginContainer(AgentMain.java:1913) at org.rhq.enterprise.agent.AgentMain.start(AgentMain.java:665) at org.rhq.enterprise.agent.AgentMain.main(AgentMain.java:434) Caused by: org.rhq.core.clientapi.agent.PluginContainerException: [Warning] Resource classLoader not found for Resource[id=172193, uuid=90aa3130-9cfe-453b-a24c-aab315af5d29, type={RHQAgent}RHQ Agent, key=... RHQ Agent, name=RHQ Agent, parent=..., version=4.5.1] - cannot create ResourceComponent. at org.rhq.core.clientapi.agent.PluginContainerException.wrapIfNecessary(PluginContainerException.java:69) at org.rhq.core.clientapi.agent.PluginContainerException.<init>(PluginContainerException.java:96) ... 10 more 2013-10-31 00:02:01,883 INFO [RHQ Server Polling Thread] (PluginUpdate)- {PluginUpdate.downloading-complete}The plugin [snmp] has been updated at [rhq-snmp-plugin-5.0.2-SNAPSHOT.jar]. To me it seems like a race condition going on here. Version-Release number of selected component (if applicable): 4.9 (upgraded from 4.5.1) How reproducible: Unclear. Has happened on a number of hosts. Steps to Reproduce: 1. Start agent in older version 2. Upgrade plugins 3. Restart agent talking to new server. Actual results: Exception above. Expected results: No exception. Additional info: May be simply a race condition for some hosts. Not clear why plugins are not updated first, then the container started.