The plugin update is problematic because it is driven by listener ordering and may have many race conditions, as seen in Bug 1025844. For example, the listener is actually added before the plugin container starts. Although in practice the dependent systems are initialized before the listener is called, this is neither predictable nor clear. The other odd thing is the amount of polling (of directory states, etc.) for a count of files. There is no notion of a completed state, i.e. that the update actually finished. I've seen agents come up with the old version of plugins (for example, when the server blocked for a very long time). Using marker files isn't terribly reliable either, and not really helpful. Maybe it is a diagnostic feature, but why not use a static semaphore to order updates? Anyway, my patch simply has AgentMain do (more or less): new PluginUpdate(...).updatePlugins() when the server starts.
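To illustrate the static semaphore idea from the comment above, here is a minimal sketch. It is not taken from the patch: the class name UpdateGate and its methods are hypothetical, and it only assumes that a plugin update and a plugin container start can each be wrapped in a Runnable.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical helper, not part of RHQ: lets only one guarded action run at a time. */
final class UpdateGate {
    // A single static permit: whoever holds it is the only update (or startup) in progress.
    private static final Semaphore PERMIT = new Semaphore(1);

    private UpdateGate() {
    }

    /** Blocks until the permit is free, runs the action, and always releases the permit. */
    static void runExclusively(Runnable action) throws InterruptedException {
        PERMIT.acquire();
        try {
            action.run();
        } finally {
            PERMIT.release();
        }
    }

    /** True while some action currently holds the permit. */
    static boolean isBusy() {
        return PERMIT.availablePermits() == 0;
    }
}
```

If both the update and the container start went through runExclusively, the container could not start while a download is still in progress, and no marker files or polling would be needed to establish the order. The permit is not reentrant, so a guarded action must not call runExclusively again.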
Created attachment 827463
Patch on c39bcd8571

Let me clarify the actual problem I have seen: plugins are still being downloaded while the plugin container is starting up. To reproduce this, have existing plugins in place while new ones are being downloaded; if the download takes about 30-40 seconds, the plugin container continues initializing without the new plugins. Looking at the logic, I'm not sure how this happens.

The behavior should be the same after the patch, except that I changed the management method to throw an exception if the agent is not connected. Before, it would simply queue the request and send it once the server reconnected. Making it synchronous makes more sense for a management method anyway, since you can see whether it worked or not.
Fixed in master commit fd21e3a42153c9e205afc5a25418f2970c6e5c7d Author: Elias Ross <genman> Date: Thu Nov 28 18:04:46 2013 +0100
I would like to kick this back into ON_DEV because there is something not right with agent startup, and I think it's because of this fix. Every time I run "rhqctl install --start" to get my stuff installed, I get barraged with agent error log messages at startup. agent.log.1 and agent.log.2 are always created, each maxed out at 5MB, and they are full of this same message:

ERROR [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.plugin-update-failure}Failed to update the plugins.. Cause: java.lang.IllegalStateException: The sender object is currently not sending commands now. Command not sent: [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[getLatestPlugins], targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService}]]

We need to figure out how to slow down the agent at startup. I think it's trying to shotgun "getLatestPlugins" over and over again until the server is finally able to give it the plugins. The agent should somehow try once and, if that fails, sleep for X seconds (where X is something like 10 or 30 or whatever) and then try again. Otherwise, I think you'll get what I'm seeing: a barrage of getLatestPlugins requests being sent and failing.
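The "try once, sleep X seconds, try again" behavior suggested above could look roughly like the sketch below. This is only an illustration, not the agent's actual code: PausedRetry, its run method, and the maxAttempts and pauseMillis parameters are hypothetical names, and the plugin update is assumed to be wrapped in a Callable that throws when the server is not ready.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of RHQ: retries a task with a fixed pause between attempts. */
final class PausedRetry {
    private PausedRetry() {
    }

    /**
     * Calls the task until it succeeds or maxAttempts is used up.
     * Returns the number of attempts that were needed; if every attempt
     * fails, rethrows the exception from the last attempt.
     */
    static int run(Callable<Void> task, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.call();
                return attempt;
            } catch (InterruptedException ie) {
                // Do not retry when interrupted; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Pause instead of retrying immediately, so a server that is
                    // not ready yet does not cause a flood of failed requests.
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

With a pause of 10 or 30 seconds, a server that needs a minute to come up would produce a handful of failure messages rather than filling several 5MB log files.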
I did not notice this before merging because it does not happen when you install without the '--start' option. Looking...
*** Bug 1038021 has been marked as a duplicate of this bug. ***
Fixed in master

commit dd6200e8a52d9d28ed4a57a8a1bff60b77e4a9d2
Author: Thomas Segismont <tsegismo>
Date: Fri Dec 6 16:18:53 2013 +0100

Make a pause between plugin update attempts

I did not see the problem when I merged the patch because when I start a fresh dev-container, I delete the agent preferences. So my agent does not try to update plugins until it has registered.
Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.