Bug 1030063 - Clean up plugin update to work synchronously
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Release: RHQ 4.10
Assigned To: Thomas Segismont
QA Contact: Mike Foley
Duplicates: 1038021
Depends On:
Blocks: 1038021
Reported: 2013-11-13 15:48 EST by Elias Ross
Modified: 2014-04-23 08:31 EDT

Doc Type: Bug Fix
Last Closed: 2014-04-23 08:31:37 EDT
Type: Bug


Attachments
Patch on c39bcd8571 (19.64 KB, patch)
2013-11-21 17:07 EST, Elias Ross

Description Elias Ross 2013-11-13 15:48:16 EST
The plugin update is problematic because it is driven by listener ordering and possibly has many race conditions, like the one seen in Bug 1025844. For example, the listener is actually added before the plugin container starts. Although in practice the dependent systems are initialized before the listener is called, this is neither predictable nor clear.

The other odd thing is the amount of polling (of directory states, etc.) for a count of files. There is no notion of a completed state, i.e. nothing records that the update actually finished.

I've seen agents come up with old versions of plugins, for example when the server blocked for a very long time.

Using marker files isn't terribly reliable either, and not really helpful. Maybe it is a diagnostic feature, but why not use a static semaphore to order updates?
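
A minimal sketch of the static-semaphore idea, for illustration only. `PluginUpdateGate` and `runExclusive` are hypothetical names, not RHQ classes; the only assumption is that every code path that updates plugins goes through this one gate, so that updates inside a single agent JVM are ordered without marker files.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper (not part of RHQ): serializes plugin updates within one JVM. */
final class PluginUpdateGate {
    // One permit: at most one update runs at a time. 'true' makes waiting callers FIFO.
    private static final Semaphore PERMIT = new Semaphore(1, true);

    private PluginUpdateGate() {
    }

    /** Runs the given update while holding the permit and returns its result. */
    static <T> T runExclusive(Supplier<T> update) {
        PERMIT.acquireUninterruptibly();
        try {
            return update.get();
        } finally {
            // Always release, even when the update throws, so later updates are not blocked.
            PERMIT.release();
        }
    }
}
```

Unlike a marker file, the permit cannot be left behind by a crashed process, but it also only orders updates inside one JVM.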

Anyway, my patch simply has AgentMain do (more or less):

   new PluginUpdate(...).updatePlugins()

when the server starts.
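
The ordering this aims for can be sketched as follows. This is not the attached patch: `SynchronousStartupSketch`, `PluginUpdater` and `start` are hypothetical stand-ins, and the real `PluginUpdate` constructor arguments are elided above, so they are not reproduced here. The point is only that the update call blocks and returns (or throws) before the plugin container is started, which gives the update an explicit completed state.

```java
import java.util.List;

/** Hypothetical sketch (not RHQ code) of "update first, then start the container". */
final class SynchronousStartupSketch {

    /** Stand-in for the real update step; returns the names of the plugins it refreshed. */
    interface PluginUpdater {
        List<String> updatePlugins();
    }

    private SynchronousStartupSketch() {
    }

    /**
     * Runs the update to completion on the calling thread and only then starts the
     * plugin container. If the update throws, the container is never started.
     */
    static List<String> start(PluginUpdater updater, Runnable startPluginContainer) {
        List<String> updated = updater.updatePlugins(); // blocks until the update is done
        startPluginContainer.run();                     // sees the fully updated plugin set
        return updated;
    }
}
```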
Comment 1 Elias Ross 2013-11-21 17:07:03 EST
Created attachment 827463 [details]
Patch on c39bcd8571

Let me clarify the actual problem I have seen:

Plugins are actually in the process of being downloaded, while the plugin container is being started up.

To reproduce this: start with existing plugins in place while new ones are being downloaded. If the download takes about 30-40 seconds, the plugin container continues initializing without the new plugins.

Looking at the logic, I'm not sure how this happens.

The behavior should be the same after the patch, except that I changed the management method to throw an exception if the agent is not connected. Previously, it would simply queue the request and send it once the server reconnected. Making it synchronous makes more sense for a management method anyway, since you can see whether it worked or not.
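
A rough sketch of that fail-fast behavior, under stated assumptions: `PluginUpdateManagementSketch` and `updatePluginsNow` are hypothetical names, the connection check and the update itself are passed in as stand-ins, and `IllegalStateException` is chosen only for illustration. The attached patch may differ in all of these details.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch (not RHQ code) of a management operation that fails fast. */
final class PluginUpdateManagementSketch {
    private final BooleanSupplier connectedToServer; // stand-in for the agent's connection check
    private final Runnable update;                   // stand-in for the synchronous plugin update

    PluginUpdateManagementSketch(BooleanSupplier connectedToServer, Runnable update) {
        this.connectedToServer = connectedToServer;
        this.update = update;
    }

    /** Updates plugins on the calling thread, or throws if the server is unreachable. */
    void updatePluginsNow() {
        if (!connectedToServer.getAsBoolean()) {
            // Nothing is queued for later: the caller learns immediately that no update happened.
            throw new IllegalStateException("Agent is not connected to the server; plugins were not updated");
        }
        update.run();
    }
}
```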
Comment 2 Thomas Segismont 2013-11-28 12:45:01 EST
Fixed in master

commit fd21e3a42153c9e205afc5a25418f2970c6e5c7d
Author: Elias Ross <genman@noderunner.net>
Date:   Thu Nov 28 18:04:46 2013 +0100
Comment 3 John Mazzitelli 2013-12-04 14:25:52 EST
I would like to kick this back into ON_DEV because there is something not right with agent startup and I think it's because of this fix.

Every time I run "rhqctl install --start" to get my stuff installed, I get barraged with agent error log messages at startup. agent.log.1 and agent.log.2 are always created, each maxed out at 5 MB, and they are full of this same message:

ERROR [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.plugin-update-failure}Failed to update the plugins.. Cause: java.lang.IllegalStateException: The sender object is currently not sending commands now. Command not sent: [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[getLatestPlugins], targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService}]]

We need to figure out how to slow down the agent here at startup. I think it's trying to shotgun "getLatestPlugins" over and over again until the server is finally able to give it the plugins. The agent should somehow try it once and, if that fails, sleep for X seconds (where X is something like 10 or 30) and then try again. Otherwise, I think you'll get what I'm seeing: a barrage of getLatestPlugins calls getting sent and failing.
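
The "try once, sleep, try again" suggestion could look roughly like this. It is a generic sketch, not the fix that was eventually committed: `RetryWithPause` and `callWithPause` are hypothetical names, and the attempt is passed in as a plain `Supplier` standing in for the `getLatestPlugins` call.

```java
import java.util.function.Supplier;

/** Hypothetical sketch (not RHQ code): retry an operation with a fixed pause between attempts. */
final class RetryWithPause {

    private RetryWithPause() {
    }

    /**
     * Calls 'attempt' up to maxAttempts times, sleeping pauseMillis after each failure
     * except the last one. Rethrows the last failure if every attempt fails.
     */
    static <T> T callWithPause(Supplier<T> attempt, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the pause
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // back off instead of hammering the server
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }
}
```

With a pause of 10 or 30 seconds, each failed attempt produces one log entry per pause interval rather than a continuous stream.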
Comment 4 Thomas Segismont 2013-12-05 09:36:44 EST
I did not notice before merging because it does not happen when you install without the '--start' option.

Looking...
Comment 5 Thomas Segismont 2013-12-06 10:20:36 EST
*** Bug 1038021 has been marked as a duplicate of this bug. ***
Comment 6 Thomas Segismont 2013-12-06 10:22:25 EST
Fixed in master

commit dd6200e8a52d9d28ed4a57a8a1bff60b77e4a9d2
Author: Thomas Segismont <tsegismo@redhat.com>
Date:   Fri Dec 6 16:18:53 2013 +0100
    
Make a pause between plugin update attempts

I did not see the problem when I merged the patch because I delete the agent preferences whenever I start a fresh dev-container. So my agent does not try to update plugins until it has registered.
Comment 7 Heiko W. Rupp 2014-04-23 08:31:37 EDT
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
