Bug 1030063

Summary: Clean up plugin update to work synchronously
Product: [Other] RHQ Project Reporter: Elias Ross <genman>
Component: Agent    Assignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.9    CC: hrupp, mazz, tsegismo
Target Milestone: ---   
Target Release: RHQ 4.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-04-23 12:31:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1038021    
Attachments:
Patch on c39bcd8571 (flags: none)

Description Elias Ross 2013-11-13 20:48:16 UTC
The plugin update is problematic because it is driven by listener ordering and possibly has many race conditions, like the one seen in Bug 1025844. For example, the listener is actually added before the plugin container starts. Although in practice the dependent systems are initialized before the listener is called, this ordering is neither predictable nor clear.

The other weird thing is the amount of polling (of directory state, etc.) for a count of files. There is no notion of a completed state, i.e. a signal that the update actually finished.

I've seen agents come up with the old versions of plugins (for example, when the server blocked for a very long time).

Using marker files isn't terribly reliable either, nor is it really helpful. Maybe it serves as a diagnostic feature, but why not use a static semaphore to order updates?
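A minimal sketch of the static-semaphore idea, assuming updates are triggered from more than one place (startup and a management call, say). The class name `PluginUpdateGate` and its methods are hypothetical and not part of the RHQ code base; the point is only that a single permit serializes updates without any marker files:

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical helper (not an RHQ class): one static permit ensures that
 * at most one plugin update runs at a time, whoever triggers it.
 */
class PluginUpdateGate {
    // A single permit with fair ordering: waiting updaters proceed in arrival order.
    private static final Semaphore UPDATE_PERMIT = new Semaphore(1, true);

    /** Runs the update while holding the permit; blocks if another update is in progress. */
    static void runExclusively(Runnable update) throws InterruptedException {
        UPDATE_PERMIT.acquire();
        try {
            update.run();
        } finally {
            // Always release, even if the update throws, so later updates are not blocked forever.
            UPDATE_PERMIT.release();
        }
    }

    /** Exposed for diagnostics: 1 when idle, 0 while an update holds the permit. */
    static int availablePermits() {
        return UPDATE_PERMIT.availablePermits();
    }
}
```

Compared with marker files, the ordering lives in memory and is released in a `finally` block, so a failed update cannot leave a stale lock behind.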

Anyway, my patch simply has AgentMain do (more or less):

   new PluginUpdate(...).updatePlugins()

when the server starts.
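A minimal sketch of what "synchronous" buys here, assuming the update can be modeled as one blocking call that returns the plugins on disk. `PluginUpdater`, `ContainerStarter` and `AgentStartupSketch` are hypothetical stand-ins, not the real `PluginUpdate` or `AgentMain` APIs: the container is started only after the update call has returned, so no listener ordering or directory polling is involved.

```java
import java.io.File;
import java.util.List;

/** Hypothetical stand-in for PluginUpdate: blocks until the update has completed. */
interface PluginUpdater {
    List<File> updatePlugins() throws Exception;
}

/** Hypothetical stand-in for starting the plugin container with a known set of plugins. */
interface ContainerStarter {
    void start(List<File> plugins);
}

class AgentStartupSketch {
    private final PluginUpdater updater;
    private final ContainerStarter container;

    AgentStartupSketch(PluginUpdater updater, ContainerStarter container) {
        this.updater = updater;
        this.container = container;
    }

    /** Called once the server is up; the update finishes on this thread before the container starts. */
    void onServerStarted() throws Exception {
        List<File> plugins = updater.updatePlugins(); // returns only after all downloads are done
        container.start(plugins);                     // never sees a half-updated plugin directory
    }
}
```

If `updatePlugins()` throws, the exception reaches the caller instead of the container silently starting with old plugins.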

Comment 1 Elias Ross 2013-11-21 22:07:03 UTC
Created attachment 827463 [details]
Patch on c39bcd8571

Let me clarify the actual problem I have seen:

Plugins are actually in the process of being downloaded, while the plugin container is being started up.

To reproduce: start with existing plugins while new ones are being downloaded. If the download takes about 30-40 seconds, the plugin container continues initializing without the new plugins.

Looking at the logic, I'm not sure how this happens.

The behavior should be the same after the patch, except that I changed the management method to throw an exception if the agent is not connected. Previously it simply queued the request until the server reconnected. Making it synchronous makes more sense for a management method anyway, since you can see whether it worked or not.
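A minimal sketch of the fail-fast management behavior, assuming the connection state and the blocking update can be injected. `PluginManagementSketch` and its constructor arguments are hypothetical, not the agent's actual management interface:

```java
import java.util.function.BooleanSupplier;

/** Hypothetical management operation; the real agent management method differs. */
class PluginManagementSketch {
    private final BooleanSupplier connected;   // assumption: reports whether the server is reachable
    private final Runnable synchronousUpdate;  // assumption: performs the whole update, blocking

    PluginManagementSketch(BooleanSupplier connected, Runnable synchronousUpdate) {
        this.connected = connected;
        this.synchronousUpdate = synchronousUpdate;
    }

    /** Fails immediately instead of queueing the request until the server reconnects. */
    void updatePlugins() {
        if (!connected.getAsBoolean()) {
            throw new IllegalStateException("Agent is not connected to the server; plugins were not updated");
        }
        synchronousUpdate.run(); // any failure propagates to the caller of the management method
    }
}
```

The caller gets either a normal return (the update happened) or an exception (it did not), which is what makes the synchronous version easier to reason about.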

Comment 2 Thomas Segismont 2013-11-28 17:45:01 UTC
Fixed in master

commit fd21e3a42153c9e205afc5a25418f2970c6e5c7d
Author: Elias Ross <genman>
Date:   Thu Nov 28 18:04:46 2013 +0100

Comment 3 John Mazzitelli 2013-12-04 19:25:52 UTC
I would like to kick this back into ON_DEV because something is not right with agent startup, and I think it's because of this fix.

Every time I run "rhqctl install --start" to get my stuff installed, I get barraged with agent error log messages at startup. agent.log.1 and agent.log.2 are always created, each maxed out at 5MB, and they are all full of this same message:

ERROR [main] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.plugin-update-failure}Failed to update the plugins.. Cause: java.lang.IllegalStateException: The sender object is currently not sending commands now. Command not sent: [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[getLatestPlugins], targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService}]]

We need to figure out how to slow down the agent here at startup. I think it's trying to shotgun "getLatestPlugins" over and over again until the server is finally able to give it the plugins. The agent should somehow try once; if that fails, sleep for X seconds (where X is something like 10 or 30), and then try again. Otherwise, I think you'll get what I'm seeing: a barrage of getLatestPlugins requests being sent and failing.
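A minimal sketch of "try once, sleep, try again", assuming the plugin fetch can be wrapped in a `Callable`. `PausingRetry`, its `Sleeper` hook and the 30-second default are hypothetical choices for illustration; the actual fix (commit dd6200e8 below) may be structured differently:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical retry helper; not the code from the actual RHQ commit. */
class PausingRetry {
    // Assumption: 30 seconds, one of the values suggested above.
    static final long DEFAULT_PAUSE_MILLIS = TimeUnit.SECONDS.toMillis(30);

    /** Pluggable sleep so callers (and tests) can control the pause. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    static <T> T callWithPause(Callable<T> attempt, int maxAttempts, long pauseMillis, Sleeper sleeper)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                throw e; // never swallow an interrupt (e.g. agent shutdown)
            } catch (Exception e) {
                last = e; // one failure per pause, instead of a tight loop filling agent.log
                if (i < maxAttempts) {
                    sleeper.sleep(pauseMillis);
                }
            }
        }
        throw last; // all attempts failed: surface the most recent cause
    }
}
```

In the agent this might be called as `PausingRetry.callWithPause(() -> fetchLatestPlugins(), 10, PausingRetry.DEFAULT_PAUSE_MILLIS, Thread::sleep)`, where `fetchLatestPlugins` is a hypothetical wrapper around the getLatestPlugins call.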

Comment 4 Thomas Segismont 2013-12-05 14:36:44 UTC
I did not notice this before merging because it does not happen when you install without the '--start' option.

Looking...

Comment 5 Thomas Segismont 2013-12-06 15:20:36 UTC
*** Bug 1038021 has been marked as a duplicate of this bug. ***

Comment 6 Thomas Segismont 2013-12-06 15:22:25 UTC
Fixed in master

commit dd6200e8a52d9d28ed4a57a8a1bff60b77e4a9d2
Author: Thomas Segismont <tsegismo>
Date:   Fri Dec 6 16:18:53 2013 +0100
    
Make a pause between plugin update attempts

I did not see the problem when I merged the patch because, when I start a fresh dev-container, I delete the agent preferences, so my agents do not try to update plugins until they have registered.

Comment 7 Heiko W. Rupp 2014-04-23 12:31:37 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.