Bug 536581 - (RHQ-916) avoid pushing schedules to all agents at startup
Product: RHQ Project
Classification: Other
Component: Performance
All All
Priority: high, Severity: medium
Assigned To: John Mazzitelli
Pavel Kralik
: Improvement
Reported: 2008-10-01 23:01 EDT by John Mazzitelli
Modified: 2013-04-30 19:32 EDT
Fixed In Version: 1.2
Doc Type: Enhancement

Description John Mazzitelli 2008-10-01 23:01:00 EDT
At startup, I noticed that ResourceMetadataManager.updateMeasurementDefinitions ends up calling MeasurementScheduleManager.createSchedulesAndSendToAgents, which attempts to ping every agent in the system and, if the ping succeeds, tries to push schedules to that agent.

We should avoid pushing to all agents at startup, because it makes server startup take a long time. We need a way for agents to update their schedules on their own time.
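
One way to read "on their own time" is a pull model: the server only records that definitions changed, and each agent asks for new schedules whenever it next checks in. The sketch below illustrates that idea under stated assumptions. ScheduleRegistry, ScheduleSnapshot, and PullingAgent are hypothetical names, not RHQ classes, and the version counter is an illustrative choice, not necessarily how this was eventually solved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in RHQ.

/** Copy of the schedules as they were at one version. */
final class ScheduleSnapshot {
    final long version;
    final Map<String, Long> intervalsMillis;

    ScheduleSnapshot(long version, Map<String, Long> intervalsMillis) {
        this.version = version;
        this.intervalsMillis = intervalsMillis;
    }
}

/** Server side: records schedule changes without contacting any agent. */
class ScheduleRegistry {
    private long version = 0;
    private final Map<String, Long> intervalsMillis = new HashMap<>();

    /** Called during plugin deployment; no ping, no push. */
    synchronized void updateDefinition(String metric, long intervalMillis) {
        intervalsMillis.put(metric, intervalMillis);
        version++;
    }

    /** Returns a snapshot if the caller's version is out of date, else null. */
    synchronized ScheduleSnapshot snapshotIfNewer(long seenVersion) {
        if (seenVersion == version) {
            return null;
        }
        return new ScheduleSnapshot(version, new HashMap<>(intervalsMillis));
    }
}

/** Agent side: pulls schedules when it chooses to. */
class PullingAgent {
    private long seenVersion = -1; // nothing fetched yet
    private Map<String, Long> schedules = new HashMap<>();

    /** Fetches schedules only if the server has something newer. */
    boolean syncIfStale(ScheduleRegistry server) {
        ScheduleSnapshot snapshot = server.snapshotIfNewer(seenVersion);
        if (snapshot == null) {
            return false;
        }
        schedules = snapshot.intervalsMillis;
        seenVersion = snapshot.version;
        return true;
    }

    Map<String, Long> schedules() {
        return schedules;
    }
}
```

With this shape, server startup cost no longer depends on how many agents exist or whether they are reachable, since updateDefinition never touches the network.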
Comment 1 John Mazzitelli 2008-10-01 23:44:59 EDT
This is really critical: it looks like this happens inside the registerPlugin transaction, which has a tx timeout of 10 minutes.
Comment 2 Greg Hinkle 2008-10-13 17:19:43 EDT
These definitely should not be pushed down like this. It looks like someone has removed the API in the DiscoveryAgentService that allowed the server to ask the agent to do an out-of-band update. We need to put this back and use it.
Comment 3 John Mazzitelli 2008-12-02 15:37:45 EST
This also has to be fixed along with this issue: we need plugin deployment to happen BEFORE agent comm is started. Otherwise, agents waiting at the gate to register will get bad, obsolete plugin information when they want to update plugins:

(3:29:39 PM) mazz: the agent clients need to be started AFTER the comm layer is up - because an agent client might send a message that triggers the agent to immediately send a msg to the server
(3:30:04 PM) mazz: josep1: look at rev 1010 of StartupServlet
(3:30:21 PM) mazz: you moved the product plugin start to AFTER the comm layer starting
(3:30:42 PM) josep1: my svn comment "first, PluginDeployer was (for some select plugins) executing before the AgentClients were ready, which weren't ready because the comm services weren't loaded yet;
(3:30:42 PM) josep1: switch the order that the services are loaded in StartupServlet; "
(3:31:02 PM) mazz: I don't get that
(3:31:13 PM) josep1: neither do i, but i'm sure i had a good reason for it
(3:31:26 PM) josep1: did the plugin deployer ever do any comm?
(3:31:37 PM) josep1: talk to agentclient for some reason
(3:31:43 PM) mazz: so you want the agent clients to start after the plugins are deployed or after they are?
(3:31:50 PM) mazz: no - they can't
(3:31:54 PM) mazz: it's just metadata
(3:32:03 PM) mazz: there is no agent stuff happening in there
(3:32:16 PM) mazz: that's the part I don't get
(3:32:43 PM) mazz: plugin deployment should occur before agent clients start up
(3:32:50 PM) mazz: but definitely should happen before the comm layer starts up
(3:33:04 PM) josep1:
(3:33:10 PM) josep1: sendSchedulesToAgents
(3:33:28 PM) josep1: updating of measurement definitions
(3:33:36 PM) mazz: whoa... you mean plugin deployment sends agent messages?
(3:33:43 PM) mazz: that should not be, IMHO
(3:33:47 PM) josep1: i guess at the time i wanted the agent clients to be ready so the schedule updates would succeed
(3:33:54 PM) josep1: hey, i didn't write that code  ; )
(3:34:19 PM) josep1: and mazz, we discussed this a few weeks back, that you didn't like how that was done
(3:34:25 PM) josep1: i think there is another open jira, lemme look
(3:34:43 PM) josep1:
(3:34:44 PM) mazz: this is bad. because agents that are sitting waiting to register, will now immediately get in prior to the plugin deployments and will thus probably get obsolete plugin information
Comment 4 John Mazzitelli 2008-12-02 15:41:40 EST
StartupServlet needs to do this:

        startServerPluginContainer(); // before comm in case an agent wants to talk to it
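
The ordering argued for in the chat log above (plugin deployment first, then the comm layer, then the agent clients) can be sketched as follows. This is a minimal illustration, not the actual StartupServlet code: only startServerPluginContainer() is named in this report, the other method names are hypothetical, and the relative order of the two steps that precede the comm layer is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StartupServlet. Only startServerPluginContainer()
// is named in this report; every other method name is an assumption.
class StartupOrderSketch {
    private final List<String> started = new ArrayList<>();

    void deployPlugins() { started.add("pluginDeployment"); }
    void startServerPluginContainer() { started.add("serverPluginContainer"); }
    void startCommLayer() { started.add("commLayer"); }
    void startAgentClients() { started.add("agentClients"); }

    void startup() {
        deployPlugins();               // metadata only; should send no agent messages
        startServerPluginContainer();  // before comm in case an agent wants to talk to it
        startCommLayer();              // agents waiting to register can get in from here on
        startAgentClients();           // after comm, since a client may trigger an agent reply
    }

    /** Returns the steps in the order they ran. */
    List<String> startedInOrder() {
        return new ArrayList<>(started);
    }
}
```

Recording each step in a list makes the ordering checkable in a unit test, which is one way to keep the order from being silently swapped again.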
Comment 5 John Mazzitelli 2009-01-16 12:53:25 EST
RHQ-1326 will remove all agent comm from plugin deployment code. This issue will ensure we put the ordering back the way it was in StartupServlet.
Comment 6 John Mazzitelli 2009-01-16 12:54:20 EST
RHQ-1370 has the job of refactoring the schedule update so the agents get their schedules synchronized properly.
Comment 7 John Mazzitelli 2009-02-09 11:29:23 EST
The simplest way to test this is to get a set of servers/agents up and running (all agents registered and with resources imported).

Then shut down all the agents and all servers.

Now, restart the server. You should see the server start up with no lag, and the server should not attempt to send any data at all to the agents. If the server starts up quickly (as it does when the agents are running) and the server log shows no exceptions about failures to talk to agents, then this issue can be considered fixed (this issue stopped the server from talking to agents during its startup).
Comment 8 Pavel Kralik 2009-02-11 07:45:52 EST
Tested as specified above. The server does not talk to agents during its startup.

RHEL 5.3, x86_64, PostgreSQL 8.2.4, Java 1.6.0_11, JON RHQ SVN rev# 2894
Comment 9 Red Hat Bugzilla 2009-11-10 16:19:33 EST
This bug was previously known as
This bug is duplicated by RHQ-1186
This bug relates to RHQ-592
This bug relates to RHQ-1326
This bug relates to RHQ-1370
