Bug 536581 (RHQ-916) - avoid pushing schedules to all agents at startup
Summary: avoid pushing schedules to all agents at startup
Keywords:
Status: CLOSED NEXTRELEASE
Alias: RHQ-916
Product: RHQ Project
Classification: Other
Component: Performance
Version: 1.1
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: John Mazzitelli
QA Contact: Pavel Kralik
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-02 03:01 UTC by John Mazzitelli
Modified: 2013-04-30 23:32 UTC

Fixed In Version: 1.2
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:



Description John Mazzitelli 2008-10-02 03:01:00 UTC
At startup, I noticed that ResourceMetadataManager.updateMeasurementDefinitions ends up calling MeasurementScheduleManager.createSchedulesAndSendToAgents, which attempts to ping every agent in the system and, if the ping succeeds, tries to push schedules to that agent.

We should avoid pushing to all agents at startup - this causes the server startup to take a long time. Need to come up with a way for all agents to update their schedules on their own time.
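
One possible shape for "agents update on their own time" is sketched below in Java. The server only records that schedules changed (a version bump, with no agent communication), and each agent compares versions whenever it next talks to the server. This is a sketch under assumptions, not the RHQ implementation: ScheduleSnapshot, ScheduleVersionRegistry and AgentScheduleSync are hypothetical names, and schedules are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these classes exist in RHQ.

// Immutable pairing of a schedule set with the version it belongs to.
class ScheduleSnapshot {
    final long version;
    final List<String> schedules;

    ScheduleSnapshot(long version, List<String> schedules) {
        this.version = version;
        this.schedules = Collections.unmodifiableList(new ArrayList<>(schedules));
    }
}

// Server side: records schedule changes without contacting any agent.
class ScheduleVersionRegistry {
    private ScheduleSnapshot current = new ScheduleSnapshot(0, new ArrayList<String>());

    // Called during plugin metadata processing; no agent is pinged here.
    synchronized void updateSchedules(List<String> newSchedules) {
        current = new ScheduleSnapshot(current.version + 1, newSchedules);
    }

    synchronized ScheduleSnapshot snapshot() {
        return current;
    }
}

// Agent side: pulls schedules only when its copy is out of date.
class AgentScheduleSync {
    private ScheduleSnapshot local = new ScheduleSnapshot(0, new ArrayList<String>());

    // Invoked whenever the agent next talks to the server.
    // Returns true if a newer schedule set was pulled.
    boolean syncIfStale(ScheduleVersionRegistry server) {
        ScheduleSnapshot remote = server.snapshot();
        if (remote.version == local.version) {
            return false;
        }
        local = remote;
        return true;
    }

    List<String> localSchedules() {
        return local.schedules;
    }
}
```

With this shape, server startup cost does not depend on how many agents exist or whether they are reachable; an agent that is down simply syncs later.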

Comment 1 John Mazzitelli 2008-10-02 03:44:59 UTC
This is really critical: it looks like this happens inside the transaction of registerPlugin, which has a transaction timeout of 10 minutes.

Comment 2 Greg Hinkle 2008-10-13 21:19:43 UTC
These definitely should not be pushed down like this. It looks like someone has removed the API in the DiscoveryAgentService that allowed the server to ask the agent to do an out-of-band update. We need to put this back and use that.

Comment 3 John Mazzitelli 2008-12-02 20:37:45 UTC
The following also has to be fixed along with this issue: we need plugin deployment to happen BEFORE agent comm is started. Otherwise, agents waiting at the gate to register will get bad, obsolete plugin information when they want to update plugins:

(3:29:39 PM) mazz: the agent clients need to be started AFTER the comm layer is up - because an agent client might send a message that triggers the agent to immediately send a msg to the server
...
(3:30:04 PM) mazz: josep1: look at rev 1010 of StartupServlet
(3:30:21 PM) mazz: you moved the product plugin start to AFTER the comm layer starting
(3:30:42 PM) josep1: my svn comment "first, PluginDeployer was (for some select plugins) executing before the AgentClients were ready, which wasn't ready because the comm services weren't loaded yet;
(3:30:42 PM) josep1: switch the order that the services are loaded in StartupServlet; "
(3:31:02 PM) mazz: I don't get that
(3:31:13 PM) josep1: neither do i, but i'm sure i had a good reason for it
(3:31:26 PM) josep1: did the plugin deployer ever do any comm?
(3:31:37 PM) josep1: talk to agentclient for some reason
(3:31:43 PM) mazz: so you want the agent clients to start after the plugins are deployed or after they are?
(3:31:50 PM) mazz: no - they can't
(3:31:54 PM) mazz: it's just metadata
(3:32:03 PM) mazz: there is no agent stuff happening in there
(3:32:16 PM) mazz: that's the part I don't get
(3:32:43 PM) mazz: plugin deployment should occur before agent clients start up
(3:32:50 PM) mazz: but definitely should happen before the comm layer starts up
(3:33:04 PM) josep1: http://jira.rhq-project.org/browse/RHQ-592
(3:33:10 PM) josep1: sendSchedulesToAgents
(3:33:28 PM) josep1: updating of measurement definitions
(3:33:36 PM) mazz: whoa... you mean plugin deployment sends agent messages?
(3:33:43 PM) mazz: that should not be, IMHO
(3:33:47 PM) josep1: i guess at the time i wanted the agent clients to be ready so the schedule updates would succeed
(3:33:54 PM) josep1: hey, i didn't write that code  ; )
(3:34:19 PM) josep1: and mazz, we discussed this a few weeks back, that you didn't like how that was done
(3:34:25 PM) josep1: i think there is another open jira, lemme look
(3:34:43 PM) josep1: http://jira.rhq-project.org/browse/RHQ-916
(3:34:44 PM) mazz: this is bad. because agents that are sitting waiting to register, will now immediately get in prior to the plugin deployments and will thus probably get obsolete plugin information


Comment 4 John Mazzitelli 2008-12-02 20:41:40 UTC
StartupServlet needs to do this:

        startHibernateStatistics();
        // PUT THIS HERE - NEEDS TO HAPPEN BEFORE comm AND BEFORE agent clients START
        // PLUGIN DEPLOYMENT MUST NOT TALK TO AGENTS - SHOULD JUST BE METADATA PROCESSING
        startPluginDeployer();
        startServerPluginContainer(); // before comm in case an agent wants to talk to it
        installJaasModules();
        startServerCommunicationServices();
        startScheduler();
        scheduleJobs();
        startAgentClients();
        // THIS IS BAD - MOVE THIS BEFORE COMM
        //startPluginDeployer();
        startEmbeddedAgent();
        registerShutdownListener();
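
The ordering constraint above could also be made explicit instead of depending only on the order of the calls. A minimal sketch, assuming a hypothetical StartupSequence helper (not part of StartupServlet or RHQ) that runs named steps and fails fast if a step runs before one it must follow; the step names simply mirror the method names listed above:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not RHQ code.
class StartupSequence {
    private final Set<String> completed = new LinkedHashSet<>();

    // Runs a step after checking that every prerequisite has already run.
    void run(String name, Runnable step, String... mustRunAfter) {
        for (String prerequisite : mustRunAfter) {
            if (!completed.contains(prerequisite)) {
                throw new IllegalStateException(name + " started before " + prerequisite);
            }
        }
        step.run();
        completed.add(name);
    }

    // Steps that have finished, in the order they ran.
    List<String> completedSteps() {
        return new ArrayList<>(completed);
    }
}

class StartupDemo {
    // Mirrors the intended order: plugin deployment, then comm, then agent clients.
    static List<String> start() {
        StartupSequence seq = new StartupSequence();
        Runnable noop = () -> { };  // stands in for the real startup work
        seq.run("pluginDeployer", noop);
        seq.run("serverCommunicationServices", noop, "pluginDeployer");
        seq.run("agentClients", noop, "pluginDeployer", "serverCommunicationServices");
        return seq.completedSteps();
    }
}
```

If someone later moves the plugin deployer after the comm services again, the sequence throws IllegalStateException at startup instead of silently letting agents register against obsolete plugin metadata.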


Comment 5 John Mazzitelli 2009-01-16 17:53:25 UTC
RHQ-1326 will remove all agent comm from the plugin deployment code. This issue will ensure we put the ordering back the way it was in StartupServlet.

Comment 6 John Mazzitelli 2009-01-16 17:54:20 UTC
RHQ-1370 has the job of refactoring the schedule update so the agents get their schedules synchronized properly.

Comment 7 John Mazzitelli 2009-02-09 16:29:23 UTC
The simplest way to test this is to get a set of servers/agents up and running (all agents registered and with resources imported).

Then shut down all the agents and all servers.

Now restart the server. You should see the server start up with no lag time, and the server should not attempt to send any data at all to the agents. If the server starts up quickly (as it did when the agents were running) and you see no exceptions in the server log about failures to talk to agents, then this issue can be considered fixed (the fix for this issue stops the server from talking to agents during its startup).
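
The log check in the last step could be automated along these lines. This is only a sketch: AgentCommLogCheck is a hypothetical helper, and the marker strings in the usage note are assumptions, since this report does not quote the actual exception text the server logs when it fails to reach an agent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not RHQ code.
class AgentCommLogCheck {
    // Returns every log line that contains at least one marker (case-insensitive).
    static List<String> linesMentioning(List<String> logLines, List<String> markers) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            String lower = line.toLowerCase(Locale.ROOT);
            for (String marker : markers) {
                if (lower.contains(marker.toLowerCase(Locale.ROOT))) {
                    hits.add(line);
                    break;  // count each line once, even if several markers match
                }
            }
        }
        return hits;
    }
}
```

For example, after restarting the server with all agents down, read the server log into a list of lines and call linesMentioning with whatever text your server actually logs on agent communication failures; an empty result is consistent with the fix.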

Comment 8 Pavel Kralik 2009-02-11 12:45:52 UTC
Tested as specified above. The server does not talk to agents during its startup.

RHEL 5.3, x86_64, PostgreSQL 8.2.4, Java 1.6.0_11, JON RHQ SVN rev# 2894

Comment 9 Red Hat Bugzilla 2009-11-10 21:19:33 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-916
This bug is duplicated by RHQ-1186
This bug relates to RHQ-592
This bug relates to RHQ-1326
This bug relates to RHQ-1370


