Red Hat Bugzilla – Full Text Bug Listing
|Summary:||clean up agent stuff in CoreServerService.agentIsShuttingDown|
|Product:||[Other] RHQ Project||Reporter:||John Mazzitelli <mazz>|
|Component:||Core Server||Assignee:||Jay Shaughnessy <jshaughn>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:|
|Version:||unspecified||CC:||jshaughn, loleary, tao|
|Target Milestone:||---||Keywords:||FutureFeature, Improvement|
|Target Release:||RHQ 4.4.0|
|Fixed In Version:||Doc Type:||Enhancement|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-09-01 06:12:14 EDT||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:||535352|
Description John Mazzitelli 2008-11-24 20:20:00 EST
When an agent shuts down, it sends a message to the server saying it is going down. The message arrives at CoreServerService#agentIsShuttingDown(String). We should make sure we do all the cleanup we can in there. For example, in the HAAC view of the servers, the agent count doesn't go down when an agent shuts down. We should ensure that the agent count goes down when the agent tells us it is shutting down. We could also clear the alert cache for that agent to lower the footprint of the cache.
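As a rough illustration of the two cleanup steps requested here (dropping the agent from the connected count and evicting its alert cache entry), a minimal sketch follows. Every name in it (AgentShutdownBookkeeping, agentConnected, connectedAgentCount, hasAlertCacheFor) is hypothetical and not taken from the RHQ code base; only the method name agentIsShuttingDown mirrors the one mentioned above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these types are NOT the real RHQ classes.
class AgentShutdownBookkeeping {
    // Names of agents the server currently considers connected.
    private final Set<String> connectedAgents = ConcurrentHashMap.newKeySet();
    // Stand-in for a per-agent alert condition cache.
    private final Map<String, Object> alertCacheByAgent = new ConcurrentHashMap<>();

    void agentConnected(String agentName, Object alertCacheEntry) {
        connectedAgents.add(agentName);
        alertCacheByAgent.put(agentName, alertCacheEntry);
    }

    // Called when the agent announces a graceful shutdown.
    void agentIsShuttingDown(String agentName) {
        connectedAgents.remove(agentName);   // the agent count goes down
        alertCacheByAgent.remove(agentName); // the cache footprint shrinks
    }

    int connectedAgentCount() {
        return connectedAgents.size();
    }

    boolean hasAlertCacheFor(String agentName) {
        return alertCacheByAgent.containsKey(agentName);
    }
}
```

Both removals are idempotent, so a duplicate shutdown message for the same agent would be harmless in this sketch.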
Comment 1 Heiko W. Rupp 2009-03-10 13:11:43 EDT
pilhuhn: How much sense does this make:
[17:58] pilhuhn: 17:54:52,486 INFO [AgentManagerBean] Agent with name [snert.home.bsd.de] just went down
[17:58] pilhuhn: 17:56:04,431 WARN [AgentManagerBean] Have not heard from agent [snert.home.bsd.de] since [Tue Mar 10 17:54:02 CET 2009]. Will be backfilled since we suspect it is down
[17:58] pilhuhn: first we get a message saying that we know that the agent is down
[17:59] pilhuhn: and then 12secs later (and all over the place again) we say, that we did not hear from it and *suspect* that it is down
[17:59] pilhuhn: I mean - the agent correctly said bye bye
mazz: in this case, the agent went down, we could conceivably backfill immediately
[18:03] pilhuhn: or just check if the agent correctly said good bye
[18:03] pilhuhn: It did just tell us
[18:04] mazz: http://jira.rhq-project.org/browse/RHQ-1178
[18:04] mazz: that's the jira I was talking about
[18:05] mazz: but, we cannot rely on that message - because if the agent crashes (like a SIGAR crash:) it'll never get sent
[18:05] mazz: so we still need the backfiller to check even if we haven't heard from the agent.
[18:05] pilhuhn: mazz: Not rely on it - the other way around: shortcut when we see a correct good bye
Comment 2 John Mazzitelli 2009-09-15 15:23:04 EDT
We should at least trigger the backfiller so that the platform and all its resources are marked DOWN.
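The "shortcut on a correct goodbye, but keep the timeout check" idea from the discussion above could be sketched roughly as follows. This is an assumption-laden illustration, not the AgentManagerBean logic: the class BackfillDecision, its Reason enum, and the quietPeriod parameter are all hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the decision only; it does not perform any backfill.
class BackfillDecision {
    enum Reason { NONE, GRACEFUL_SHUTDOWN, SUSPECTED_DOWN }

    // How long the server tolerates silence before suspecting the agent is down.
    private final Duration quietPeriod;

    BackfillDecision(Duration quietPeriod) {
        this.quietPeriod = quietPeriod;
    }

    Reason check(boolean saidGoodbye, Instant lastHeardFrom, Instant now) {
        if (saidGoodbye) {
            // Shortcut: the agent told us it is going down, so backfill right away.
            return Reason.GRACEFUL_SHUTDOWN;
        }
        if (Duration.between(lastHeardFrom, now).compareTo(quietPeriod) > 0) {
            // Fallback for crashes, where no goodbye message is ever sent.
            return Reason.SUSPECTED_DOWN;
        }
        return Reason.NONE;
    }
}
```

Keeping the timeout branch means a crashed agent is still caught, while a graceful shutdown no longer has to wait for the quiet period to expire.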
Comment 3 Red Hat Bugzilla 2009-11-10 15:27:23 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1178
Comment 4 wes hayutin 2010-02-16 12:09:00 EST
mass add of key word FutureFeature to help track
Comment 5 Larry O'Leary 2010-02-23 23:48:12 EST
Is identifying this as a feature really what we want to do? After all, the agent has shut down gracefully but the JON server won't allow it. Instead, it assumes it is busy and doesn't show it and its resources as down until a timeout has occurred. Seems like this is a bug and a pretty major one.
Comment 6 Larry O'Leary 2010-03-11 01:17:26 EST
Created attachment 399254 [details]
Patch to perform backfill when agent has executed shutdown command
This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.
Comment 7 Larry O'Leary 2010-03-11 01:18:14 EST
Created attachment 399255 [details]
Patch to perform backfill when agent has executed shutdown command
This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.
Comment 8 John Mazzitelli 2011-07-26 15:02:40 EDT
If we do this, the agent should not hang its shutdown while waiting for the server to do its thing. We should make that CoreServerService API asynchronous (not guaranteed) and just let the agent fire-and-forget it. The server should backfill the agent (thus setting all its resources to down), and we should clear the agent condition cache (which might already be getting done during the backfill).
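One way to keep the agent from hanging its shutdown, sketched under assumptions: the notification is handed to a daemon thread and the caller waits at most a bounded time for it. This is a bounded-wait variant rather than a pure fire-and-forget, and it does not use the RHQ comm layer; ShutdownNotifier, notifyServer, and maxWaitMillis are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: sendShutdownMessage stands in for the real remote call.
class ShutdownNotifier {
    // Returns true if the message was sent within maxWaitMillis, false otherwise.
    static boolean notifyServer(Runnable sendShutdownMessage, long maxWaitMillis) {
        // Daemon thread, so a stuck send can never keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "agent-shutdown-notifier");
            t.setDaemon(true);
            return t;
        });
        executor.submit(sendShutdownMessage);
        executor.shutdown(); // accept no further tasks; the submitted one still runs
        try {
            return executor.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Passing a wait of 0 makes this behave like a plain fire-and-forget: the method returns almost immediately and delivery is not guaranteed.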
Comment 9 Charles Crouch 2011-09-30 20:20:34 EDT
Comment 10 Jay Shaughnessy 2012-02-24 14:58:05 EST
This is implemented in the jshaughn/avail branch. Backfilling now occurs on agent notification. Note that in the latest revision backfilling availability is set to UNKNOWN as opposed to DOWN. This indicates that in fact we don't know what the real avail is when the agent is down. Also, contrary to the above note, the message is not sent asynchronously, as we need to ensure the agent/comm layer is up to guarantee the message is sent.
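The effect described here, backfilling to UNKNOWN rather than DOWN, can be pictured with the following minimal sketch. It is not the code from the jshaughn/avail branch: AvailabilityBackfill, the Avail enum, and the map-based resource model are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resources are modelled as plain names mapped to a state.
class AvailabilityBackfill {
    enum Avail { UP, DOWN, UNKNOWN }

    // Returns a new map in which every resource of the agent is UNKNOWN,
    // since the server can no longer observe the real state.
    static Map<String, Avail> backfill(Map<String, Avail> lastKnown) {
        Map<String, Avail> result = new LinkedHashMap<>();
        for (String resource : lastKnown.keySet()) {
            result.put(resource, Avail.UNKNOWN);
        }
        return result;
    }
}
```

The input map is left untouched, so the last reported availability remains available for history or auditing in this sketch.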
Comment 11 Jay Shaughnessy 2012-02-24 15:00:21 EST
See http://rhq-project.org/display/RHQ/Design-Availability+Checking#Design-AvailabilityChecking-DesignandChanges for more on planned avail changes.
Comment 12 Jay Shaughnessy 2012-03-30 16:42:36 EDT
This is in master.
Comment 13 Heiko W. Rupp 2013-09-01 06:12:14 EDT
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.