Bug 534375 (RHQ-1178)

Summary: clean up agent stuff in CoreServerService.agentIsShuttingDown
Product: [Other] RHQ Project Reporter: John Mazzitelli <mazz>
Component: Core Server Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: unspecified CC: jshaughn, loleary, tao
Target Milestone: --- Keywords: FutureFeature, Improvement
Target Release: RHQ 4.4.0   
Hardware: All   
OS: All   
URL: http://jira.rhq-project.org/browse/RHQ-1178
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-01 10:12:14 UTC Type: ---
Bug Depends On: 535352    
Bug Blocks: 741450    
Attachments:
Attachment 399254: Patch to perform backfill when agent has executed shutdown command (flags: none)
Attachment 399255: Patch to perform backfill when agent has executed shutdown command (flags: none)

Description John Mazzitelli 2008-11-25 01:20:00 UTC
When an agent shuts down, it sends a message to the server saying it is going down.

The message enters: CoreServerService#agentIsShuttingDown(String)

We should make sure this method does everything it can at that point. For example, in the HAAC view of the servers, the agent count doesn't go down when an agent shuts down. It should go down as soon as the agent tells us it is shutting down.

We could also clear the alert cache for that agent to reduce the cache's memory footprint.

Comment 1 Heiko W. Rupp 2009-03-10 17:11:43 UTC
pilhuhn: How much sense does this make:
[17:58] pilhuhn: 17:54:52,486 INFO  [AgentManagerBean] Agent with name [snert.home.bsd.de] just went down
[17:58] pilhuhn: 17:56:04,431 WARN  [AgentManagerBean] Have not heard from agent [snert.home.bsd.de] since [Tue Mar 10 17:54:02 CET 2009]. Will be backfilled since we suspect it is down
[17:58] pilhuhn: first we get a message saying that we know that the agent is down
[17:59] pilhuhn: and then 12secs later (and all over the place again) we say that we did not hear from it and *suspect* that it is down
[17:59] pilhuhn: I mean - the agent correctly said bye bye
mazz: in this case, the agent went down, we could conceivably backfill immediately
[18:03] pilhuhn: or just check if the agent correctly said good bye
[18:03] pilhuhn: It did just tell us
[18:04] mazz: http://jira.rhq-project.org/browse/RHQ-1178
[18:04] mazz: that's the jira I was talking about
[18:05] mazz: but, we cannot rely on that message - because if the agent crashes (like a SIGAR crash:) it'll never get sent
[18:05] mazz: so we still need the backfiller to check even if we haven't heard from the agent.
[18:05] pilhuhn: mazz: Not rely on it - the other way around: shortcut when we see a correct good bye
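
The policy discussed in this chat could be sketched roughly as follows: a graceful goodbye triggers an immediate backfill, while the periodic "have not heard from agent" check stays in place for crashed agents and skips agents that are already backfilled. This is only an illustration. AgentBackfillTracker, its method names, and the quiet-time parameter are hypothetical and are not the actual AgentManagerBean code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker for illustration only; not the real AgentManagerBean. */
class AgentBackfillTracker {
    private final long quietTimeMillis; // assumed timeout before an agent is considered suspect
    private final Map<String, Long> lastHeard = new HashMap<>();
    private final Map<String, Boolean> backfilled = new HashMap<>();

    AgentBackfillTracker(long quietTimeMillis) {
        this.quietTimeMillis = quietTimeMillis;
    }

    /** Any message from the agent marks it as alive again. */
    void heardFrom(String agent, long now) {
        lastHeard.put(agent, now);
        backfilled.put(agent, false);
    }

    /** Graceful goodbye: shortcut and backfill right away. */
    void agentIsShuttingDown(String agent, long now) {
        lastHeard.put(agent, now);
        backfilled.put(agent, true);
    }

    /**
     * Periodic check. Returns true only when a new backfill is needed,
     * i.e. the agent went quiet without saying goodbye (crash case).
     */
    boolean checkSuspect(String agent, long now) {
        Long last = lastHeard.get(agent);
        if (last == null || backfilled.getOrDefault(agent, false)) {
            return false; // unknown agent, or already backfilled
        }
        if (now - last > quietTimeMillis) {
            backfilled.put(agent, true);
            return true;
        }
        return false;
    }

    boolean isBackfilled(String agent) {
        return backfilled.getOrDefault(agent, false);
    }
}
```

With this shape, an agent that said goodbye is never reported as "suspect" afterwards, and an agent that crashed is still caught by the timeout.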

Comment 2 John Mazzitelli 2009-09-15 19:23:04 UTC
We should at least trigger the backfiller so that the platform and all its resources are marked DOWN.

Comment 3 Red Hat Bugzilla 2009-11-10 20:27:23 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1178


Comment 4 wes hayutin 2010-02-16 17:09:00 UTC
Mass add of keyword FutureFeature to help with tracking.

Comment 5 Larry O'Leary 2010-02-24 04:48:12 UTC
Is identifying this as a feature really what we want to do?  After all, the agent has shut down gracefully but the JON server won't allow it.  Instead, it assumes the agent is busy and doesn't show it and its resources as down until a timeout has occurred.  This seems like a bug, and a pretty major one.

Comment 6 Larry O'Leary 2010-03-11 06:17:26 UTC
Created attachment 399254 [details]
Patch to perform backfill when agent has executed shutdown command

This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.

Comment 7 Larry O'Leary 2010-03-11 06:18:14 UTC
Created attachment 399255 [details]
Patch to perform backfill when agent has executed shutdown command

This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.

Comment 8 John Mazzitelli 2011-07-26 19:02:40 UTC
If we do this, the agent should not hang its shutdown while waiting for the server to do its thing. We should make that CoreServerService API asynchronous (not guaranteed) and just let the agent fire-and-forget it.

The server should backfill the agent (thus setting all its resources to down) and we should clear the agent condition cache (which might already be getting done during the backfill).
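
A minimal sketch of the fire-and-forget idea on the agent side, assuming a plain java.util.concurrent executor rather than the agent's real comm layer. ShutdownNotifier and its methods are hypothetical names, and the Runnable stands in for whatever call actually reaches CoreServerService#agentIsShuttingDown. The instance is single-use: it accepts one notification and then stops accepting work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not the agent's real comm layer. */
class ShutdownNotifier {
    // Daemon thread so a stuck server call can never keep the agent JVM alive.
    private final ExecutorService sender = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "goodbye-sender");
        t.setDaemon(true);
        return t;
    });

    /** Queues the goodbye call and returns at once; delivery is best effort. */
    void notifyServer(Runnable goodbyeCall) {
        sender.execute(() -> {
            try {
                goodbyeCall.run();
            } catch (RuntimeException ignored) {
                // Not guaranteed: the server's timeout-based backfill is the fallback.
            }
        });
        sender.shutdown(); // single-use: no further notifications are accepted
    }

    /** Optional bounded wait; returns true if the call finished in time. */
    boolean awaitDelivery(long millis) {
        try {
            return sender.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The agent's shutdown path would call notifyServer(...) and carry on, optionally using awaitDelivery with a short bound if it wants to give the message a chance to go out.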

Comment 9 Charles Crouch 2011-10-01 00:20:34 UTC
FutureFeature Improvement

Comment 10 Jay Shaughnessy 2012-02-24 19:58:05 UTC
This is implemented in the jshaughn/avail branch.

Backfilling now occurs on agent notification.  Note that in the latest
revision the backfilled availability is set to UNKNOWN as opposed to
DOWN. This reflects that we in fact don't know what the real
availability is when the agent is down.

Also, contrary to comment 8, the message is not sent asynchronously,
as we need to ensure the agent/comm layer is up to guarantee the message
is sent.
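
As a rough illustration of backfilling to UNKNOWN instead of DOWN, the sketch below overwrites the last reported availability of every resource that belongs to one agent. The Avail enum, AvailabilityBackfiller, and the map of resource names are hypothetical stand-ins and are not the code in the jshaughn/avail branch. Treating every resource the same way is also an assumption of this sketch.

```java
import java.util.Map;

/** Hypothetical availability states for illustration only. */
enum Avail { UP, DOWN, UNKNOWN }

/** Hypothetical backfiller; not the code from the jshaughn/avail branch. */
class AvailabilityBackfiller {
    /**
     * Marks every resource of one agent as UNKNOWN, since the server can no
     * longer observe the real state once the agent is gone.
     * Returns how many resources actually changed state.
     */
    static int backfill(Map<String, Avail> resourceAvail) {
        int changed = 0;
        for (Map.Entry<String, Avail> entry : resourceAvail.entrySet()) {
            if (entry.getValue() != Avail.UNKNOWN) {
                entry.setValue(Avail.UNKNOWN);
                changed++;
            }
        }
        return changed;
    }
}
```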

Comment 11 Jay Shaughnessy 2012-02-24 20:00:21 UTC
For more on the planned availability changes, see:

http://rhq-project.org/display/RHQ/Design-Availability+Checking#Design-AvailabilityChecking-DesignandChanges

Comment 12 Jay Shaughnessy 2012-03-30 20:42:36 UTC
This is in master.

Comment 13 Heiko W. Rupp 2013-09-01 10:12:14 UTC
Bulk closing of items that are ON_QA in old RHQ releases, which have been out for a long time, and where the issue has not been reopened since.