Bug 534375 (RHQ-1178)
Summary: | clean up agent stuff in CoreServerService.agentIsShuttingDown
---|---
Product: | [Other] RHQ Project
Component: | Core Server
Reporter: | John Mazzitelli <mazz>
Assignee: | Jay Shaughnessy <jshaughn>
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | high
Version: | unspecified
CC: | jshaughn, loleary, tao
Keywords: | FutureFeature, Improvement
Target Release: | RHQ 4.4.0
Hardware: | All
OS: | All
URL: | http://jira.rhq-project.org/browse/RHQ-1178
Doc Type: | Enhancement
Last Closed: | 2013-09-01 10:12:14 UTC
Bug Depends On: | 535352
Bug Blocks: | 741450

Description
John Mazzitelli
2008-11-25 01:20:00 UTC
pilhuhn: How much sense does this make:
[17:58] pilhuhn: 17:54:52,486 INFO [AgentManagerBean] Agent with name [snert.home.bsd.de] just went down
[17:58] pilhuhn: 17:56:04,431 WARN [AgentManagerBean] Have not heard from agent [snert.home.bsd.de] since [Tue Mar 10 17:54:02 CET 2009]. Will be backfilled since we suspect it is down
[17:58] pilhuhn: first we get a message saying that we know that the agent is down
[17:59] pilhuhn: and then 12secs later (and all over the place again) we say , that we did not hear from it and *suspect* that it is down
[17:59] pilhuhn: I mean - the agent correctly said bye bye
mazz: in this case, the agent went down, we could conceivably backfill immediately
[18:03] pilhuhn: or just check if the agent correctly said good bye
[18:03] pilhuhn: It did just tell us
[18:04] mazz: http://jira.rhq-project.org/browse/RHQ-1178
[18:04] mazz: that's the jira I was talking about
[18:05] mazz: but, we cannot rely on that message - because if the agent crashes (like a SIGAR crash:) it'll never get sent
[18:05] mazz: so we still need the backfiller to check even if we haven't heard from the agent.
[18:05] pilhuhn: mazz: Not rely on it - the other way around: shortcut when we see a correct good bye

We should at least trigger the backfiller so we make the platform and all its resources DOWN.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1178

Mass add of keyword FutureFeature to help track.

Is identifying this as a feature really what we want to do? After all, the agent has shut down gracefully but the JON server won't allow it. Instead, it assumes it is busy and doesn't show it and its resources as down until a timeout has occurred. Seems like this is a bug and a pretty major one.

Created attachment 399254 [details]
Patch to perform backfill when agent has executed shutdown command
This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.
Created attachment 399255 [details]
Patch to perform backfill when agent has executed shutdown command
This patch only addresses the issue with the server continuing to report the last known availability of an agent and its resources after the agent has shut down.
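
To make the behaviour under discussion concrete, here is a minimal Python sketch of the server-side idea: backfill immediately when an agent announces a graceful shutdown, and keep the periodic backfiller as a fallback for agents that crash without saying goodbye. It is not the RHQ implementation or the attached patch. `AgentTracker`, `Agent`, `quiet_timeout`, `backfill_value`, and the method names are all hypothetical, chosen only for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    last_report: float                      # epoch seconds of the last availability report
    backfilled: bool = False
    resources: dict = field(default_factory=dict)   # resource name -> availability string


class AgentTracker:
    """Hypothetical stand-in for the server-side agent bookkeeping."""

    def __init__(self, quiet_timeout=120.0, backfill_value="DOWN"):
        self.quiet_timeout = quiet_timeout    # assumed "have not heard from agent" threshold
        self.backfill_value = backfill_value  # availability written during a backfill
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    def backfill(self, agent):
        # Overwrite the last known availability of every resource on the agent.
        for resource in agent.resources:
            agent.resources[resource] = self.backfill_value
        agent.backfilled = True

    def agent_is_shutting_down(self, name):
        # Shortcut: the agent said goodbye, so do not wait for the timeout.
        agent = self.agents.get(name)
        if agent is not None and not agent.backfilled:
            self.backfill(agent)

    def check_for_suspect_agents(self, now=None):
        # Fallback: a crashed agent never sends the shutdown message, so the
        # periodic check still has to catch agents that have gone quiet.
        now = time.time() if now is None else now
        suspects = []
        for agent in self.agents.values():
            if agent.backfilled:
                continue  # already handled, no second "suspect it is down" warning
            if now - agent.last_report > self.quiet_timeout:
                self.backfill(agent)
                suspects.append(agent.name)
        return suspects
```

In this sketch an agent that has already been backfilled through `agent_is_shutting_down` is skipped by the periodic check, which avoids the contradictory pair of log messages quoted in the description.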
If we do this, the agent should not hang its shutdown while waiting for the server to do its thing. We should make that CoreServerService API asynchronous (not guaranteed) and just let the agent fire-and-forget it. The server should backfill the agent (thus setting all its resources to down), and we should clear the agent condition cache (which might already be getting done during the backfill).

FutureFeature Improvement

This is implemented in the jshaughn/avail branch. Backfilling now occurs on agent notification. Note that in the latest revision the backfilled availability is set to UNKNOWN as opposed to DOWN. This indicates that in fact we don't know what the real avail is when the agent is down. Also, contrary to the above note, the message is not sent asynchronously, as we need to ensure the agent/comm layer is up to guarantee the message is sent.

See http://rhq-project.org/display/RHQ/Design-Availability+Checking#Design-AvailabilityChecking-DesignandChanges for more on planned avail changes.

This is in master.

Bulk closing of items that are on_qa and in old RHQ releases, which have been out for a long time and where the issue has not been re-opened since.
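
For readers following the final design, the agent-side ordering can be sketched roughly as follows: the shutdown notification is sent synchronously while the comm layer is still running, the server marks that agent's resources UNKNOWN, and only then is the comm layer stopped. This is an illustrative Python sketch under assumed names (`CommLayer`, `Server`, `shutdown_agent`, the message tuple), not code from the jshaughn/avail branch.

```python
class CommLayer:
    """Hypothetical agent comm layer that can only send while running."""

    def __init__(self):
        self.running = True

    def send_sync(self, message, server):
        # Blocking send: returns only after the server has handled the message.
        if not self.running:
            raise RuntimeError("comm layer is stopped")
        return server.handle(message)

    def stop(self):
        self.running = False


class Server:
    """Hypothetical server holding availability keyed by (agent, resource)."""

    def __init__(self):
        self.availability = {}

    def handle(self, message):
        kind, agent_name = message
        if kind == "agent_is_shutting_down":
            # Backfill with UNKNOWN: the real state is not known once the agent is gone.
            for key in self.availability:
                if key[0] == agent_name:
                    self.availability[key] = "UNKNOWN"
        return True


def shutdown_agent(agent_name, comm, server):
    """Notify the server first, then stop the comm layer."""
    notified = False
    try:
        notified = comm.send_sync(("agent_is_shutting_down", agent_name), server)
    except Exception:
        # A failed notification must not block shutdown; the server-side
        # timeout check remains the fallback in that case.
        notified = False
    finally:
        comm.stop()
    return notified
```

Sending before `comm.stop()` is the point of the synchronous design: a fire-and-forget call could be dropped if the comm layer were torn down before the message left the agent.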