Bug 536645 (RHQ-974)
Summary: | do not attempt to failover to another server for calls that must talk to the same server | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | John Mazzitelli <mazz> |
Component: | Communications Subsystem | Assignee: | RHQ Project Maintainer <rhq-maint> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 1.1 | CC: | cwelton, jshaughn, mazz |
Target Milestone: | --- | Keywords: | Improvement |
Target Release: | --- | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://jira.rhq-project.org/browse/RHQ-974 | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Enhancement | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-05-16 15:25:21 UTC | Type: | --- |
Description
John Mazzitelli
2008-10-10 05:17:00 UTC
Another thing to possibly help survive something like this: create a core-comm-api non-runtime exception like "AbortedException", and any method that is annotated with @NotFailoverable should consider having a "throws AbortedException". All code that calls these methods would then be forced to handle it. And to handle it, all that you might have to do is call the API again (because hopefully by that time, the agent has switched over to another server).

Mazz, I'm not sure I fully understand the "having the agent switch over to another server is futile" line. If a server goes down then failover isn't futile, right? Just processing that command is futile. Wouldn't we still want the agent to failover in general? @NoFailover (a better name, I think :) could throw the NotProcessedException, which we already have and is indicative of the problem, I think. It's a RuntimeException but could that be sufficient? This could then trigger failover logic at the agent level as opposed to the command processing layer. I think this is basically what you already propose.

I would like to try to get something into 1.2 to address this problem. At the very least we should make downloading the plugins more tolerant of this error... perhaps just perform a retry before saying the download failed?

The download-plugin problem has been addressed by making failures a bit more fault tolerant (we now retry if a plugin download fails). I am putting this down from critical to minor but leaving this open because I suspect we may have to come back and revisit this - this same kind of problem is going to happen when we need to stream package bits to and from the server, so we still may need a solution.

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-974

This bug relates to RHQ-1069

Mazz, what are your current thoughts on this in re: bundles/streaming stuff?

We still need to address this - but it's not high priority.

Still feel like we need to do this?
(In reply to Jay Shaughnessy from comment #8)
> Still feel like we need to do this?

This is still a problem for things like streaming, as the description says. However, even if we don't switch over, what's the point? The server we are talking to is dead, so unless it's an async message with guaranteed delivery, it will still fail. Not switching over just means we'll keep trying to talk to the downed server, and thus a failure will result. I think we can close this - in either situation a failure will occur and our exception handling will just take care of dealing with the error condition.
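
For anyone revisiting this later: the checked-exception idea from the first comment, combined with the simple retry that was added for plugin downloads, might look roughly like the sketch below. This is not RHQ code. `PluginService`, `downloadWithRetry`, and the exact shape of `@NotFailoverable` and `AbortedException` are all hypothetical, made up only to illustrate the proposal. The annotation is a plain marker here and nothing in the sketch reads it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class NoFailoverSketch {

    /** Hypothetical marker: the call must stay on the same server for its whole duration. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface NotFailoverable {}

    /** Hypothetical checked exception, so every caller is forced to handle an aborted call. */
    static class AbortedException extends Exception {
        AbortedException(String message) {
            super(message);
        }
    }

    /** Hypothetical remote API; stands in for whatever interface the agent really calls. */
    interface PluginService {
        @NotFailoverable
        byte[] downloadPlugin(String name) throws AbortedException;
    }

    /**
     * Calls the API again when it was aborted, in the hope that the agent
     * has switched to another server by then. Rethrows the last failure
     * once maxAttempts calls have all been aborted.
     */
    static byte[] downloadWithRetry(PluginService service, String name, int maxAttempts)
            throws AbortedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        AbortedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return service.downloadPlugin(name);
            } catch (AbortedException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fake service: the first call is aborted, the second succeeds.
        PluginService flaky = name -> {
            if (++calls[0] == 1) {
                throw new AbortedException("server went down mid-call");
            }
            return new byte[] {1, 2, 3};
        };
        byte[] bits = downloadWithRetry(flaky, "example-plugin.jar", 2);
        System.out.println("downloaded " + bits.length + " bytes after " + calls[0] + " attempts");
    }
}
```

As the closing comment points out, a retry like this only helps if something else has moved the agent to a live server in the meantime; otherwise each attempt fails the same way and the caller ends up in the normal exception-handling path.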