Bug 961753
Summary: | PRD35 - [RFE] Improve fencing robustness by retrying failed attempts | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Josep 'Pep' Turro Mauri <pep> |
Component: | ovirt-engine | Assignee: | Eli Mesika <emesika> |
Status: | CLOSED ERRATA | QA Contact: | sefi litmanovich <slitmano> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.3.0 | CC: | aberezin, bazulay, bsettle, iheim, lpeer, oourfali, pablo.iranzo, pstehlik, rbalakri, Rhev-m-bugs, sbonazzo, sherold, yeylon |
Target Milestone: | --- | Keywords: | FutureFeature |
Target Release: | 3.5.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | infra | ||
Fixed In Version: | ovirt-3.5.0-alpha1 | Doc Type: | Enhancement |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-02-11 17:53:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1090511 | ||
Bug Blocks: | 1142923, 1156165 |
Description
Josep 'Pep' Turro Mauri
2013-05-10 10:59:48 UTC
Is this RFE just for the cases of:
- badly configured PMs
- or unreachable PMs

If this is the case, then it should be rather easy to do. When the fencing actually fails somewhere along the way (e.g. stop succeeded but start did not bring the host back to Up after a few minutes), that is a different issue and will be harder to detect and implement.

The "suggested implementation" (just an idea) of retrying, potentially from a different proxy host each time, is for the simple case of misconfigured/unreachable PMs. The idea is that the selected proxy host may have connectivity problems with the victim's PM while another host might work just fine, or it could just be a transient problem (as seen in the customer's environment).

However, the aim of this RFE is as broad as possible: make fencing as robust as possible, so ideally we would like to cover other scenarios too, like the one you describe of "partial success". Maybe split it into 2 RFEs? One "easy win" with the simple scenario (and hopefully in a release soon in your download channels ;) and another for the more complicated case(s).

So to be clear, a badly configured PM will never succeed from any proxy. So we need to check if we can differentiate badly configured from unreachable and try to handle only the unreachable case through another proxy. If we can't differentiate, then the retries will take place for both of the above scenarios.

It looks like a general configuration is in order (number of retries?)

(In reply to comment #4)
> So to be clear,
>
> badly configured PM will never succeed from any proxy.
> So we need to check if we can differentiate badly configured from
> unreachable

The user should test the configuration; we have the Test button. This use case is already solved.

> and try to handle only the unreachable through other proxy.
> If we can't differentiate then the retries will take place for both above
> scenarios.

See the answer above: if the customer took care of the first part and "tested" the configuration, then retries are a good approach.

>
> It looks like a general configuration is in order (number of retries ?)

This could be part of the proxy setting that we added in 3.2. Meaning: while you select the preferred proxy from the list, have another list allowing you to select the number of retries.

Simplifying to:
Add an option in the config tool to retry fencing:
- Number of retry attempts
- Timeout between retries

The current proxy selection mechanism already has a retry setting in the configuration. What should be done to resolve this is that each host that is a proxy candidate will be tested for being a good proxy by issuing a STATUS fence command to the target host.

*** Bug 1061722 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html
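The retry scheme this thread converges on (a configurable number of retry attempts, a delay between them, trying another proxy host when one fails, and vetting each proxy candidate with a STATUS fence command first) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the ovirt-engine implementation: `FenceRetryConfig`, `fence_with_retries`, `select_proxy`, `rotated`, and the `status_check`/`run_fence` callbacks are hypothetical names, and a real engine would invoke fence agents where the callbacks are.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical callback types (NOT the ovirt-engine API):
#   status_check(proxy, target) -> True if the proxy's STATUS fence command on target's PM succeeded
#   run_fence(proxy, target, action) -> True if the proxy ran the fence action (e.g. "restart")
StatusCheck = Callable[[str, str], bool]
FenceAction = Callable[[str, str, str], bool]


@dataclass
class FenceRetryConfig:
    """Assumed config knobs from the discussion: retry count and delay between retries."""
    retries: int = 3            # extra attempts after the first one
    delay_seconds: float = 5.0  # wait between consecutive attempts


def rotated(candidates: Sequence[str], attempt: int) -> List[str]:
    """Rotate the candidate list so each attempt starts from a different proxy host."""
    if not candidates:
        return []
    k = attempt % len(candidates)
    return list(candidates[k:]) + list(candidates[:k])


def select_proxy(candidates: Sequence[str], target: str,
                 status_check: StatusCheck) -> Optional[str]:
    """Return the first candidate whose STATUS fence command on the target succeeds."""
    for proxy in candidates:
        if status_check(proxy, target):
            return proxy
    return None


def fence_with_retries(candidates: Sequence[str], target: str, action: str,
                       status_check: StatusCheck, run_fence: FenceAction,
                       config: FenceRetryConfig,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Run `action` on `target`, retrying up to config.retries extra times.

    Returns the proxy that succeeded, or None if every attempt failed.
    """
    for attempt in range(config.retries + 1):
        proxy = select_proxy(rotated(candidates, attempt), target, status_check)
        if proxy is not None and run_fence(proxy, target, action):
            return proxy
        if attempt < config.retries:
            sleep(config.delay_seconds)  # no sleep after the final attempt
    return None


if __name__ == "__main__":
    reachable = {"host2", "host3"}  # host1 cannot reach the victim's PM
    flaky = {"host2"}               # host2 passes STATUS but the restart fails
    result = fence_with_retries(
        ["host1", "host2", "host3"], "victim", "restart",
        status_check=lambda p, t: p in reachable,
        run_fence=lambda p, t, a: p not in flaky,
        config=FenceRetryConfig(retries=3, delay_seconds=0.0),
    )
    print(result)  # host3
```

Because a badly configured PM fails STATUS from every proxy, this sketch simply exhausts its retries in that case. As noted in the thread, it cannot tell a misconfigured PM from an unreachable one, which is why testing the PM configuration up front still matters.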