Description of problem:
When a Host that has PM (power management) configured becomes non-responsive, we try to restart it using its PM agent. To do that, we need another Host in the system that can act as a proxy for the fencing commands sent to the problematic Host. The current proxy search implementation looks for the first Host in UP status in the problematic Host's DC. We should look for the first Host in UP status in the problematic Host's DC and, if none is found, try any other Host, since a Host in another status such as Maintenance can still serve as a proxy.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Have 2 Hosts in a DC: host1 and host2
2. Configure and test PM for host1
3. Activate host1
4. Put host2 in Maintenance
5. Try to restart host1 from the UI

Actual results:
You get a failure on canDoAction since there is no proxy Host in UP status.

Expected results:
host2 should be used to fence host1 even though it is in Maintenance.

Additional info:
http://gerrit.ovirt.org/#/c/9251/
fixed in commit: ca10cc1
verified.
This bug is currently attached to errata RHEA-2013:14491. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0888.html
Eli, I've seen the algorithm use hosts in 'Maintenance'; this could also mean hosts that are not reachable (because of other operations). I'm not sure whether those hosts are elected only as the last ones available, but I think they should be ranked lower in the list of hosts available for fencing other hosts. In my setup, I have 6 hypervisors in two clusters, and when enabling one of the hypervisors with power management, it was never starting, because it first tried hosts in the same cluster (which I moved to hosts in DC using the ordered list), and all hosts in the cluster were in maintenance. Should I raise an RFE for this? Thanks, Pablo
Currently we try to get a proxy that is UP in the same cluster first; if this fails, we try to get any other host except those that have network errors; if this also fails, the same is done for the host's DC. You may open an RFE for that.
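The selection order described above could be sketched roughly as follows. This is only an illustration in Python, not the engine code: the `Host` class, the status strings and the function names are all hypothetical, and the only rules taken from this thread are "UP first, then any host without network errors, first in the cluster and then in the DC".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical status names; the thread only mentions UP, Maintenance
# and hosts with "network errors".
UP = "Up"
MAINTENANCE = "Maintenance"
NETWORK_ERROR = "NetworkError"


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str
    data_center: str
    status: str


def pick_proxy(candidates: List[Host]) -> Optional[Host]:
    """Prefer an UP host; otherwise take any host without network errors."""
    for host in candidates:
        if host.status == UP:
            return host
    for host in candidates:
        if host.status != NETWORK_ERROR:
            return host
    return None


def find_fence_proxy(target: Host, hosts: Iterable[Host]) -> Optional[Host]:
    """Search the target's cluster first, then the rest of its DC."""
    others = [h for h in hosts if h.name != target.name]
    same_cluster = [h for h in others if h.cluster == target.cluster]
    same_dc = [h for h in others if h.data_center == target.data_center]

    proxy = pick_proxy(same_cluster)
    if proxy is None:
        proxy = pick_proxy(same_dc)
    return proxy


# Scenario from the report: host2 is in Maintenance, host1 needs fencing.
host1 = Host("host1", "cluster1", "dc1", "NonResponsive")
host2 = Host("host2", "cluster1", "dc1", MAINTENANCE)
host3 = Host("host3", "cluster2", "dc1", UP)

proxy = find_fence_proxy(host1, [host1, host2, host3])
print(proxy.name if proxy else "no proxy found")
```

Under these assumptions, a Maintenance host in the same cluster is chosen before an UP host elsewhere in the DC, which matches the behaviour Pablo describes; ranking such hosts lower would mean changing the order of the two passes.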
Eli, Created as 1061722 Thanks! Pablo