Created attachment 878923 [details]
Engine log with lines of fencing

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up Power Management with type ilo4 and enter the login details
2. Take the host down to trigger fencing/HA
3. Check the logging in engine.log

Actual results:
The node stays down, and the logging shows that ipmilan was used as the agent type.

Expected results:
Power management detects that the host is down and executes a fence command with agent type ilo4.

Additional info:
Sometimes I get the error "No route to host" (the fence action failed with VDS_NETWORK_ERROR and error 5022), but running the fence_ilo4 command directly does succeed.
Created attachment 878926 [details]
Screenshot of power management
Created attachment 878927 [details]
Screenshot with error
Why did you open two BZs for what seems to be the same error?

Duplicate of: https://bugzilla.redhat.com/show_bug.cgi?id=1080874
This is by design: ilo4 is implemented on top of ipmilan, and there is no problem with that.

From the log you have attached:

--snip--
2014-03-25 13:28:49,734 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-8) Using Host node2.localdomain from cluster Default as proxy to execute Status command on Host
2014-03-25 13:28:49,736 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-8) Executing <Status> Power Management command, Proxy Host:node2.localdomain, Agent:ipmilan, Target Host:, Management IP:192.168.1.81, User:root, Options:lanplus,power_wait=4
2014-03-25 13:28:49,759 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-8) START, FenceVdsVDSCommand(HostName = node2.localdomain, HostId = 22230782-a788-4b99-bc01-656c703e7b8c, targetVdsId = 39fa5e3d-3ea4-4eb4-bb06-778829724125, action = Status, ip = 192.168.1.81, port = , type = ipmilan, user = root, password = ******, options = 'lanplus,power_wait=4'), log id: b5c75c
2014-03-25 13:28:50,412 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-8) FINISH, FenceVdsVDSCommand, return: Test Succeeded, on, log id: b5c75c
--snip--

That means the Status command works and your PM definitions are OK.
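For reference, the engine's ipmilan Status call in the log above corresponds to roughly the following manual fence-agent invocation. This is a sketch reconstructed from the log parameters, not something taken from the engine: the exact flag spelling depends on your fence-agents version, and the password is a placeholder.

```shell
# Hypothetical reconstruction of the engine's Status call from the log above.
# Flag names assume the Python fence-agents fence_ipmilan; SECRET is a placeholder.
FENCE_CMD="fence_ipmilan --ip=192.168.1.81 --username=root --password=SECRET --lanplus --power-wait=4 --action=status"

# Echoed instead of executed, since actually running it requires a reachable iLO4 BMC.
echo "$FENCE_CMD"
```

If this command succeeds when run by hand from the proxy host, the ilo4-to-ipmilan mapping itself is working.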
You have several problems here:

1) A network exception prevents soft-fencing (a vdsm restart) from taking place, therefore we go on to hard fencing.

2) When we get to hard fencing, PM is defined but not enabled on the fenced host (in the UI, under Edit/New Host, PM tab, there is a check-box to enable PM).

From the log:

--snip--
2014-03-25 13:07:28,965 INFO [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Lock Acquired to object EngineLock [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE , sharedLocks= ]
2014-03-25 13:07:28,994 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Failed to run Fence script on vds:node2.localdomain, VMs moved to UnKnown instead.
2014-03-25 13:07:28,998 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-47) [29c0b495] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host node2.localdomain became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"
2014-03-25 13:07:28,999 WARN [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VAR__ACTION__RESTART,VDS_FENCE_DISABLED
2014-03-25 13:07:29,000 INFO [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Lock freed to object EngineLock [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE
--snip--

This means that at the time the host was non-responsive, its PM was disabled.

3) I see that you attached two TEST screenshots, one that succeeded and one that failed with a network error. Can you please report the number of hosts in the cluster and in the DC at the time this test failed?
We had an issue where DOWN hosts were taken into account as proxy candidates (https://bugzilla.redhat.com/show_bug.cgi?id=1073896). If this is the case, there is already a patch fixing that: http://gerrit.ovirt.org/#/c/26096/

As I said, this has nothing to do with ilo4 being mapped to ipmilan, which is OK.

Please recheck your settings and try again; I am inclined to close this as NOTABUG.
(In reply to Eli Mesika from comment #4)
1. What do you want me to do?

2. If you look at the screenshots, you will see that I do have PM enabled. If it is not reflected in the logging, then there is a problem, isn't there?

3. If there was a patch fixing that issue, then I guess that is the case here. I have hosts in a cluster and in the DC. But even with that patch resolving the error, it still does not explain the other problem.
(In reply to jvleur from comment #5)
> 1. What do you want me to do?

First of all, give me the exact number of hosts you have in the cluster and in the DC when the test failed, and the status of each host at the time the failure occurred; a screenshot of the Hosts tab in the web admin UI is enough.

> 2. If you look at the screenshots then you will see that I do have PM
> enabled. If it's not defined in the logging then there's a problem isn't
> there.

I looked in the log and found some VDS_FENCE_DISABLED messages that are certainly related to disabled PM; however, it might be that PM was disabled and enabled again later on, since I see some calls that pass that point.

> 3. If there was a patch in fixing that issue then I guess that's the case
> here. I have host in a cluster and in the DC. But with that patch resolving
> the error it still does not explain the other problem.

The patch fixes the case where you try to fence a host while all other hosts are in the DOWN state. Prior to that patch, we tried to use a DOWN host as a proxy and got a NETWORK error message like the one you got; after the patch is merged, you will get a friendlier message about failing to find a proxy host to perform the fencing operation.

Please provide the info in 1) and we can move forward.
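To illustrate the proxy-selection behavior described above, here is a purely illustrative sketch (not actual engine code, and the function name and host-list format are invented for the example): after the referenced patch, a host that is not UP is skipped as a fencing proxy instead of being tried and failing with a network error.

```shell
# Illustrative sketch only -- not engine code. Each argument is a "name:status"
# pair; print the first UP host as the fencing proxy, or fail if none is available.
pick_fence_proxy() {
  for h in "$@"; do
    name=${h%%:*}
    status=${h##*:}
    if [ "$status" = "Up" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no proxy available to perform fencing" >&2
  return 1
}

# With node1 down, node2 is chosen as the proxy; prints "node2".
pick_fence_proxy "node1:Down" "node2:Up"
```

In a two-host cluster, this also shows why fencing fails with only a proxy-related error when the one remaining candidate host is DOWN.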
I've got 2 hosts in the cluster (and DC). The screenshots show that the VM is supposedly on a live node and not migrating. The error does occur when one of the nodes is down, so you were right about that.

attachment: screenshot
Created attachment 879408 [details]
Hosts
Created attachment 879409 [details]
VM's
Closing as DUPLICATE according to the reporter's info in comment 7.

*** This bug has been marked as a duplicate of bug 1073896 ***