Bug 1080905 - Power Management set as ilo4 but executes ipmilan
Summary: Power Management set as ilo4 but executes ipmilan
Keywords:
Status: CLOSED DUPLICATE of bug 1073896
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Eli Mesika
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks:
 
Reported: 2014-03-26 09:57 UTC by jvleur
Modified: 2014-03-27 11:50 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-27 11:50:20 UTC
oVirt Team: ---
Embargoed:


Attachments
Engine log with lines of fencing (3.09 MB, text/plain)
2014-03-26 09:57 UTC, jvleur
Screenshot of power management (118.11 KB, image/png)
2014-03-26 09:58 UTC, jvleur
Screenshot with error (153.96 KB, image/png)
2014-03-26 09:59 UTC, jvleur
Hosts (471.41 KB, image/png)
2014-03-27 10:52 UTC, jvleur
VM's (96.11 KB, image/png)
2014-03-27 10:53 UTC, jvleur

Description jvleur 2014-03-26 09:57:05 UTC
Created attachment 878923 [details]
Engine log with lines of fencing

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up Power Management with type ilo4 and enter the login details.
2. Take the host down to trigger fencing/HA.
3. Check the engine.log for the fencing lines.
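
For reference, the fencing lines can be pulled out of the engine log like this (a sketch, assuming the default log location on the engine host):

  grep -i fence /var/log/ovirt-engine/engine.log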

Actual results:

The node stays down, and the logging shows that ipmilan was used as the agent type.

Expected results:

Power management detects that the host is down and executes a fence command using agent type ilo4.

Additional info:

Sometimes I get the error "No route to host" (failed with error VDS_NETWORK_ERROR and error 5022). But with the fence_ilo4 command I do get a success.
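
For example, a manual check like the following succeeds (a sketch; option names may vary with the installed fence-agents version, and the password is a placeholder):

  fence_ilo4 -a 192.168.1.81 -l root -p '<password>' -o status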

Comment 1 jvleur 2014-03-26 09:58:33 UTC
Created attachment 878926 [details]
Screenshot of power management

Comment 2 jvleur 2014-03-26 09:59:05 UTC
Created attachment 878927 [details]
Screenshot with error

Comment 3 Sven Kieske 2014-03-26 10:29:11 UTC
Why did you open two BZs for what seems to be the same error?

Duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=1080874

Comment 4 Eli Mesika 2014-03-27 09:31:37 UTC
This is by design: ilo4 is implemented on top of ipmilan, and there is no problem with that.
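
You can see the mapping and the default options the engine applies with engine-config on the engine host (a sketch; the exact key names may differ between versions):

  engine-config -g FenceAgentMapping
  engine-config -g FenceAgentDefaultParams

That mapping is what produces Agent:ipmilan and Options:lanplus,power_wait=4 in the log lines below.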

From the log you have attached:

--snip--
2014-03-25 13:28:49,734 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-8) Using Host node2.localdomain from cluster Default as proxy to execute Status command on Host 
2014-03-25 13:28:49,736 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-8) Executing <Status> Power Management command, Proxy Host:node2.localdomain, Agent:ipmilan, Target Host:, Management IP:192.168.1.81, User:root, Options:lanplus,power_wait=4
2014-03-25 13:28:49,759 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-8) START, FenceVdsVDSCommand(HostName = node2.localdomain, HostId = 22230782-a788-4b99-bc01-656c703e7b8c, targetVdsId = 39fa5e3d-3ea4-4eb4-bb06-778829724125, action = Status, ip = 192.168.1.81, port = , type = ipmilan, user = root, password = ******, options = 'lanplus,power_wait=4'), log id: b5c75c
2014-03-25 13:28:50,412 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-8) FINISH, FenceVdsVDSCommand, return: Test Succeeded, on, log id: b5c75c
--snip--

That means the Status command works and your PM definitions are correct.

You have several problems here:

1) You have a network exception that prevents soft fencing (a vdsm restart) from taking place, so we fall back to hard fencing.

2) When we get to hard fencing, PM is defined but not enabled on the fenced host (in the UI, the Edit/New Host dialog's Power Management tab has a checkbox to enable PM).

From the log:

--snip--
2014-03-25 13:07:28,965 INFO  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Lock Acquired to object EngineLock [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE
, sharedLocks= ]
2014-03-25 13:07:28,994 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Failed to run Fence script on vds:node2.localdomain, VMs moved to UnKnown instead.
2014-03-25 13:07:28,998 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-47) [29c0b495] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host node2.localdomain became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"
2014-03-25 13:07:28,999 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VAR__ACTION__RESTART,VDS_FENCE_DISABLED
2014-03-25 13:07:29,000 INFO  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-6-thread-47) [29c0b495] Lock freed to object EngineLock [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE
--snip--

This means that at the time the host became non-responsive, its PM was disabled.

3) I see that you attached two Test screenshots, one that succeeded and one that failed with a network error. Can you please report the number of hosts in the cluster and in the DC at the time this test failed?

We had an issue where hosts in the DOWN state were taken into account as proxy candidates (https://bugzilla.redhat.com/show_bug.cgi?id=1073896).
If this is the case, there is already a patch fixing that:
http://gerrit.ovirt.org/#/c/26096/

As I said, this has nothing to do with ilo4 being mapped to ipmilan, which is expected behavior.

Please recheck your settings and try again; I tend to close this as NOTABUG.
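
Regarding point 2, whether PM is actually enabled on a host can also be checked outside the UI via the REST API (a sketch; the engine address and credentials are placeholders, and the exact XML layout may differ by version):

  curl -k -u 'admin@internal:<password>' 'https://<engine-fqdn>/api/hosts' \
    | grep -A 3 '<power_management'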

Comment 5 jvleur 2014-03-27 09:44:00 UTC
(In reply to Eli Mesika from comment #4)
> This is by design: ilo4 is implemented on top of ipmilan, and there is
> no problem with that.
> 
> From the log you have attached:
> 
> --snip--
> 2014-03-25 13:28:49,734 INFO  [org.ovirt.engine.core.bll.FenceExecutor]
> (ajp--127.0.0.1-8702-8) Using Host node2.localdomain from cluster Default as
> proxy to execute Status command on Host 
> 2014-03-25 13:28:49,736 INFO  [org.ovirt.engine.core.bll.FenceExecutor]
> (ajp--127.0.0.1-8702-8) Executing <Status> Power Management command, Proxy
> Host:node2.localdomain, Agent:ipmilan, Target Host:, Management
> IP:192.168.1.81, User:root, Options:lanplus,power_wait=4
> 2014-03-25 13:28:49,759 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
> (ajp--127.0.0.1-8702-8) START, FenceVdsVDSCommand(HostName =
> node2.localdomain, HostId = 22230782-a788-4b99-bc01-656c703e7b8c,
> targetVdsId = 39fa5e3d-3ea4-4eb4-bb06-778829724125, action = Status, ip =
> 192.168.1.81, port = , type = ipmilan, user = root, password = ******,
> options = 'lanplus,power_wait=4'), log id: b5c75c
> 2014-03-25 13:28:50,412 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand]
> (ajp--127.0.0.1-8702-8) FINISH, FenceVdsVDSCommand, return: Test Succeeded,
> on, log id: b5c75c
> --snip--
> 
> That means the Status command works and your PM definitions are correct.
> 
> You have several problems here:
> 
> 1) You have a network exception that prevents soft fencing (a vdsm
> restart) from taking place, so we fall back to hard fencing.
> 
> 2) When we get to hard fencing, PM is defined but not enabled on the
> fenced host (in the UI, the Edit/New Host dialog's Power Management tab
> has a checkbox to enable PM).
> 
> From the log:
> 
> --snip--
> 2014-03-25 13:07:28,965 INFO 
> [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
> (pool-6-thread-47) [29c0b495] Lock Acquired to object EngineLock
> [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE
> , sharedLocks= ]
> 2014-03-25 13:07:28,994 ERROR
> [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
> (pool-6-thread-47) [29c0b495] Failed to run Fence script on
> vds:node2.localdomain, VMs moved to UnKnown instead.
> 2014-03-25 13:07:28,998 INFO 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (pool-6-thread-47) [29c0b495] Correlation ID: null, Call Stack: null, Custom
> Event ID: -1, Message: Host node2.localdomain became non responsive. It has
> no power management configured. Please check the host status, manually
> reboot it, and click "Confirm Host Has Been Rebooted"
> 2014-03-25 13:07:28,999 WARN 
> [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
> (pool-6-thread-47) [29c0b495] CanDoAction of action
> VdsNotRespondingTreatment failed.
> Reasons:VAR__ACTION__RESTART,VDS_FENCE_DISABLED
> 2014-03-25 13:07:29,000 INFO 
> [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand]
> (pool-6-thread-47) [29c0b495] Lock freed to object EngineLock
> [exclusiveLocks= key: 22230782-a788-4b99-bc01-656c703e7b8c value: VDS_FENCE
> --snip--
> 
> This means that at the time the host became non-responsive, its PM was
> disabled.
> 
> 3) I see that you attached two Test screenshots, one that succeeded and
> one that failed with a network error. Can you please report the number
> of hosts in the cluster and in the DC at the time this test failed?
> 
> We had an issue where hosts in the DOWN state were taken into account
> as proxy candidates (https://bugzilla.redhat.com/show_bug.cgi?id=1073896).
> If this is the case, there is already a patch fixing that:
> http://gerrit.ovirt.org/#/c/26096/
> 
> As I said, this has nothing to do with ilo4 being mapped to ipmilan,
> which is expected behavior.
> 
> Please recheck your settings and try again; I tend to close this as
> NOTABUG.

1. What do you want me to do?

2. If you look at the screenshots, you will see that I do have PM enabled. If it's not shown as defined in the logging, then there's a problem, isn't there?

3. If there was a patch fixing that issue, then I guess that's the case here. I have hosts in a cluster and in the DC. But even with that patch resolving the error, it still does not explain the other problem.

Comment 6 Eli Mesika 2014-03-27 10:04:38 UTC
(In reply to jvleur from comment #5)
> 1. What do you want me to do?

First of all, give me the exact information on how many hosts you have in the cluster and in the DC when the test failed, and the status of each host at the time the failure occurred; a screenshot of the Hosts tab in the web admin UI is enough.
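
If it is easier than a screenshot, the same information can be pulled from the REST API (a sketch, with the same placeholders as before; host status appears as a <state> element):

  curl -k -u 'admin@internal:<password>' 'https://<engine-fqdn>/api/hosts' \
    | grep -E '<name>|<state>'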

> 
> 2. If you look at the screenshots, you will see that I do have PM
> enabled. If it's not shown as defined in the logging, then there's a
> problem, isn't there?

I just looked at the log and found some VDS_FENCE_DISABLED messages that are certainly related to a disabled PM; however, it might be that PM was disabled and enabled again later on, since I see some calls that pass that point.

> 
> 3. If there was a patch fixing that issue, then I guess that's the case
> here. I have hosts in a cluster and in the DC. But even with that patch
> resolving the error, it still does not explain the other problem.

The patch solves the case where you try to fence a host while all the other hosts are in the DOWN state. Prior to that patch, we tried to use a DOWN host as a proxy and got a NETWORK error message like the one you got; once this patch is merged, you will get a friendlier message about failing to find a proxy host to perform the fencing operation.
Please provide the info requested in 1) and we can move forward.

Comment 7 jvleur 2014-03-27 10:52:14 UTC
I've got 2 hosts in a cluster (and DC). The screenshots show that the VM is supposedly on a live node and not migrating. The error does occur when one of the nodes is down, so you were right about that.

Screenshots attached below.

Comment 8 jvleur 2014-03-27 10:52:48 UTC
Created attachment 879408 [details]
Hosts

Comment 9 jvleur 2014-03-27 10:53:12 UTC
Created attachment 879409 [details]
VM's

Comment 10 Eli Mesika 2014-03-27 11:50:20 UTC
Closing as DUPLICATE according to the reporter's info in comment 7.

*** This bug has been marked as a duplicate of bug 1073896 ***

