Bug 1044092

Summary: FenceVdsVDSCommand successfully completes even if the underlying fence action to stop, start or reboot a host fails.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Lee Yarwood <lyarwood>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED NOTABUG
QA Contact:
Severity: high
Docs Contact:
Priority: urgent
Version: 3.2.0
CC: acathrow, emesika, flo_bugzilla, iheim, lpeer, lyarwood, pstehlik, Rhev-m-bugs, sputhenp, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-19 09:52:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1044088

Description Lee Yarwood 2013-12-17 19:22:18 UTC
Description of problem:
FenceVdsVDSCommand successfully completes even if the underlying fence action to stop, start or reboot a host fails. 

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Remove all power to an active host, including any fence agents that are configured.
2. Engine will attempt to fence the host in question.
3. The first fence action to stop power will succeed even if the fence command executed by the proxy host fails.

Actual results:
FenceVdsVDSCommand completes successfully.

Expected results:
FenceVdsVDSCommand should fail and reflect the failure of the fence command on the proxy.
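The expected result can be sketched as a wrapper that waits for the fence agent to exit and treats a non-zero exit code as a failure. This is a minimal illustration under stated assumptions, not the engine's or VDSM's code: `run_fence_action` and `FenceActionFailed` are hypothetical names, and a short Python one-liner stands in for a real `/usr/sbin/fence_<agent>` script.

```python
import subprocess
import sys


class FenceActionFailed(Exception):
    """Hypothetical error raised when the fence agent exits non-zero."""


def run_fence_action(cmd: list[str], timeout: float = 60.0) -> None:
    # Hypothetical helper: block until the agent process exits (up to
    # `timeout` seconds; subprocess.TimeoutExpired propagates if it hangs)
    # instead of returning as soon as the process has been started.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Surface the agent's failure to the caller rather than reporting success.
        raise FenceActionFailed(
            f"fence agent exited with {result.returncode}: {result.stderr.strip()}"
        )


if __name__ == "__main__":
    # Stand-in for a fence agent that cannot reach its (unpowered) device.
    failing_agent = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        run_fence_action(failing_agent)
    except FenceActionFailed as exc:
        print("fence failed:", exc)
```

Because `subprocess.run` waits for the child to exit, the caller sees the agent's actual return code instead of a success that only reflects the process having been launched.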

Additional info:

Comment 4 Eli Mesika 2013-12-18 09:42:00 UTC
Please specify exactly what you do in this step (taken from the BZ description):

1. Remove all power to an active host, including any fence agents that are configured.

I need exact steps in UI/API in order to reproduce.

NOTE:

You wrote:
***************
Reading the code, we don't actually wait for the return value when calling the off action, and thus incorrectly report everything as fine even though you can see a return code of 1 above.
***************

Yes, the vdsm fenceNode verb is fire-and-forget, so success is returned as long as the corresponding /usr/sbin/fence_<agent> script can be run. We track whether the operation succeeded by polling the host status via the PM agent.
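The fire-and-forget behaviour described in this comment can be sketched as two separate steps: launching the agent without waiting for it, then polling the host's power status until it reaches the expected state or a timeout expires. This is a minimal sketch under stated assumptions, not VDSM's or the engine's implementation: `fire_and_forget`, `wait_for_power_state`, the `get_status` callable and the `"on"`/`"off"` status strings are all hypothetical.

```python
import subprocess
import sys
import time
from typing import Callable


def fire_and_forget(cmd: list[str]) -> bool:
    # Hypothetical helper: "success" only means the agent process could be
    # started; its exit status is never examined.
    try:
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return True


def wait_for_power_state(get_status: Callable[[], str], expected: str,
                         timeout: float = 30.0, interval: float = 1.0) -> bool:
    # Hypothetical helper: poll a power-management status query until it
    # reports `expected` or `timeout` seconds have elapsed.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == expected:
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # A failing agent still counts as "launched" in the fire-and-forget step.
    launched = fire_and_forget([sys.executable, "-c", "import sys; sys.exit(1)"])
    # An unreachable fence device never reports the expected state.
    reached = wait_for_power_state(lambda: "unknown", "off", timeout=2.0, interval=0.5)
    print("launched:", launched, "reached off:", reached)
```

In a scenario like the one reported here, the launch step reports success even when the agent later exits with an error, so any failure can only surface through the polling step; if the fence device itself has no power, the status query never reports the expected state and the poll times out.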

Comment 6 Julio Entrena Perez 2013-12-18 09:45:10 UTC
(In reply to Eli Mesika from comment #4)
> please specify what do exactly do in (taken from BZ description)
> 
> 1. Remove all power to an active host, including any fence agents that are
> configured.
> 
> I need exact steps in UI/API in order to reproduce.
> 
Just pull all power cords from the server.
This removes power not only from the server but also from the embedded IPMI fencing device (e.g. HP iLO), which prevents other hosts from reaching the fence device.