Bug 1044091

Summary: In the event of a full host power outage (including fence devices) a user must wait 19 mins (3 x 3 minute timeouts + 10 minutes for the transaction reaper) until they can manually fence a host to relocate guests.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Lee Yarwood <lyarwood>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED ERRATA
QA Contact: Tareq Alayan <talayan>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: acathrow, bazulay, emesika, flo_bugzilla, iheim, jentrena, lpeer, lyarwood, mgrac, pstehlik, Rhev-m-bugs, sputhenp, tpoitras, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Previously, a full host power outage resulted in a 19 minute reconnection time before manual guest relocation could be performed. Now, a host in connecting state can be manually fenced.
Story Points: ---
Clone Of:
: 1052082
Environment:
Last Closed: 2014-06-09 15:07:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1044088, 1052082, 1078909, 1142926

Description Lee Yarwood 2013-12-17 19:19:58 UTC
Description of problem:
In the event of a full host power outage (including fence devices) a user must wait 19 mins (3 x 3 minute timeouts + 10 minutes for the transaction reaper) until they can manually fence a host to relocate guests.
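The 19-minute figure is simple arithmetic. A minimal sketch spelling it out follows; the constant and function names are illustrative assumptions, not actual engine configuration keys.

```python
# Worst-case wait before manual fencing, using the figures from this report.
# All names here are hypothetical; they are not real ovirt-engine settings.
TIMEOUT_MINUTES = 3             # each timeout lasts 3 minutes
TIMEOUT_COUNT = 3               # three timeouts occur back to back
TRANSACTION_REAPER_MINUTES = 10  # delay attributed to the transaction reaper


def worst_case_wait_minutes() -> int:
    """Return the total minutes a user waits before manual fencing is possible."""
    return TIMEOUT_COUNT * TIMEOUT_MINUTES + TRANSACTION_REAPER_MINUTES


print(worst_case_wait_minutes())  # 3 * 3 + 10 = 19
```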

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1.  Remove all power to an active host, including any fence agents that are configured.
2.  Attempt to manually fence the host to relocate guests.

Actual results:
The guests are only relocated once the host has moved to a state of 'non-responsive'. This can take 19 minutes if the fencing is configured but not available.

Expected results:
The guests are relocated if the user confirms the host is down.
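The difference between the actual and expected results can be sketched as a status check. This is only an illustration under stated assumptions: the enum, the two status sets, and the function name are hypothetical and do not reflect how ovirt-engine is implemented.

```python
from enum import Enum


class HostStatus(Enum):
    """Host states mentioned in this report (names are illustrative)."""
    UP = "up"
    CONNECTING = "connecting"
    REBOOT = "reboot"
    NON_RESPONSIVE = "non-responsive"


# Actual results: guests are relocated only once the host is 'non-responsive'.
REPORTED_STATES = {HostStatus.NON_RESPONSIVE}

# Expected results (assumption): a user confirmation should also be honoured
# while the host is still 'connecting' or 'rebooting'.
EXPECTED_STATES = REPORTED_STATES | {HostStatus.CONNECTING, HostStatus.REBOOT}


def manual_fence_allowed(status: HostStatus, allowed=EXPECTED_STATES) -> bool:
    """Return True if a manual fence should relocate guests in this state."""
    return status in allowed
```

With `REPORTED_STATES` passed as `allowed`, a host in `CONNECTING` is rejected, which corresponds to the wait described above.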

Additional info:

Comment 3 Itamar Heim 2013-12-18 11:03:12 UTC
lee - dup of bug 1044089?

Comment 4 Julio Entrena Perez 2013-12-18 11:09:25 UTC
(In reply to Itamar Heim from comment #3)
> lee - dup of bug 1044089?

Bug 1044089 is about allowing acknowledgment that a host has been rebooted to allow VMs in it to failover to remaining hosts while host is in "Connecting" status.

This bug 1044091 is about allowing acknowledgment that a host has been rebooted, so that its VMs can fail over to the remaining hosts while the host is in "Reboot" status.

Lee, do you agree?

Comment 5 Eli Mesika 2013-12-18 15:08:19 UTC
Marek, is there a special status returned from the fence-agents package when the agent's power has been switched off as described in this BZ?
How can we tell that the PM agent card has no power, so that we can stop retrying the operation?

Comment 6 Marek Grac 2013-12-18 16:45:28 UTC
@Eli:

If the agent cannot perform a 'monitor' action, then you can consider it dead. We do not distinguish whether the problem is the login/password, the firmware, or a power outage.
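A minimal sketch of that rule, assuming the caller supplies a hypothetical `run_monitor` callable that performs the agent's 'monitor' action and returns an exit status (0 for success). It is not part of the fence-agents package.

```python
from typing import Callable


def agent_is_dead(run_monitor: Callable[[], int]) -> bool:
    """Treat any failed 'monitor' action as a dead agent.

    run_monitor is a hypothetical callable supplied by the caller. The cause
    of a failure (credentials, firmware, power outage) is deliberately not
    inspected, because the agent does not report it.
    """
    try:
        return run_monitor() != 0
    except Exception:
        # An error while running the action is handled like a failed monitor.
        return True
```

A caller could stop retrying the fence operation as soon as `agent_is_dead(...)` returns True.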

Comment 7 Lee Yarwood 2013-12-18 19:21:33 UTC
(In reply to Julio Entrena Perez from comment #4)
> (In reply to Itamar Heim from comment #3)
> > lee - dup of bug 1044089?

No, I created BZ#1044089 because manual fencing while the host is 'connecting' fails to fail over the SPM role. This bug, BZ#1044091, was created because manual fencing fails to refresh/relocate guests while the host is 'connecting' or 'rebooting'.

> Bug 1044089 is about allowing acknowledgment that a host has been rebooted
> to allow VMs in it to failover to remaining hosts while host is in
> "Connecting" status.

Nope, BZ#1044089 covers the failure to fail over the SPM with a manual fence while the host is connecting.

Comment 9 Sandro Bonazzola 2014-01-14 08:42:44 UTC
ovirt 3.4.0 alpha has been released

Comment 10 Tareq Alayan 2014-02-17 11:22:32 UTC
Verified. Tested on:
ovirt-engine-3.4.0-0.7.beta2.el6.noarch
vdsm-4.14.1-3.el6.x86_64

Comment 12 errata-xmlrpc 2014-06-09 15:07:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html