Bug 1044091 - In the event of a full host power outage (including fence devices) a user must wait 19 mins (3 x 3 minute timeouts + 10 minutes for the transaction reaper) until they can manually fence a host to relocate guests.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Eli Mesika
QA Contact: Tareq Alayan
URL:
Whiteboard: infra
Depends On:
Blocks: 1044088 1052082 rhev3.4beta 1142926
Reported: 2013-12-17 19:19 UTC by Lee Yarwood
Modified: 2018-12-06 15:37 UTC (History)

Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Previously, a full host power outage resulted in a 19 minute reconnection time before manual guest relocation could be performed. Now, a host in connecting state can be manually fenced.
Clone Of:
Clones: 1052082
Environment:
Last Closed: 2014-06-09 15:07:46 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2014:0506 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 22885 0 None None None Never

Description Lee Yarwood 2013-12-17 19:19:58 UTC
Description of problem:
In the event of a full host power outage (including fence devices) a user must wait 19 mins (3 x 3 minute timeouts + 10 minutes for the transaction reaper) until they can manually fence a host to relocate guests.
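The 19-minute figure follows directly from the numbers quoted above. A minimal sketch of the arithmetic, where the constant and function names are illustrative only and are not actual engine configuration keys:

```python
from datetime import timedelta

# Values taken from the description above. The names are hypothetical,
# chosen for illustration; they are not real ovirt-engine settings.
TIMEOUT_PER_ATTEMPT = timedelta(minutes=3)
ATTEMPTS = 3
TRANSACTION_REAPER_DELAY = timedelta(minutes=10)


def worst_case_wait(timeout=TIMEOUT_PER_ATTEMPT,
                    attempts=ATTEMPTS,
                    reaper_delay=TRANSACTION_REAPER_DELAY):
    """Return the total wait before manual fencing becomes possible."""
    return timeout * attempts + reaper_delay


print(worst_case_wait())  # 0:19:00
```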

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1.  Remove all power to an active host, including any fence agents that are configured.
2.  Attempt to manually fence the host to relocate guests.

Actual results:
The guests are only relocated once the host has moved to a state of 'non-responsive'. This can take 19 minutes if the fencing is configured but not available.

Expected results:
The guests are relocated if the user confirms the host is down.

Additional info:

Comment 3 Itamar Heim 2013-12-18 11:03:12 UTC
lee - dup of bug 1044089?

Comment 4 Julio Entrena Perez 2013-12-18 11:09:25 UTC
(In reply to Itamar Heim from comment #3)
> lee - dup of bug 1044089?

Bug 1044089 is about allowing acknowledgment that a host has been rebooted to allow VMs in it to failover to remaining hosts while host is in "Connecting" status.

This bug 1044091 is about allowing acknowledgement that host has been rebooted to allow VMs in it to failover to remaining hosts while host is in "Reboot" status.

Lee, do you agree?

Comment 5 Eli Mesika 2013-12-18 15:08:19 UTC
Marek, is there a special status returned from the fence-agents package when the agent's power has been switched off, as described in this BZ?
How can we tell that the PM agent card has no power, so that we can stop retrying the operation?

Comment 6 Marek Grac 2013-12-18 16:45:28 UTC
@Eli:

If the agent cannot perform a 'monitor' action, you can consider it dead. We do not distinguish whether the problem is with the login/password, the firmware, or a power outage.
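
The rule above can be sketched as follows. This is only an illustration, not the engine's or fence-agents' actual code: `agent_is_alive` is a hypothetical helper, the command passed to it is supplied by the caller, and the 30-second timeout is an assumed value. Any failure (non-zero exit, timeout, or a command that cannot be started) is treated the same way, because the cause cannot be distinguished.

```python
import subprocess
import sys


def agent_is_alive(monitor_cmd, timeout_s=30):
    """Return True only if the caller-supplied monitor command exits with 0.

    Hypothetical helper: every kind of failure maps to "dead", since a bad
    login, broken firmware, and a power outage all look alike from here.
    """
    try:
        result = subprocess.run(
            monitor_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started or never answered.
        return False
    return result.returncode == 0


# Stand-in commands so the sketch runs anywhere; a real check would pass
# the site's own fence agent invocation with its 'monitor' action.
ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
failing_cmd = [sys.executable, "-c", "raise SystemExit(1)"]

print(agent_is_alive(ok_cmd))
print(agent_is_alive(failing_cmd))
```

With a check like this, a caller could stop retrying as soon as the agent is reported dead instead of waiting for every timeout to expire.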

Comment 7 Lee Yarwood 2013-12-18 19:21:33 UTC
(In reply to Julio Entrena Perez from comment #4)
> (In reply to Itamar Heim from comment #3)
> > lee - dup of bug 1044089?

No, I created BZ#1044089 because manual fencing while the host is 'connecting' fails to fail over the SPM role. This bug, BZ#1044091, was created because manual fencing fails to refresh/relocate guests while the host is 'connecting' or 'rebooting'.

> Bug 1044089 is about allowing acknowledgment that a host has been rebooted
> to allow VMs in it to failover to remaining hosts while host is in
> "Connecting" status.

Nope, BZ#1044089 covers the failure to fail over the SPM with a manual fence while the host is connecting.

Comment 9 Sandro Bonazzola 2014-01-14 08:42:44 UTC
ovirt 3.4.0 alpha has been released

Comment 10 Tareq Alayan 2014-02-17 11:22:32 UTC
Verified. Tested on:
ovirt-engine-3.4.0-0.7.beta2.el6.noarch
vdsm-4.14.1-3.el6.x86_64

Comment 12 errata-xmlrpc 2014-06-09 15:07:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

