Bug 879719 - power management: Fence Host fails if something went wrong in FenceQuietTimeBetweenOperationsInSec window [180seconds]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.1.1
Assignee: Eli Mesika
QA Contact: Tareq Alayan
URL:
Whiteboard: virt, infra
Depends On: 871768
Blocks:
Reported: 2012-11-23 22:05 UTC by Chris Pelland
Modified: 2013-01-15 15:12 UTC (History)

Fixed In Version: si25
Doc Type: Bug Fix
Doc Text:
Previously, if a host failed inside the FenceQuietTimeBetweenOperationsInSec window that follows a power management operation, the fence attempt was rejected and the host stayed unresponsive until it was manually restarted. Now, FenceVdsBaseCommand no longer checks QuietTimeBetweenPmOperations if the command was invoked by the system, so the host is fenced and goes up as intended.
Clone Of: 871768
Environment:
Last Closed: 2013-01-15 15:12:16 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2013:0003
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: rhevm bug fix update
Last Updated: 2013-01-15 20:11:48 UTC

Description Chris Pelland 2012-11-23 22:05:04 UTC
+++ This bug was initially created as a clone of Bug #871768 +++

Description of problem:
The host stays unresponsive indefinitely unless it is manually restarted.

Version-Release number of selected component (if applicable):
si22.1

Steps to Reproduce:
1. Assume two hosts, aqua1 and aqua2.
2. Restart aqua1 via power management. [Result: aqua1 is rebooted and up again within 90 seconds]
3. vdsmd on aqua1 crashes or is stopped. [Result: aqua2 sends the pmCommand reboot to aqua1]
The reboot attempt fails because 180 seconds have not yet passed since the previous power management operation [FenceQuietTimeBetweenOperationsInSec=180sec].
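
The timing in the steps above can be sketched as a simple elapsed-time check. This is a hypothetical illustration in Python, not the engine's code: the function name `fence_allowed` and the timestamps are assumptions. Only the 180-second value of FenceQuietTimeBetweenOperationsInSec and the reboot completing within 90 seconds come from this report.

```python
# Hypothetical model of the quiet-time check described in this report.
# Only the 180 s window comes from the bug; all names are illustrative.
FENCE_QUIET_TIME_SEC = 180  # value of FenceQuietTimeBetweenOperationsInSec


def fence_allowed(last_pm_operation_at: float, now: float) -> bool:
    """Return True once the quiet-time window since the last PM operation has elapsed."""
    return (now - last_pm_operation_at) >= FENCE_QUIET_TIME_SEC


# Step 2: aqua1 is restarted via power management at t = 0 s.
restart_at = 0.0
# Step 3: vdsmd on aqua1 goes down after the host is back up.
# t = 100 s is an assumed value inside the 180 s window.
vdsmd_down_at = 100.0

print(fence_allowed(restart_at, vdsmd_down_at))  # False: still inside the window
print(fence_allowed(restart_at, 200.0))          # True: window has elapsed
```

With these assumed timestamps, the second reboot request arrives 100 seconds after the first operation and is refused, which matches the failure described above.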
  
Actual results:
aqua1 is unresponsive and vdsmd is down

Expected results:
Consider sending a second or third reboot attempt to make sure the other host comes up.

--- Additional comment from Tareq Alayan on 2012-10-31 06:59:54 EDT ---

Created attachment 636019 [details]
engin.log

--- Additional comment from Eli Mesika on 2012-11-13 07:01:15 EST ---

http://gerrit.ovirt.org/#/c/9211/1

--- Additional comment from Eli Mesika on 2012-11-20 04:37:58 EST ---

Fixed in commit cb564a3
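
For readers without access to the change, the behavior described in the Doc Text of this bug can be sketched as follows. This is an illustration in Python under stated assumptions, not the code from the commit: `FenceRequest`, `invoked_by_system`, and `should_fence` are hypothetical names. The only facts taken from this report are that the quiet-time check is skipped when the command is invoked by the system and that the window is 180 seconds.

```python
from dataclasses import dataclass

# Value of FenceQuietTimeBetweenOperationsInSec quoted in this report.
FENCE_QUIET_TIME_SEC = 180


@dataclass
class FenceRequest:
    """Hypothetical request object; field names are illustrative only."""
    seconds_since_last_pm_operation: float
    invoked_by_system: bool


def should_fence(request: FenceRequest) -> bool:
    """Decide whether a fence operation may proceed.

    System-invoked requests skip the quiet-time check; all other
    requests must wait until the window has elapsed.
    """
    if request.invoked_by_system:
        return True
    return request.seconds_since_last_pm_operation >= FENCE_QUIET_TIME_SEC


# A request issued by the system 100 s after the last operation proceeds,
# while a non-system request at the same moment is still refused.
print(should_fence(FenceRequest(100.0, invoked_by_system=True)))   # True
print(should_fence(FenceRequest(100.0, invoked_by_system=False)))  # False
```

Keeping the window in force for requests that are not invoked by the system preserves the protection against rapid repeated power operations, while automatic recovery of a failed host is no longer blocked.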

Comment 3 Tareq Alayan 2012-12-18 13:47:23 UTC
Could not reproduce.

The proxy successfully fenced the non-operational host within the 180-second window, and the host came up.

Comment 5 errata-xmlrpc 2013-01-15 15:12:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0003.html

