Bug 978737 - [Docs] [Tech Ref] add soft fencing over SSH (restart VDSM) as a preliminary step before fencing a None-Responsive host
Summary: [Docs] [Tech Ref] add soft fencing over SSH (restart VDSM) as a preliminary s...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assignee: Zac Dover
QA Contact: ecs-bugs
URL:
Whiteboard: infra
Depends On: 967328 975301
Blocks:
 
Reported: 2013-06-27 06:29 UTC by Andrew Burden
Modified: 2016-02-10 19:08 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 975301
Environment:
Last Closed: 2014-04-07 03:06:36 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 15797 0 None None None Never
oVirt gerrit 15798 0 None None None Never

Comment 1 Zac Dover 2013-07-15 02:33:39 UTC
This changes part of

RHEV 3.3 Tech Ref Chapter 5, "Power management and fencing".

Comment 2 Tim Hildred 2013-08-06 05:28:45 UTC
From BZ#975301, comment #1:
Looking at builds currently available, and based on Barak's comment from 2013-06-16 04:49:10 EDT, there will be no UI impact from this bug. 

If there is anything that changes in the Admin Guide, it would be the inclusion of a topic from the Technical Reference Guide called:

Soft Fencing Hosts in Red Hat Enterprise Virtualization

When that topic is written, we'll see if it is appropriate for inclusion in the Hosts Resilience section of the Administration Guide.

Comment 3 Zac Dover 2013-08-24 15:54:13 UTC
I have written the following:

Soft-fencing Hosts

Sometimes a host becomes non-responsive due to an unexpected problem, and though VDSM is unable to respond to requests, the virtual machines that depend upon VDSM remain alive and accessible. In these situations, simply restarting VDSM returns VDSM to a responsive state and resolves this issue.

Red Hat Enterprise Virtualization 3.3 introduces "soft-fencing over SSH". Prior to Red Hat Enterprise Virtualization 3.3, non-responsive hosts were fenced only by external fencing devices. In Red Hat Enterprise Virtualization 3.3, the fencing process has been expanded to include "SSH Soft Fencing", a process whereby the Manager (the engine) attempts to restart VDSM via SSH on non-responsive hosts; if the Manager fails to restart VDSM via SSH, the responsibility for fencing falls to the external fencing agent (if an external fencing agent has been configured).

SSH Soft Fencing works as follows. Fencing must be configured and enabled on the host, and a valid proxy host (a second host, in an UP state, in the data center) must exist. When the connection between the Manager (the engine) and the host times out, the following happens. On the first network failure, the status of the host changes to "connecting". The Manager then does one of two things: it makes three attempts to ask VDSM for its status, or it waits for an interval determined by the host's load. The length of the interval is calculated from three configuration values: TimeoutToResetVdsInSeconds (the default is 60 seconds) + [DelayResetPerVmInSeconds (the default is 0.5 seconds)] * (the count of running VMs on the host) + [DelayResetForSpmInSeconds (the default is 20 seconds)] * 1 (if the host runs as SPM) or 0 (if the host does not run as SPM). In order to give VDSM the maximum amount of time to respond, the Manager chooses the longer of the two options mentioned above (three attempts to retrieve the status of VDSM or the interval determined by the above formula). If the host doesn't respond when that interval has elapsed, <command>vdsm restart</command> is executed via SSH. If <command>vdsm restart</command> does not succeed in re-establishing the connection between the host and the Manager, the status of the host changes to <literal>non responsive</literal> and, if power management is configured, fencing is handed off to the external fencing agent.
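
The interval formula above can be checked with a short calculation. The following Python sketch is illustrative only and is not taken from the engine source: the function and parameter names are hypothetical, the defaults are the ones quoted in the paragraph above, and the time taken by the three status attempts is passed in as an assumed input because the text does not say how long they take.

```python
def vdsm_grace_period_seconds(
    running_vms: int,
    is_spm: bool,
    timeout_to_reset_vds: float = 60.0,  # TimeoutToResetVdsInSeconds default
    delay_reset_per_vm: float = 0.5,     # DelayResetPerVmInSeconds default
    delay_reset_for_spm: float = 20.0,   # DelayResetForSpmInSeconds default
) -> float:
    """Return the load-based interval described in the text (hypothetical helper)."""
    if running_vms < 0:
        raise ValueError("running_vms must not be negative")
    # The SPM delay is multiplied by 1 for an SPM host and by 0 otherwise.
    spm_factor = 1 if is_spm else 0
    return (
        timeout_to_reset_vds
        + delay_reset_per_vm * running_vms
        + delay_reset_for_spm * spm_factor
    )


def effective_wait_seconds(status_attempts_seconds: float,
                           grace_period_seconds: float) -> float:
    """Pick the longer of the two options, as the text describes.

    status_attempts_seconds is an assumed input: the time the three
    status attempts take is not stated in the text.
    """
    return max(status_attempts_seconds, grace_period_seconds)


# A host running 10 VMs that is also the SPM: 60 + 0.5 * 10 + 20 * 1
print(vdsm_grace_period_seconds(running_vms=10, is_spm=True))  # 85.0
```

With the default values, a host that runs no virtual machines and is not the SPM gets the minimum interval of 60 seconds.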

Note
SSH soft-fencing can be executed on hosts that have no power management configured. This is distinct from "fencing": fencing can be executed only on hosts that have power management configured.
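
The hand-off described above (soft fencing first, external fencing only if power management is configured) can be summarized as a small decision function. This is a minimal sketch under assumptions: the enum, its member names, and the function name are hypothetical and only mirror the outcomes named in the text, not the engine's actual states or code.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for the outcomes described in the text.
    HOST_RECOVERED = "connection re-established after VDSM restart"
    HAND_OFF_TO_FENCING_AGENT = "non responsive; external fencing agent takes over"
    NON_RESPONSIVE_NO_FENCING = "non responsive; no power management configured"


def after_ssh_soft_fencing(restart_reconnected: bool,
                           power_management_configured: bool) -> Outcome:
    """Decide what follows the VDSM restart attempt over SSH."""
    if restart_reconnected:
        # Restarting VDSM was enough; no external fencing is needed.
        return Outcome.HOST_RECOVERED
    if power_management_configured:
        # Soft fencing failed and a fencing agent is available.
        return Outcome.HAND_OFF_TO_FENCING_AGENT
    # Soft fencing failed and there is nothing to escalate to.
    return Outcome.NON_RESPONSIVE_NO_FENCING
```

The third branch reflects the note: a host without power management can still be soft-fenced, but a failed restart cannot be escalated to an external fencing agent.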

Comment 5 Zac Dover 2013-09-25 01:20:10 UTC
Documentation Link
------------------
http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Enterprise_Virtualization/3.3/html-single/Technical_Reference_Guide/index.html#Soft-Fencing_Hosts

What Changed
------------
I added the following content:

Sometimes a host becomes non-responsive due to an unexpected problem, and though VDSM is unable to respond to requests, the virtual machines that depend upon VDSM remain alive and accessible. In these situations, simply restarting VDSM returns VDSM to a responsive state and resolves this issue.
Red Hat Enterprise Virtualization 3.3 introduces "soft-fencing over SSH". Prior to Red Hat Enterprise Virtualization 3.3, non-responsive hosts were fenced only by external fencing devices. In Red Hat Enterprise Virtualization 3.3, the fencing process has been expanded to include "SSH Soft Fencing", a process whereby the Manager (the engine) attempts to restart VDSM via SSH on non-responsive hosts; if the Manager fails to restart VDSM via SSH, the responsibility for fencing falls to the external fencing agent (if an external fencing agent has been configured).
SSH soft-fencing works as follows. Fencing must be configured and enabled on the host, and a valid proxy host (a second host, in an UP state, in the data center) must exist. When the connection between the Manager (the engine) and the host times out, the following happens. On the first network failure, the status of the host changes to "connecting". The Manager then does one of two things: it makes three attempts to ask VDSM for its status, or it waits for an interval determined by the host's load. The length of the interval is calculated from three configuration values: TimeoutToResetVdsInSeconds (the default is 60 seconds) + [DelayResetPerVmInSeconds (the default is 0.5 seconds)] * (the count of running VMs on the host) + [DelayResetForSpmInSeconds (the default is 20 seconds)] * 1 (if the host runs as SPM) or 0 (if the host does not run as SPM). In order to give VDSM the maximum amount of time to respond, the Manager chooses the longer of the two options mentioned above (three attempts to retrieve the status of VDSM or the interval determined by the above formula). If the host doesn't respond when that interval has elapsed, vdsm restart is executed via SSH. If vdsm restart does not succeed in re-establishing the connection between the host and the Manager, the status of the host changes to non responsive and, if power management is configured, fencing is handed off to the external fencing agent.
Note
SSH soft-fencing can be executed on hosts that have no power management configured. This is distinct from "fencing": fencing can be executed only on hosts that have power management configured.

NVR
---
Red_Hat_Enterprise_Virtualization-Technical_Reference_Guide-3.3-en-US-3.3.0-004

Moving to ON_QA.

