Bug 978737

Summary: [Docs] [Tech Ref] add soft fencing over SSH (restart VDSM) as a preliminary step before fencing a Non-Responsive host
Product: Red Hat Enterprise Virtualization Manager Reporter: Andrew Burden <aburden>
Component: Documentation Assignee: Zac Dover <zdover>
Status: CLOSED CURRENTRELEASE QA Contact: ecs-bugs
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.3.0 CC: aburden, acathrow, alyoung, gklein, Rhev-m-bugs, thildred, yeylon, zdover
Target Milestone: --- Keywords: FutureFeature
Target Release: 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 975301 Environment:
Last Closed: 2014-04-07 03:06:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 967328, 975301    
Bug Blocks:    

Comment 1 Zac Dover 2013-07-15 02:33:39 UTC
This changes part of

RHEV 3.3 Tech Ref Chapter 5, "Power management and fencing".

Comment 2 Tim Hildred 2013-08-06 05:28:45 UTC
From BZ#975301, comment #1:
Looking at builds currently available, and based on Barak's comment from 2013-06-16 04:49:10 EDT, there will be no UI impact from this bug. 

If there is anything that changes in the Admin Guide, it would be the inclusion of a topic from the Technical Reference Guide called:

Soft Fencing Hosts in Red Hat Enterprise Virtualization

When that topic is written, we'll see if it is appropriate for inclusion in the Hosts Resilience section of the Administration Guide.

Comment 3 Zac Dover 2013-08-24 15:54:13 UTC
I have written the following:

Soft-fencing Hosts

Sometimes a host becomes non-responsive due to an unexpected problem, and though VDSM is unable to respond to requests, the virtual machines that depend upon VDSM remain alive and accessible. In these situations, simply restarting VDSM returns VDSM to a responsive state and resolves this issue.

Red Hat Enterprise Virtualization 3.3 introduces "soft-fencing over SSH". Prior to Red Hat Enterprise Virtualization 3.3, non-responsive hosts were fenced only by external fencing devices. In Red Hat Enterprise Virtualization 3.3, the fencing process has been expanded to include "SSH Soft Fencing", a process whereby the Manager (the engine) attempts to restart VDSM via SSH on non-responsive hosts; if the Manager fails to restart VDSM via SSH, the responsibility for fencing falls to the external fencing agent (if an external fencing agent has been configured).

SSH Soft Fencing works as follows. Fencing must be configured and enabled on the host, and a valid proxy host (a second host, in an UP state, in the data center) must exist. When the connection between the engine (the Manager) and the host times out, the following happens. On the first network failure, the status of the host changes to "connecting". The engine then does one of two things: it makes three attempts to ask VDSM for its status, or it waits for an interval determined by the host's load. The length of the interval is calculated from configuration values: TimeoutToResetVdsInSeconds (the default is 60 seconds) + [DelayResetPerVmInSeconds (the default is 0.5 seconds)] * (the count of running VMs on the host) + [DelayResetForSpmInSeconds (the default is 20 seconds)] * 1 (if the host runs as SPM) or 0 (if the host does not run as SPM). To give VDSM the maximum amount of time to respond, the engine chooses the longer of the two options (three attempts to retrieve the status of VDSM, or the interval determined by the above formula). If the host does not respond by the time that interval has elapsed, <command>vdsm restart</command> is executed via SSH. If <command>vdsm restart</command> does not succeed in re-establishing the connection between the host and the Manager, the status of the host changes to <literal>non responsive</literal> and, if power management is configured, fencing is handed off to the external fencing agent.
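
As a worked illustration of the interval formula, here is a minimal Python sketch. It is not engine code: the function names are hypothetical, the defaults are the ones quoted above, and the time taken by the three status attempts is not stated in this text, so it is passed in as a parameter.

```python
# Illustrative sketch only; function and parameter names are hypothetical.
# Default values are the ones quoted in the paragraph above.

def load_based_interval(running_vms, is_spm,
                        timeout_to_reset_vds=60.0,   # TimeoutToResetVdsInSeconds
                        delay_reset_per_vm=0.5,      # DelayResetPerVmInSeconds
                        delay_reset_for_spm=20.0):   # DelayResetForSpmInSeconds
    """Return the load-based interval in seconds."""
    spm_factor = 1 if is_spm else 0
    return (timeout_to_reset_vds
            + delay_reset_per_vm * running_vms
            + delay_reset_for_spm * spm_factor)


def wait_before_soft_fence(three_attempts_seconds, running_vms, is_spm):
    """Return the longer of the two options described above.

    three_attempts_seconds is an assumed input: the source does not say
    how long the three status attempts take.
    """
    return max(three_attempts_seconds,
               load_based_interval(running_vms, is_spm))


if __name__ == "__main__":
    # A host running 10 virtual machines that is also the SPM:
    # 60 + 0.5 * 10 + 20 * 1 = 85 seconds.
    print(load_based_interval(running_vms=10, is_spm=True))
```

With the default values, a host that runs no virtual machines and is not the SPM gets the base interval of 60 seconds; each running virtual machine adds half a second, and the SPM role adds 20 seconds.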

Note
SSH soft-fencing can be executed on hosts that have no power management configured. This is distinct from "fencing": fencing can be executed only on hosts that have power management configured.
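
The escalation order described above can be sketched as a small Python function. This is a simplified model, not the engine's implementation: the function name, its boolean inputs, and the step labels are all hypothetical, and it only encodes the sequence stated in this topic.

```python
# Simplified model of the escalation sequence; all names are hypothetical.

def soft_fencing_steps(responds_during_wait,
                       ssh_restart_reconnects,
                       power_management_configured):
    """Return the ordered list of steps taken for one host."""
    steps = ["status changes to connecting",
             "wait for the longer of the two options"]

    if responds_during_wait:
        # The host answered in time, so nothing is restarted.
        steps.append("host responded")
        return steps

    # Soft fencing: restart VDSM over SSH.
    steps.append("restart VDSM via SSH")
    if ssh_restart_reconnects:
        steps.append("connection re-established")
        return steps

    steps.append("status changes to non responsive")
    if power_management_configured:
        # Only hosts with power management reach the external agent.
        steps.append("hand off to external fencing agent")
    return steps


if __name__ == "__main__":
    for step in soft_fencing_steps(False, False, True):
        print(step)
```

The last branch reflects the note: a host without power management can still be soft-fenced, but it is never handed to an external fencing agent.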

Comment 5 Zac Dover 2013-09-25 01:20:10 UTC
Documentation Link
------------------
http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Enterprise_Virtualization/3.3/html-single/Technical_Reference_Guide/index.html#Soft-Fencing_Hosts

What Changed
------------
I added the following content:

Sometimes a host becomes non-responsive due to an unexpected problem, and though VDSM is unable to respond to requests, the virtual machines that depend upon VDSM remain alive and accessible. In these situations, simply restarting VDSM returns VDSM to a responsive state and resolves this issue.
Red Hat Enterprise Virtualization 3.3 introduces "soft-fencing over SSH". Prior to Red Hat Enterprise Virtualization 3.3, non-responsive hosts were fenced only by external fencing devices. In Red Hat Enterprise Virtualization 3.3, the fencing process has been expanded to include "SSH Soft Fencing", a process whereby the Manager (the engine) attempts to restart VDSM via SSH on non-responsive hosts; if the Manager fails to restart VDSM via SSH, the responsibility for fencing falls to the external fencing agent (if an external fencing agent has been configured).
SSH soft-fencing works as follows. Fencing must be configured and enabled on the host, and a valid proxy host (a second host, in an UP state, in the data center) must exist. When the connection between the engine (the Manager) and the host times out, the following happens. On the first network failure, the status of the host changes to "connecting". The engine then does one of two things: it makes three attempts to ask VDSM for its status, or it waits for an interval determined by the host's load. The length of the interval is calculated from configuration values: TimeoutToResetVdsInSeconds (the default is 60 seconds) + [DelayResetPerVmInSeconds (the default is 0.5 seconds)] * (the count of running VMs on the host) + [DelayResetForSpmInSeconds (the default is 20 seconds)] * 1 (if the host runs as SPM) or 0 (if the host does not run as SPM). To give VDSM the maximum amount of time to respond, the engine chooses the longer of the two options (three attempts to retrieve the status of VDSM, or the interval determined by the above formula). If the host does not respond by the time that interval has elapsed, vdsm restart is executed via SSH. If vdsm restart does not succeed in re-establishing the connection between the host and the Manager, the status of the host changes to non responsive and, if power management is configured, fencing is handed off to the external fencing agent.
Note
SSH soft-fencing can be executed on hosts that have no power management configured. This is distinct from "fencing": fencing can be executed only on hosts that have power management configured.

NVR
---
Red_Hat_Enterprise_Virtualization-Technical_Reference_Guide-3.3-en-US-3.3.0-004

Moving to ON_QA.