Bug 1119932 - [RFE] Verify network connectivity from Engine to hosts to enhance the fencing logic
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.0
Assignee: Martin Perina
QA Contact:
URL:
Whiteboard: infra
Depends On: 1117931
Blocks: 1097923 1110176
Reported: 2014-07-15 21:44 UTC by Scott Herold
Modified: 2016-02-10 19:17 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1117931
Environment:
Last Closed: 2015-04-27 12:30:14 UTC
oVirt Team: Infra
sherold: Triaged+



Description Scott Herold 2014-07-15 21:44:22 UTC
+++ This bug was initially created as a clone of Bug #1117931 +++

Description of problem:

This RFE is part of the request to introduce logic in the fencing workflow so that the engine can determine whether an inability to communicate with external hosts is caused by its own network connectivity issues or by a legitimate problem with the remote host.

As a first phase, the user should be able to list IP addresses that the engine should try to ping before fencing a host. The engine should ping the provided addresses continuously and collect the results as an ongoing process. To better determine the connectivity status, the results from the last 15 seconds should be kept as a backlog for reference. The length of this window should be user-configurable.

We still need to discuss what our 'success' criteria are, i.e., how many pings we should send and how many should get a reply.
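
A minimal Python sketch of the rolling backlog described above, for illustration only. The class name `PingBacklog`, its methods, and the injectable clock are hypothetical and are not part of ovirt-engine. The sketch does not send any ICMP traffic; it only stores the outcomes reported to it. Because the success criteria are still open, it exposes the reply ratio and leaves the threshold to the caller.

```python
import time
from collections import deque


class PingBacklog:
    """Rolling record of ping outcomes for one address (hypothetical helper)."""

    def __init__(self, window_seconds=15.0, clock=time.monotonic):
        # window_seconds mirrors the proposed 15-second, user-configurable backlog.
        self.window_seconds = window_seconds
        self._clock = clock
        self._results = deque()  # (timestamp, replied) pairs, oldest first

    def record(self, replied):
        """Store one ping outcome and drop entries older than the window."""
        now = self._clock()
        self._results.append((now, bool(replied)))
        self._prune(now)

    def _prune(self, now):
        cutoff = now - self.window_seconds
        while self._results and self._results[0][0] < cutoff:
            self._results.popleft()

    def success_ratio(self):
        """Return the fraction of pings answered in the window, or None if empty."""
        self._prune(self._clock())
        if not self._results:
            return None
        replies = sum(1 for _, ok in self._results if ok)
        return replies / len(self._results)
```

A caller would invoke `record()` after each ping attempt and compare `success_ratio()` against whatever threshold is eventually agreed on.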

Comment 1 Scott Herold 2014-07-15 21:58:26 UTC
Infra/Engine portion of engine network connectivity validation for fencing storms.

When Triggered
--------------
This action is triggered once RHEV-M has made the decision that a target host may need to be fenced, but before a fence command is sent to a proxy host.  In this flow, before instructing a proxy host to fence a target host, the engine will first validate whether it has "acceptable" network connectivity.  This will be performed by checking ICMP status against user-definable external IPs such as the default gateway or another reliable external node.  If one of these "reliable ICMP nodes" is unavailable, the engine will temporarily suspend fencing commands for the specified host or cluster.  This will prevent fencing storms from leaving the engine, and will avoid potential race conditions on fence retries as experienced in the Engine Network Port Flapping use case.

UX
--
There will be an option in the Fencing Policy submenu (defined by BZ 1118879) to configure the following:
"Disrupt fence request if engine network connectivity test fails"

Default: DISABLED

Behavior to Enable: The user is provided with a configuration dialog to enable up to X external "Reliable IPs" (3-5 max - TBD) for ICMP validation.  If one of these ICMP targets fails the tests specified in BZ 1117931, the fencing logic will not continue, and the fence request will never leave the engine host.
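
The gate described in this comment could be sketched roughly as follows, in Python for illustration only. The function name `may_send_fence_request` and its parameters are hypothetical, and the per-IP pass/fail values stand in for the tests specified in BZ 1117931, which are not defined here. Treating an empty IP list as "no restriction" is an assumption of this sketch.

```python
def may_send_fence_request(check_enabled, reliable_ip_reachable):
    """Decide whether a fence request may be forwarded to the proxy host.

    check_enabled: state of the proposed Fencing Policy option
        (default DISABLED, in which case fencing proceeds as before).
    reliable_ip_reachable: mapping of each configured "Reliable IP" to
        True if it passed the connectivity test, False otherwise.
    """
    if not check_enabled:
        return True
    # A single failing reliable IP is enough to hold back the fence request.
    # Assumption: with no IPs configured, all() is True and fencing proceeds.
    return all(reliable_ip_reachable.values())


# Example with documentation-range addresses (placeholders, not real targets).
results = {"192.0.2.1": True, "192.0.2.254": False}
print(may_send_fence_request(True, results))
```

Keeping the decision in a pure function makes the "any failure blocks fencing" rule easy to revisit if a different success criterion is chosen.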

Comment 2 Scott Herold 2014-07-17 18:43:33 UTC
Targeting for 3.6 pending network implementation in BZ 1117931

Comment 3 Oved Ourfali 2015-04-27 12:30:14 UTC
Per past discussions with Scott,

Closing this RFE as won't fix.

