Bug 1119932

Summary: [RFE] Verify network connectivity from Engine to hosts to enhance the fencing logic
Product: Red Hat Enterprise Virtualization Manager Reporter: Scott Herold <sherold>
Component: ovirt-engine Assignee: Martin Perina <mperina>
Status: CLOSED WONTFIX QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.4.0 CC: gklein, howey.vernon, iheim, lpeer, myakove, nyechiel, oourfali, rbalakri, Rhev-m-bugs, yeylon
Target Milestone: --- Keywords: FutureFeature
Target Release: 3.6.0 Flags: sherold: Triaged+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1117931 Environment:
Last Closed: 2015-04-27 12:30:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1117931    
Bug Blocks: 1097923, 1110176    

Description Scott Herold 2014-07-15 21:44:22 UTC
+++ This bug was initially created as a clone of Bug #1117931 +++

Description of problem:

This RFE is part of the request to introduce logic in the fencing workflow so that the engine can determine whether an inability to communicate with external hosts is caused by its own network connectivity issues or by a legitimate problem with the remote host.

As a first phase, the user should be able to list IP addresses that the engine should try to ping before fencing a host. The engine should ping the provided addresses continuously and collect the results as an ongoing process. To better determine the connectivity status, the results from the last 15 seconds should be kept as a backlog for reference. This window should be user-configurable.

We still need to discuss what our 'success' criteria are, i.e., how many pings we should send and how many should get a reply.
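The backlog described above could be kept along these lines. This is only a sketch in Python, not engine code: the class name PingBacklog, its methods, and the injectable clock are hypothetical, and the actual sending of pings is left out. Because the success criteria are still undecided, the sketch only reports the fraction of successful pings in the window and does not apply any threshold.

```python
import time
from collections import deque


class PingBacklog:
    """Keeps ping results for one address within a sliding time window.

    Hypothetical helper for illustration; the 15-second default comes
    from the description, everything else is an assumption.
    """

    def __init__(self, window_seconds=15.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # user-configurable window
        self._clock = clock
        self._results = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded):
        """Store the outcome of one ping attempt, stamped with the current time."""
        self._results.append((self._clock(), bool(succeeded)))
        self._prune()

    def _prune(self):
        """Drop results that are older than the window."""
        cutoff = self._clock() - self.window_seconds
        while self._results and self._results[0][0] < cutoff:
            self._results.popleft()

    def success_ratio(self):
        """Return the fraction of successful pings in the window, or None if it is empty."""
        self._prune()
        if not self._results:
            return None
        ok = sum(1 for _, succeeded in self._results if succeeded)
        return ok / len(self._results)
```

Whatever success criteria are eventually agreed on could then be expressed as a comparison against success_ratio(), with an empty backlog (None) handled explicitly.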

Comment 1 Scott Herold 2014-07-15 21:58:26 UTC
Infra/Engine portion of engine network connectivity validation for fencing storms.

When Triggered
--------------
This action is triggered once RHEV-M has decided that a target host may need to be fenced, but before a fence command is sent to a proxy host.  In this flow, before instructing a proxy host to fence a target host, the engine will first validate whether it has "acceptable" network connectivity.  This will be done by checking ICMP status against user-definable external IPs, such as the default gateway or another reliable external node.  If one of these "reliable ICMP nodes" is unavailable, the engine will temporarily suspend fencing commands for the specified host or cluster.  This will prevent fencing storms from leaving the engine, and will avoid potential race conditions on fence retries as experienced in the Engine Network Port Flapping use case.
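The gate described in this flow can be sketched as follows. The function name check_fencing_allowed and the shape of its input (a mapping from reliable IP to its latest reachability result) are assumptions made for illustration; how reachability itself is measured is defined in BZ 1117931 and is not modelled here.

```python
def check_fencing_allowed(reachable_by_ip):
    """Decide whether a fence command may be sent to a proxy host.

    reachable_by_ip maps each configured reliable ICMP node to True
    (reachable) or False (unreachable). Fencing is suspended as soon as
    any one node is unreachable, as described in the comment above.

    Returns (allowed, unreachable_ips).
    """
    unreachable = sorted(ip for ip, ok in reachable_by_ip.items() if not ok)
    return (not unreachable, unreachable)


# Example: one reliable node is down, so the fence request is held back.
allowed, down = check_fencing_allowed({"192.0.2.1": True, "192.0.2.254": False})
```

Returning the list of unreachable nodes alongside the decision would let the engine log why a fence request was suspended.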

UX
--
The Fencing Policy submenu (defined by BZ 1118879) will include the following option:
"Disrupt fence request if engine network connectivity test fails"

Default: DISABLED

Behavior to Enable: The user is provided with a configuration dialog to specify up to X external "Reliable IPs" (3-5 max - TBD) for ICMP validation.  If one of these ICMP targets fails the tests specified in BZ 1117931, fencing logic will not continue, and will never leave the host.
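As a rough illustration of the setting proposed here, the option and its list of reliable IPs could be modelled as below. All names (FencingConnectivityPolicy, MAX_RELIABLE_IPS, validate) are hypothetical. The limit of 5 is a placeholder for the undecided "3-5 max", and requiring at least one address when the option is enabled is an assumption, not something stated in this report.

```python
import ipaddress
from dataclasses import dataclass, field

# Assumption: the report only says "3-5 max - TBD"; 5 is a placeholder.
MAX_RELIABLE_IPS = 5


@dataclass
class FencingConnectivityPolicy:
    """Hypothetical model of the proposed Fencing Policy option."""

    enabled: bool = False  # report: Default: DISABLED
    reliable_ips: list = field(default_factory=list)

    def validate(self):
        """Raise ValueError if the configuration is not usable."""
        if not self.enabled:
            return  # nothing to check while the option is off
        if not self.reliable_ips:
            # Assumption: an enabled check needs at least one target.
            raise ValueError("at least one reliable IP is required")
        if len(self.reliable_ips) > MAX_RELIABLE_IPS:
            raise ValueError(f"at most {MAX_RELIABLE_IPS} reliable IPs are allowed")
        for ip in self.reliable_ips:
            ipaddress.ip_address(ip)  # raises ValueError on a malformed address
```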

Comment 2 Scott Herold 2014-07-17 18:43:33 UTC
Targeting for 3.6 pending network implementation in BZ 1117931

Comment 3 Oved Ourfali 2015-04-27 12:30:14 UTC
Per past discussions with Scott,

Closing this RFE as won't fix.