Bug 1467469

Summary: Coordinate fence test documentation and add a fence test section to fence configuration procedure
Product: Red Hat Enterprise Linux 7
Reporter: Keigo Noha <knoha>
Component: doc-High_Availability_Add-On_Administration
Assignee: Steven J. Levine <slevine>
Status: CLOSED CURRENTRELEASE
QA Contact: ecs-bugs
Severity: medium
Priority: unspecified
Version: 7.3
CC: jruemker, kgaillot, knoha, rhel-docs, slevine
Target Milestone: rc
Keywords: Documentation, Reopened
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-12-12 16:36:06 UTC
Type: Bug

Description Keigo Noha 2017-07-04 01:15:35 UTC
Document URL: 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-fenceconfig-HAAA.html

Section Number and Name: 
1.3. Fencing Configuration

Describe the issue: 
This section describes how to configure fencing, but it does not describe how to test it.

Some customers restart the network to confirm that the cluster tolerates it, on the assumption that a network restart completes within the timeout set in corosync.conf.
However, corosync monitors the network interfaces it uses for monitoring the other nodes.
Once those network devices go down, corosync starts its recovery process (a node restart).

So the network should not be restarted while corosync is running.
A network restart is also not a good way to test fencing.

Suggestions for improvement: 
We should add the following statements to the guide:
1. A network restart triggers fencing of the node that restarts the network, even though the timeout is not exceeded.
2. Blocking incoming and outgoing packets is one of the proper ways to test fencing.
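
As a rough illustration of suggestion 2, the sketch below only builds the firewall commands that would drop corosync traffic in both directions; it does not run them. It assumes corosync uses its default UDP port 5405, that firewalld is the local firewall, and that the cluster interface is in the default zone. The function names are hypothetical and Python 3 is assumed.

```python
import shlex

# Assumption: corosync is using its default UDP port.
COROSYNC_PORT = 5405


def corosync_block_commands(port=COROSYNC_PORT):
    """Return firewall-cmd invocations (as argument lists) that drop corosync traffic."""
    # Outbound: a direct rule on the OUTPUT chain of the ipv4 filter table.
    outbound = [
        "firewall-cmd", "--direct", "--add-rule", "ipv4", "filter", "OUTPUT", "2",
        "-p", "udp", "--dport={}".format(port), "-j", "DROP",
    ]
    # Inbound: a rich rule in the default zone that drops the same port.
    inbound = [
        "firewall-cmd",
        '--add-rich-rule=rule family="ipv4" port port="{}" protocol="udp" drop'.format(port),
    ]
    return [outbound, inbound]


def render(commands):
    """Quote each argument list so it can be pasted into a root shell."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
```

Printing `render(corosync_block_commands())` gives two command lines to review before running them as root on the node to be fenced. Neither rule uses `--permanent`, so both are runtime-only and should disappear on a firewalld reload or a reboot.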

Additional information:

Comment 9 Steven J. Levine 2017-07-12 16:59:56 UTC
The original issue -- noting that network restart causes fencing -- has been addressed but I'm moving this to 7.5 and changing the title to note that the focus of this BZ is now better fence test documentation.

Comment 12 Steven J. Levine 2017-08-01 14:35:42 UTC
I will modify the note about network restart as part of the general update to this BZ -- coordinating the fence test documentation.  This is now noted as 7.5, but it can be updated on the Portal whenever we complete it.

Comment 16 Steven J. Levine 2017-08-15 15:06:46 UTC
The current note about network restart says this:


NOTE
Once fencing is configured and a cluster has been started, a network restart will trigger fencing for the node which restarts the network even when the timeout is not exceeded. For this reason, testing your fence device by disabling the network interface will not properly test fencing. For information on testing a fence device, see "Fencing in a Red Hat High Availability Cluster" and "How to test fence devices and fencing configuration in a RHEL 5, 6, or 7 High Availability cluster?".

Would this work as a rewrite, with two bullets phrased as instructions for a user?


NOTE

Once fencing is configured and a cluster has been started, a network restart will trigger fencing for the node which restarts the network even when the timeout is not exceeded. For this reason, you should keep the following in mind:

* Do not restart the network service while the cluster service is running, because doing so will trigger unintentional fencing of the node.

* Do not test your fence device by disabling the network interface, as this will not properly test fencing. For information on testing a fence device, see "Fencing in a Red Hat High Availability Cluster" and "How to test fence devices and fencing configuration in a RHEL 5, 6, or 7 High Availability cluster?".
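
For the first bullet, a guard like the following could be run before any network restart. This is only a sketch: it assumes the cluster services are the systemd units `corosync` and `pacemaker`, that `systemctl is-active` prints `active` for a running unit, and that Python 3 is available. The function names are hypothetical.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active <unit>` reports the unit as active."""
    result = run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip() == "active"


def safe_to_restart_network(is_active=unit_is_active, units=("corosync", "pacemaker")):
    """Return True only when none of the assumed cluster units is running."""
    return not any(is_active(unit) for unit in units)
```

If `safe_to_restart_network()` returns False, the cluster services are still running on the node and a network restart would be expected to get the node fenced.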

Comment 18 Steven J. Levine 2017-08-16 16:55:37 UTC
The updated note is on the Portal here:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-fenceconfig-HAAA.html

Not closing this yet, though, because we still might move the testing info from the Portal to this document, although we currently do point to it and that might remain the best place for it.

Comment 27 Steven J. Levine 2017-10-23 19:20:51 UTC
New section on testing a fence device is on the Portal:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/high_availability_add-on_reference/#s1-stonithtest-HAAR