Bug 1721603

Summary: RFE: new pacemaker constraint for fencing actions that depend on regular resources being active
Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.0
Reporter: Ken Gaillot <kgaillot>
Assignee: Ken Gaillot <kgaillot>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, marjones
Status: CLOSED WONTFIX
Severity: low
Priority: low
Type: Feature Request
Keywords: FutureFeature
Target Milestone: pre-dev-freeze
Hardware: All
OS: All
Last Closed: 2021-02-01 07:41:34 UTC

Description Ken Gaillot 2019-06-18 16:52:23 UTC
Description of problem: It is possible for a particular fence device to be usable for fencing only if a normal (non-fencing) cluster resource is active. (This is acceptable only if the fence device is configured such that it cannot target any nodes allowed to run the normal resource.)

OpenStack is an example where this can occur: if the controller nodes are pacemaker cluster nodes, and the compute nodes are pacemaker remote nodes, then unfencing the remote nodes with fence_compute requires access to the keystone authentication IP running as a pacemaker resource on a controller node.

Currently, this can result in a deadlock when all of the following hold: a node targeted by the dependent fence device and the node running the normal resource must both be fenced in the same cluster transition; the normal resource (and therefore the fence device that depends on it) is not functional; and the cluster serializes the fencing of the normal resource's node after the dependent target's fencing. That ordering can happen if the concurrent-fencing cluster property is false, or if the normal resource's node is the DC, which implies it is scheduling itself for fencing for some reason other than complete node loss, such as a failed resource stop.

The proposed solution is a new constraint syntax that would specify the fencing device and the normal resource it depends on.

The most straightforward syntax would be to use pacemaker's current "rsc_order" constraint, with "first" set to the normal resource, "then" set to the fencing resource, and "then-action" set to the new value "fence". (Alternatively "then-action" could be set to "on", "off", or "reboot", but it seems more likely all fence actions would have the same requirement.)
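
As a rough illustration, such a constraint might look like the following CIB fragment. This is a sketch only: then-action="fence" is the proposed new value and is not accepted by current Pacemaker schemas, and the resource names (keystone-vip for the normal resource, fence-compute for the dependent fence device) are placeholders.

    <!-- Hypothetical: then-action="fence" does not exist yet; names are placeholders -->
    <rsc_order id="order-keystone-vip-then-fence-compute"
               first="keystone-vip" first-action="start"
               then="fence-compute" then-action="fence"/>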

Steps to Reproduce:
1. Configure a cluster such that 2 nodes can be fenced at the same time (e.g. 5 cluster nodes, or 2 cluster nodes plus a remote node).

2. Configure real fencing for all nodes.

3. Configure a normal resource, constrained to a single node (-INFINITY location constraints on all other nodes); see the configuration sketch after these steps.

4. Configure a fence device that fails if the normal resource is not active. (Dummy agents can be modified for this purpose, or fence_compute can be configured with the keystone IP as the normal resource.) The fence device should be configured, in a fencing topology together with the real fencing, to target some particular node other than the one that runs the normal resource.

5. Set concurrent-fencing=false for ease of testing.

6. Cause the normal resource to become nonfunctional, and cause both the node running the normal resource and the node targeted by the dependent fence device to require fencing (e.g. kill power to both simultaneously).
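
The CIB fragments below sketch the configuration described in steps 3-5. All names (dummy-rsc, node1/node2/node3, fence-dependent, fence-real) are placeholders, and the exact layout will vary with the cluster.

    <!-- Step 3: pin the normal resource to node1 by banning it everywhere else -->
    <rsc_location id="ban-dummy-node2" rsc="dummy-rsc" node="node2" score="-INFINITY"/>
    <rsc_location id="ban-dummy-node3" rsc="dummy-rsc" node="node3" score="-INFINITY"/>

    <!-- Step 4: topology targeting a node other than the one running the normal
         resource; requiring both devices at one level means a failure of the
         dependent device blocks that node's fencing entirely -->
    <fencing-topology>
      <fencing-level id="fl-node2-1" target="node2" index="1"
                     devices="fence-dependent,fence-real"/>
    </fencing-topology>

    <!-- Step 5: serialize fencing (goes in crm_config/cluster_property_set) -->
    <nvpair id="opt-concurrent-fencing" name="concurrent-fencing" value="false"/>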

Actual results: If the cluster serializes the normal resource's node fencing last, the cluster will get stuck in a loop with the dependent fencing action repeatedly failing, and be unable to recover.

Expected results: The cluster eventually recovers properly.

Comment 1 Ken Gaillot 2019-06-18 17:04:35 UTC
A possible implementation would be for the scheduler to attach to each fence action a list of the fence devices that do not have all their constraints satisfied. The controller would pass this list along to the fencer with the fence request, and the fencer would treat such devices as disabled. (This is likely a better approach than having the fencer load the CIB and run a simulation to determine this on its own, due to the performance overhead and potential issues in mixed-version clusters.)
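
Purely as an illustration of that idea (the element and attribute names below are hypothetical and are not part of Pacemaker's actual transition-graph schema), the scheduler could annotate a fence action with the devices whose constraints are unmet:

    <!-- Hypothetical sketch only; "unsatisfied-devices" is not a real attribute -->
    <fence-action target="compute-1" operation="off"
                  unsatisfied-devices="fence-compute"/>
    <!-- The controller would forward this list with the fencing request, and the
         fencer would treat the listed devices as disabled -->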

Comment 4 RHEL Program Management 2021-02-01 07:41:34 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 5 Ken Gaillot 2021-02-01 14:55:39 UTC
This issue has been reported upstream for exposure to a wider audience.