Bug 1747563

Summary:	[RFE] start-failure-is-fatal as per-resource parameter instead of global property
Product:	Red Hat Enterprise Linux 8	Reporter:	Ken Gaillot <kgaillot>
Component:	pcs	Assignee:	Tomas Jelinek <tojeline>
Status:	CLOSED WONTFIX	QA Contact:	cluster-qe <cluster-qe>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	8.0	CC:	cfeist, cluster-maint, cluster-qe, fadamo, idevat, mmazoure, mnovacek, nhostako, omular, pzimek, sbradley, tojeline
Target Milestone:	pre-dev-freeze	Keywords:	FutureFeature, Reopened, Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:	1328448	Environment:
Last Closed:	2021-09-04 07:26:58 UTC	Type:	Feature Request
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1328448
Bug Blocks:	1679810

Description Ken Gaillot 2019-08-30 20:13:15 UTC

+++ This bug was initially created as a clone of Bug #1328448 +++

Description of problem:


Currently, "start-failure-is-fatal" is a cluster-wide (global) property, so it affects all resources at once. Some customers would like to be able to set it per resource, for finer-grained control over resource behaviour.

Expected result:

Some resources could have "start-failure-is-fatal" enabled while others have it disabled.

--- Additional comment from Ken Gaillot on 2019-08-30 20:11:32 UTC ---

The current plan is to implement 2 new operation meta-attributes, failure-restart and failure-escalation, to replace start-failure-is-fatal, migration-threshold, and on-fail (which would still be supported for backward compatibility).

With failure-restart=<N>, the first N failures would result in restart attempts. If all of those failed, the response named in failure-escalation would be taken. Its values would be equivalent to the current on-fail values, except "restart", with the addition of "ban" to force the resource off its current node.
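
As a rough illustration of the rule described above, the Python sketch below picks a response for the k-th failure of an operation. The function name and its arguments are hypothetical and model only the plan in this comment, not any shipped pacemaker or pcs behaviour. The set of accepted failure-escalation values is assumed and left unchecked, apart from rejecting "restart".

```python
def response_to_failure(failure_number: int, failure_restart: int,
                        failure_escalation: str) -> str:
    """Return the response to the k-th failure under the proposed plan.

    Hypothetical helper: it models the comment's description only.
    failure_number is 1 for the first failure, 2 for the second, and so on.
    """
    if failure_number < 1:
        raise ValueError("failure_number starts at 1")
    if failure_restart < 0:
        raise ValueError("failure-restart must be non-negative")
    if failure_escalation == "restart":
        # The plan excludes "restart" as an escalation value.
        raise ValueError('"restart" is not a valid failure-escalation')
    # The first failure-restart failures are answered with a restart attempt.
    if failure_number <= failure_restart:
        return "restart"
    # Once those attempts are used up, the escalation response applies.
    return failure_escalation
```

For example, with failure_restart=2 and failure_escalation="ban", the first and second failures would each trigger a restart and the third would ban the resource from its node.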

Thus a start action with failure-restart set to 0 would be equivalent to start-failure-is-fatal="true", and a start action with failure-restart set to a positive number would be equivalent to start-failure-is-fatal="false" with migration-threshold set to that number.
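
The equivalence stated above can be written down as a small mapping. This is a minimal Python sketch under the assumptions of this comment: the class and function names are hypothetical, and the code only restates the correspondence between a start action's proposed failure-restart value and the existing start-failure-is-fatal and migration-threshold settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LegacyStartPolicy:
    """Existing settings that a start action's failure-restart would map to."""
    start_failure_is_fatal: bool
    # None means the comment does not assign a threshold for this case.
    migration_threshold: Optional[int]


def legacy_equivalent(failure_restart: int) -> LegacyStartPolicy:
    """Translate a proposed failure-restart value for a start action.

    Hypothetical helper based only on the equivalence given in the comment.
    """
    if failure_restart < 0:
        raise ValueError("failure-restart must be non-negative")
    if failure_restart == 0:
        # No restart attempts: same as start-failure-is-fatal="true".
        return LegacyStartPolicy(start_failure_is_fatal=True,
                                 migration_threshold=None)
    # N restart attempts: start-failure-is-fatal="false" with
    # migration-threshold set to N.
    return LegacyStartPolicy(start_failure_is_fatal=False,
                             migration_threshold=failure_restart)
```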

pcs will need to accept the new options in pcs resource create and pcs op.

Comment 1 yuk 2020-04-17 14:43:49 UTC

What's the meaning of this BZ?
It seems the same as https://bugzilla.redhat.com/show_bug.cgi?id=1328448

Comment 2 Ken Gaillot 2020-04-17 15:07:19 UTC

(In reply to yuk from comment #1)
> What's the meaning of this BZ?
> It seems the same as https://bugzilla.redhat.com/show_bug.cgi?id=1328448

Bug 1328448 covers the pacemaker portion of the fix, and this bz covers the pcs interface.

Comment 8 RHEL Program Management 2021-03-02 07:31:00 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 13 Ken Gaillot 2021-06-09 14:51:45 UTC

The pacemaker bz this depends on has been lowered in priority. It will not make 8.5 and is not targeting any release at this point.

Comment 17 RHEL Program Management 2021-09-04 07:26:58 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.