Bug 1328448

Summary: RFE: start-failure-is-fatal as per-resource parameter instead of global property
Product: Red Hat Enterprise Linux 8
Reporter: Josef Zimek <pzimek>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: NEW
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact: Steven J. Levine <slevine>
Priority: medium
Version: 8.0
CC: aglotov, cfeist, cluster-maint, fadamo, kgaillot, mnovacek, sbradley, slevine
Target Milestone: pre-dev-freeze
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Clones: 1747563
Environment:
Last Closed:
Type: Feature Request
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1679810, 1747563

Description Josef Zimek 2016-04-19 12:04:49 UTC
Description of problem:


Currently, "start-failure-is-fatal" is a global cluster property, so changing it immediately affects all resources. Some customers would like to be able to set it per resource, for finer-grained control over resource behavior.

Expected result:

Some resources could have the "start-failure-is-fatal" parameter enabled while others have it disabled.
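A minimal Python sketch of the requested behavior, assuming that a per-resource value, when set, simply overrides the cluster-wide property. The `Cluster` class and its `effective_start_failure_is_fatal()` method are hypothetical illustrations, not Pacemaker code; the global default of true matches Pacemaker's documented default for start-failure-is-fatal.

```python
from dataclasses import dataclass, field
from typing import Dict


# Hypothetical model: a cluster-wide default plus optional per-resource
# overrides. This does not mirror Pacemaker's internal data structures.
@dataclass
class Cluster:
    # Global cluster property (Pacemaker's default is true).
    start_failure_is_fatal: bool = True
    # Hypothetical per-resource overrides, keyed by resource ID.
    overrides: Dict[str, bool] = field(default_factory=dict)

    def effective_start_failure_is_fatal(self, resource: str) -> bool:
        """Return the resource's override if one is set, else the global value."""
        return self.overrides.get(resource, self.start_failure_is_fatal)
```

For example, `Cluster(start_failure_is_fatal=True, overrides={"db": False})` would treat a start failure of `db` as non-fatal while every other resource keeps the global behavior.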

Comment 2 Andrew Beekhof 2016-04-20 00:06:51 UTC
Seems reasonable, though it might take some time to bubble up the priority list.

Comment 3 Ken Gaillot 2016-05-16 16:48:09 UTC
This will be evaluated in the 7.4 timeframe.

Comment 6 Ken Gaillot 2017-03-15 17:09:37 UTC
This feature will be included with the failure handling overhaul for Bug 1371576.

While a serious effort was made at implementing this, and a substantial amount of prerequisite work has been integrated upstream, the user-visible portion will not be ready in the 7.4 timeframe, so I am pushing this back to 7.5.

Comment 8 Ken Gaillot 2017-10-09 17:11:17 UTC
Due to time constraints, this will not make 7.5

Comment 9 Ken Gaillot 2018-11-19 19:18:24 UTC
Because this will require new configuration syntax, for technical reasons this will only be addressed in RHEL 8

Comment 11 Ken Gaillot 2019-08-30 20:11:32 UTC
The current plan is to implement 2 new operation meta-attributes, failure-restart and failure-escalation, to replace start-failure-is-fatal, migration-threshold, and on-fail (which would still be supported for backward compatibility).

With failure-restart=<N>, the first N failures would result in restart attempts; if all of those fail, the response specified by failure-escalation would be taken (equivalent to the current on-fail values except "restart", plus a new "ban" value to force the resource off its current node).
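The proposed rule can be sketched in Python as follows. This is one reading of the proposal, not Pacemaker's implementation: `response_to_failure()` and the `Escalation` enum are hypothetical, the value set is assumed to be the existing on-fail responses other than "restart" plus the proposed "ban", and the boundary (each of the first N failures gets a restart, the next one escalates) is an assumption.

```python
from enum import Enum


class Escalation(Enum):
    # Assumed value set: on-fail responses other than "restart", plus "ban".
    IGNORE = "ignore"
    BLOCK = "block"
    STOP = "stop"
    FENCE = "fence"
    STANDBY = "standby"
    BAN = "ban"


def response_to_failure(failure_count: int, failure_restart: int,
                        failure_escalation: Escalation) -> str:
    """Return the response to the failure_count-th failure (1-based).

    Each of the first `failure_restart` failures gets a restart attempt;
    any failure beyond that triggers the failure-escalation response.
    """
    if failure_count < 1:
        raise ValueError("failure_count is 1-based")
    if failure_restart < 0:
        raise ValueError("failure_restart must be non-negative")
    if failure_count <= failure_restart:
        return "restart"
    return failure_escalation.value
```

With `failure_restart=2` and `Escalation.BAN`, failures 1 and 2 yield "restart" and failure 3 yields "ban". With `failure_restart=0`, the very first failure escalates.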

Thus, a start action with failure-restart set to 0 would be equivalent to start-failure-is-fatal="true", and a start action with failure-restart set to a positive number would be equivalent to start-failure-is-fatal="false" with migration-threshold set to that number.
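The equivalence stated in the last paragraph amounts to a simple two-way mapping for the start action. A hypothetical Python sketch follows. The function names are illustrative only, and `to_legacy()` returns `None` for migration-threshold when start failures are fatal, on the assumption that the threshold then no longer matters for start failures.

```python
from typing import Optional, Tuple


def start_failure_restart(start_failure_is_fatal: bool,
                          migration_threshold: int) -> int:
    """Map legacy settings to a failure-restart value for the start action."""
    if start_failure_is_fatal:
        # Fatal start failures: no restart attempts at all.
        return 0
    if migration_threshold < 1:
        raise ValueError("migration_threshold must be a positive number")
    # Non-fatal: restart attempts up to the migration threshold.
    return migration_threshold


def to_legacy(failure_restart: int) -> Tuple[bool, Optional[int]]:
    """Map a start action's failure-restart back to
    (start_failure_is_fatal, migration_threshold)."""
    if failure_restart < 0:
        raise ValueError("failure_restart must be non-negative")
    if failure_restart == 0:
        return (True, None)
    return (False, failure_restart)
```

For example, `start_failure_restart(False, 3)` gives 3, and `to_legacy(3)` gives `(False, 3)`.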