Bug 1328448 - RFE: start-failure-is-fatal as per-resource parameter instead of global property
Summary: RFE: start-failure-is-fatal as per-resource parameter instead of global property
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pacemaker
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks: 1679810 1747563
 
Reported: 2016-04-19 12:04 UTC by Josef Zimek
Modified: 2023-08-10 15:40 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
: 1747563 (view as bug list)
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2894001 0 None None None 2018-07-30 14:20:59 UTC

Description Josef Zimek 2016-04-19 12:04:49 UTC
Description of problem:


Currently, "start-failure-is-fatal" is a global cluster property, so changing it immediately affects all resources. Some customers would like to be able to set it per resource, for finer-grained control over how individual resources behave after a failed start.

Expected result:

Some resources could have "start-failure-is-fatal" enabled while others have it disabled.

Comment 2 Andrew Beekhof 2016-04-20 00:06:51 UTC
Seems reasonable; it might take some time to bubble up the priority list, though.

Comment 3 Ken Gaillot 2016-05-16 16:48:09 UTC
This will be evaluated in the 7.4 timeframe.

Comment 6 Ken Gaillot 2017-03-15 17:09:37 UTC
This feature will be included with the failure handling overhaul for Bug 1371576.

While a serious effort was made at implementing this, and a substantial amount of prerequisite work has been integrated upstream, the user-visible portion will not be ready in the 7.4 timeframe, so I am pushing this back to 7.5.

Comment 8 Ken Gaillot 2017-10-09 17:11:17 UTC
Due to time constraints, this will not make 7.5.

Comment 9 Ken Gaillot 2018-11-19 19:18:24 UTC
Because this will require new configuration syntax, for technical reasons it will only be addressed in RHEL 8.

Comment 11 Ken Gaillot 2019-08-30 20:11:32 UTC
The current plan is to implement 2 new operation meta-attributes, failure-restart and failure-escalation, to replace start-failure-is-fatal, migration-threshold, and on-fail (which would still be supported for backward compatibility).

The first failure-restart=<N> failures would result in restart attempts, and if all failed, the response in failure-escalation would be taken (equivalent to the current on-fail values, except "restart", and adding "ban" to force the resource off its current node).
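To make the planned semantics concrete, here is a minimal sketch in Python. It is only an illustration of the rule described above, not Pacemaker code: the function name, the 1-based failure counter, and the plain-string return values are assumptions made for the example.

```python
def respond_to_failure(failure_count: int, failure_restart: int,
                       failure_escalation: str) -> str:
    """Model the proposed response to the Nth failure of an operation.

    failure_count      -- how many times the operation has failed so far (1-based)
    failure_restart    -- proposed meta-attribute: number of failures answered
                          with a restart attempt
    failure_escalation -- proposed meta-attribute: response once restarts are
                          exhausted (e.g. "ban"); "restart" is not a valid value
    """
    if failure_count < 1:
        raise ValueError("failure_count is 1-based")
    if failure_restart < 0:
        raise ValueError("failure_restart must not be negative")
    if failure_escalation == "restart":
        raise ValueError('"restart" is not an escalation response')

    # The first failure_restart failures each trigger a restart attempt.
    if failure_count <= failure_restart:
        return "restart"
    # Every later failure takes the escalation response.
    return failure_escalation
```

For example, with failure-restart=2 and failure-escalation=ban, the first and second failures would lead to restarts and the third would ban the resource from its current node.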

Thus a start action with failure-restart set to 0 would be equivalent to start-failure-is-fatal="true", and a start action with failure-restart set to a positive number would be equivalent to start-failure-is-fatal="false" with migration-threshold set to that number.
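The equivalence for start actions can be sketched as a small translation function. This is a hypothetical helper written for illustration, not part of Pacemaker or pcs; it assumes migration-threshold is given as a positive integer and does not handle special values such as INFINITY.

```python
def start_failure_restart(start_failure_is_fatal: bool,
                          migration_threshold: int) -> int:
    """Translate the legacy settings into the proposed failure-restart
    value for a resource's start action (illustrative only)."""
    if start_failure_is_fatal:
        # A fatal start failure means no restart attempts on this node.
        return 0
    if migration_threshold < 1:
        raise ValueError("expected a positive migration-threshold")
    # Otherwise the start is retried up to the migration-threshold count.
    return migration_threshold
```

Under these assumptions, start-failure-is-fatal="true" maps to failure-restart=0 regardless of the threshold, while start-failure-is-fatal="false" with migration-threshold=3 maps to failure-restart=3.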

