Description of problem: Currently "start-failure-is-fatal" is a cluster-global property, so it immediately affects all resources. Some customers would like the option to set this property per resource, to get finer-grained control over resource behaviour. Expected result: Some resources could have the "start-failure-is-fatal" parameter enabled while others have it disabled.
Seems reasonable, though it might take some time to bubble up the priority list.
This will be evaluated in the 7.4 timeframe.
This feature will be included with the failure handling overhaul for Bug 1371576. While a serious effort was made at implementing this, and a substantial amount of prerequisite work has been integrated upstream, the user-visible portion will not be ready in the 7.4 timeframe, so I am pushing this back to 7.5.
Due to time constraints, this will not make 7.5.
Because this will require new configuration syntax, for technical reasons this will only be addressed in RHEL 8.
The current plan is to implement 2 new operation meta-attributes, failure-restart and failure-escalation, to replace start-failure-is-fatal, migration-threshold, and on-fail (which would still be supported for backward compatibility). The first failure-restart=<N> failures would result in restart attempts, and if all of those failed, the response in failure-escalation would be taken (equivalent to the current on-fail values, except "restart", and adding "ban" to force the resource off its current node). Thus a start action with failure-restart set to 0 would be equivalent to start-failure-is-fatal="true", and a start action with failure-restart set to a positive number would be equivalent to start-failure-is-fatal="false" with migration-threshold set to that number.
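To make the proposed semantics concrete, here is a minimal Python sketch of the decision described above. It is only an illustration of the plan as worded in this comment, not Pacemaker code: the function names (response_to_failure, legacy_to_failure_restart) are hypothetical, and it assumes the literal reading that failures 1 through N get a restart and failure N+1 triggers the escalation.

```python
def response_to_failure(failure_count: int, failure_restart: int,
                        failure_escalation: str) -> str:
    """Return the response to the failure_count-th failure of an operation.

    failure_count is 1-based and includes the failure being handled.
    Assumption: the first `failure_restart` failures are answered with a
    restart; any later failure gets the `failure_escalation` response.
    """
    if failure_count < 1 or failure_restart < 0:
        raise ValueError("failure_count must be >= 1 and failure_restart >= 0")
    if failure_escalation == "restart":
        # The plan excludes "restart" from the escalation values.
        raise ValueError('"restart" is not a valid failure-escalation value')
    if failure_count <= failure_restart:
        return "restart"
    return failure_escalation


def legacy_to_failure_restart(start_failure_is_fatal: bool,
                              migration_threshold: int) -> int:
    """Map the legacy start settings to a failure-restart value.

    Hypothetical helper following the equivalence stated in the comment:
    fatal start failures map to 0, otherwise the migration-threshold
    value is used as-is.
    """
    if start_failure_is_fatal:
        return 0
    if migration_threshold < 1:
        raise ValueError("expected a positive migration-threshold")
    return migration_threshold


if __name__ == "__main__":
    # start-failure-is-fatal="true": the first start failure escalates.
    n = legacy_to_failure_restart(True, 3)
    print(n, response_to_failure(1, n, "ban"))

    # start-failure-is-fatal="false" with migration-threshold=3.
    n = legacy_to_failure_restart(False, 3)
    for count in range(1, 5):
        print(count, response_to_failure(count, n, "ban"))
```

With failure-restart=0 the very first failure goes straight to the escalation response, which is how a per-operation setting could stand in for the global start-failure-is-fatal="true".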