Bug 2062359 - [RFE] Additional configurable failure recovery options for pacemaker managed resources
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: pacemaker
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-09 15:56 UTC by Shane Bradley
Modified: 2023-08-10 15:41 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version:
Embargoed:


Links
System                             ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              RHELPLAN-115046  0        None      None    None     2022-03-09 16:01:25 UTC
Red Hat Knowledge Base (Solution)  6804701          0        None      None    None     2022-03-09 16:07:15 UTC

Description Shane Bradley 2022-03-09 15:56:54 UTC
Description of problem:
Requesting additional configurable failure recovery options for pacemaker managed resources.

For example, a customer requested:
  "RFE to add something like a retry and/or retry_attempts option for pacemaker 
   resource monitor operations."


Version-Release number of selected component (if applicable):
Latest 8.5 pacemaker

How reproducible:
Does not apply

Steps to Reproduce:
Does not apply

Actual results:
Currently, a resource's monitor failure causes pacemaker to perform the action configured in the operation's "on-fail" option (restart, ignore, fence, etc.).

Expected results:
Provide more options in pacemaker for handling resource monitor failures, such as "retry X times before considering the resource monitor a failure".
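
To make the requested semantics concrete, here is a minimal sketch of "retry X times before considering the resource monitor a failure". It is only an illustration, not pacemaker code or an existing pacemaker option: the class name, the retry_attempts field, and the "retry"/"none" return values are hypothetical. It also assumes that a successful monitor resets the failure count, which this request does not specify.

```python
from dataclasses import dataclass


@dataclass
class MonitorRetryPolicy:
    """Hypothetical model of the requested behavior; not a pacemaker API."""

    # Hypothetical option: failed monitors tolerated before on-fail applies.
    retry_attempts: int
    # Stands in for the operation's existing "on-fail" value.
    on_fail: str = "restart"
    consecutive_failures: int = 0

    def record(self, monitor_ok: bool) -> str:
        """Return the action to take after one monitor result."""
        if monitor_ok:
            # Assumption: a successful monitor clears earlier failures.
            self.consecutive_failures = 0
            return "none"
        self.consecutive_failures += 1
        if self.consecutive_failures <= self.retry_attempts:
            # Still within the retry budget: run the monitor again.
            return "retry"
        # Retry budget exhausted: treat it as a real failure.
        self.consecutive_failures = 0
        return self.on_fail


policy = MonitorRetryPolicy(retry_attempts=2)
results = [policy.record(ok) for ok in (False, False, True, False, False, False)]
print(results)
```

With retry_attempts=2, the "on-fail" action is returned only on the third consecutive failed monitor; the two failures before the successful monitor never trigger it.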

Additional info:

We spoke with engineering about this issue, and they stated that some other Bugzilla reports are related to this RFE:

  - 1747559 – Allow operation failure timeouts to be configured per operation in Pacemaker 
    https://bugzilla.redhat.com/show_bug.cgi?id=1747559

  - 1328448 – RFE: start-failure-is-fatal as per-resource parameter instead of global property 
    https://bugzilla.redhat.com/show_bug.cgi?id=1328448

