Bug 1474905

Summary: stonith: dynamic enabling/disabling of stonith resources by rule-constraints
Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.0
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: John Ruemker <jruemker>
Assignee: Klaus Wenninger <kwenning>
QA Contact: cluster-qe <cluster-qe>
CC: cluster-maint, kgaillot, mnovacek
Keywords: FutureFeature
Flags: pm-rhel: mirror+
Target Milestone: pre-dev-freeze
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Enhancement
Type: Bug
Last Closed: 2020-12-01 07:29:19 UTC

Description John Ruemker 2017-07-25 15:28:59 UTC
Description of problem: This spun off from discussions in Bug #1444020 around a specific customer's goal of adding more intelligence to fencing decisions.  In those discussions it was agreed that what I'm about to describe might be more generally useful to others.

This request is for stonith devices to be manageable via rule-based constraints, allowing them to follow node attributes (or anything else rule expressions can evaluate).  The expectation is that the result would be similar to how stonith devices already follow location constraints: if you negatively constrain a stonith device against a node, that node should be unable to use that stonith device to fence another.  With a rule-based constraint, the decision to enable or disable a device on a particular node could then be driven by node attributes that may be populated by a resource (like ping, for example).
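
As a rough sketch, the requested configuration might look something like the following with pcs (the device name fence-node2 and the pingd attribute are placeholders; note that today such a rule on a stonith device is not re-evaluated when the attribute changes, which is exactly the gap this request is about):

~~~
# Hypothetical sketch: keep a node from running/using fence-node2
# whenever its pingd attribute shows no connectivity to clients.
pcs constraint location fence-node2 rule score=-INFINITY \
    pingd lt 1 or not_defined pingd
~~~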

This would be useful for influencing stonith behavior based on conditions throughout the cluster.  With a ping resource populating heuristics, you could determine the winner of a membership split by activating the stonith device only on a node that has connectivity to clients.  Other decisions could be made through custom agents or through other node attributes.
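
For illustration, the heuristic attribute in such a rule would typically be populated by a cloned ocf:pacemaker:ping resource, roughly as below (host_list is a placeholder for whatever client-facing addresses the heuristic should probe):

~~~
# Sketch: run ping on every node and publish the result in the "pingd"
# node attribute (host_list is a placeholder for the client network).
pcs resource create ping ocf:pacemaker:ping \
    dampen=5s multiplier=1000 host_list=192.168.122.1 clone
~~~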


Version-Release number of selected component (if applicable): pacemaker in RHEL 7

Comment 1 John Ruemker 2017-07-25 15:36:25 UTC
Copying the relevant part of Klaus's Comment #40 from Bug 1444020 discussing this:

~~~
Location rules for fencing resources may not contain anything dynamic like attributes, as they are evaluated only once upon startup and when they are altered, but not if e.g. an attribute changes.

As long as this remains unchanged, the location rules (or pcs stonith enable/disable) have to be altered dynamically by a script based on the heuristics results.
This could be a resource with location constraints using the pingd attribute.
The RA would then enable/disable the fencing resource.
This would just work of course with a small totem timeout, as starting/stopping the resource wouldn't be done once the DC election is over (at least on the side that isn't DC prior to the split).
If hacked into /usr/lib/ocf/resource.d/pacemaker/ping (an additional attribute giving a fencing resource to enable/disable based on a threshold of the otherwise-updated node attribute), the high totem timeout shouldn't be that crucial, as the monitor should be driven autonomously by lrmd.

Of course, a cleaner alternative that doesn't require an additional script/resource would be to somehow enable dynamic location constraints for fencing resources.
This could either be full support for rules, as for resources other than fencing (quite complex: either pulling pengine into stonithd or adding extra syncing between stonithd & pengine, with all the drawbacks of lag, non-local communication, ...).
Or some lightweight implementation of a subset, e.g. the possibility to list a node attribute in the fencing resource config that stonithd will then monitor. I don't know what implications this finally has implementation-wise, but at least no communication between nodes would be needed and no additional pengine runs inside stonithd.
~~~
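
As a rough, hedged sketch of the script-based workaround Klaus describes above (this is not an existing agent; the device name, attribute name, and threshold are placeholders, and the attrd_updater output parsing may need adjusting per version), a monitor-driven script could flip the fencing device on or off based on the locally published pingd attribute:

~~~
#!/bin/sh
# Sketch only: enable/disable a fencing device from a node based on its
# local pingd attribute.  STONITH_ID, ATTR and THRESHOLD are placeholders.
STONITH_ID="fence-node2"
ATTR="pingd"
THRESHOLD=1000

# Query the transient node attribute maintained by ocf:pacemaker:ping;
# the output format is version-dependent, so the parsing may need tweaks.
value=$(attrd_updater --query --name "$ATTR" --node "$(crm_node -n)" 2>/dev/null \
        | sed -n 's/.*value="\([0-9]*\)".*/\1/p')

if [ -n "$value" ] && [ "$value" -ge "$THRESHOLD" ]; then
    pcs stonith enable "$STONITH_ID"
else
    pcs stonith disable "$STONITH_ID"
fi
~~~

As noted in the quoted comment, this only approximates the requested behavior; the cleaner solution would be for the cluster itself to re-evaluate such rules when attributes change.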

Comment 3 Ken Gaillot 2017-08-01 16:33:24 UTC
Due to capacity constraints, this is unlikely to be addressed in the 7.5 timeframe. The approach in Bug 1476401 is more likely.

Comment 8 RHEL Program Management 2020-12-01 07:29:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 9 Red Hat Bugzilla 2023-09-14 04:01:36 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.