Bug 1474463

Summary: fencing-device not properly registered after disable/enable cycle
Product: Red Hat Enterprise Linux 7
Reporter: Klaus Wenninger <kwenning>
Component: pacemaker
Assignee: Klaus Wenninger <kwenning>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 7.4
CC: abeekhof, aherr, cfeist, cluster-maint, jruemker, kgaillot, mnovacek, nbarcet, phagara, sfroemer
Target Milestone: rc
Keywords: ZStream
Target Release: 7.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.18-1.el7
Doc Type: Bug Fix
Doc Text:
Previously, Pacemaker's stonithd service used an incorrect search pattern when checking the configuration for re-enabled fence devices. Consequently, re-enabled fence devices were shown to be available only on the node where they had been started. With this update, stonithd now uses the correct search pattern, and the described problem no longer occurs.
Clones: 1481142 (view as bug list)
Last Closed: 2018-04-10 15:30:29 UTC
Type: Bug
Bug Blocks: 1481142    

Description Klaus Wenninger 2017-07-24 16:57:01 UTC
Description of problem:

Fencing devices are intended to be usable even on nodes where they are not shown as started.
It is possible to disable them either via location rules (only rules with static results; rules that change their result dynamically, e.g. via score attributes, are not re-evaluated by stonithd) or by explicitly setting the target-role to Stopped.
After a disable/enable cycle, however, stonith devices are only used on the node where they are explicitly started.
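
For illustration, both ways of disabling a fence device with pcs; the device name 'fence-dev' and node name 'node1' below are placeholders:

    # explicitly stop the device (sets target-role=Stopped)
    pcs stonith disable fence-dev

    # or keep it away from a particular node with a static location constraint
    pcs constraint location fence-dev avoids node1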


Version-Release number of selected component (if applicable):
1.1.16-12.el7

How reproducible:
100%

Steps to Reproduce:
1. create a fencing-resource-primitive without any rules restricting it to a node
2. restart cluster
3. use crm_mon to verify that the fencing-resource is shown as started on one of the cluster nodes
4. use 'stonith_admin -L' to verify that the fencing-resource is available both on the node where it is claimed to be started and on at least one other node
5. issue 'pcs stonith disable {your-fencing-resource}' followed by 'pcs stonith enable {your-fencing-resource}' (see the sketch below)
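
Spelled out as commands, a minimal sketch of the reproducer; the device name 'fence-dev' and the fence_xvm agent are assumptions, any working fence agent will do:

    pcs stonith create fence-dev fence_xvm    # step 1: no location rules
    pcs cluster stop --all
    pcs cluster start --all --wait            # step 2
    crm_mon -1 | grep fence-dev               # step 3
    stonith_admin -L                          # step 4: run on each node
    pcs stonith disable fence-dev             # step 5
    pcs stonith enable fence-dev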

Actual results:
'stonith_admin -L' shows the fencing-resource only on the node where it is claimed to be started

Expected results:
'stonith_admin -L' should show the fencing-resource to be available on all nodes
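
One way to check this from a single host, assuming nodes named node1..node3 and ssh access between them:

    for n in node1 node2 node3; do
        echo "== $n =="
        ssh "$n" stonith_admin -L
    done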

Additional info:

Comment 2 Klaus Wenninger 2017-07-24 17:01:19 UTC
cib-diffs are not checked properly in stonithd when deciding whether a full re-parse of the fencing configuration should be triggered.

https://github.com/ClusterLabs/pacemaker/pull/1314/commits/5e3cd2614e1db60a14d5615c9c175575409b56d6

seems to solve the issue.

Comment 4 michal novacek 2017-08-04 09:56:14 UTC
qa-ack+: clear reproducer in initial comment

Comment 7 Patrik Hagara 2017-12-05 15:36:42 UTC
Reproduced as outlined in the first comment:
  * "pcs cluster setup --start --wait ..."
  * "pcs stonith create ..."
  * "pcs cluster stop --all"
  * "pcs cluster start --all --wait"
  * "crm_mon -X"; verify fence resource seen as started on 1 node from all nodes
  * "stonith_admin -L"; verify fence device registered/available on all nodes
  * "pcs stonith disable --wait ..."
  * "pcs stonith enable --wait ..."
  * "crm_mon -X"; verify fence resource seen as started on 1 node from all nodes
  * "stonith_admin -L"; verify fence device still registered/available on all nodes

Before the fix (1.1.16-12.el7), only the node on which the fence resource is in the "Started" role sees the corresponding fence device as registered/available; other nodes do not (as reported by "stonith_admin -L"). After the fix (1.1.18-1.el7), all nodes see the fence device as registered/available. Marking verified.

Comment 10 errata-xmlrpc 2018-04-10 15:30:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860