Bug 1394418

Summary: fence_compute: nova compute service is not unfenced
Product: Red Hat Enterprise Linux 7
Reporter: Marian Krcmarik <mkrcmari>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: Ofer Blaut <oblaut>
Severity: urgent
Docs Contact: Steven J. Levine <slevine>
Priority: urgent
Version: 7.3
CC: abeekhof, aherr, cfeist, cluster-maint, igkioka, kgaillot, mnovacek
Target Milestone: rc
Keywords: ZStream
Target Release: 7.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.18-1.el7
Doc Type: Release Note
Doc Text:
Pacemaker correctly implements fencing and unfencing for Pacemaker remote nodes

Previously, Pacemaker did not implement unfencing for Pacemaker remote nodes. As a consequence, Pacemaker remote nodes remained fenced even if a fence device required unfencing. With this update, Pacemaker correctly implements both fencing and unfencing for Pacemaker remote nodes, and the described problem no longer occurs.
Clones: 1491544 (view as bug list)
Last Closed: 2018-04-10 15:28:37 UTC
Type: Bug
Bug Blocks: 1491544    

Description Marian Krcmarik 2016-11-11 22:29:36 UTC
Description of problem:
Based on a discussion with Andrew, I am filing this bug about the fence_compute stonith device used within Instance HA for OpenStack: the nova-compute service is not unfenced and remains in forced-down status even though the compute node and the nova-compute service on it are up and running.

The definition of the stonith device:
 Resource: fence-nova (class=stonith type=fence_compute)
  Attributes: auth-url=http://10.0.0.103:5000/v2.0 login=admin passwd=qNJdaqZ7F4CHZVD37EGztCksd tenant-name=admin domain=localdomain record-only=1 no-shared-storage=False action=off
  Meta Attrs: provides=unfencing 
  Operations: monitor interval=60s (fence-nova-monitor-interval-60s)
 Node: compute-0
  Level 1 - my-stonith-xvm-compute-0,fence-nova
 Node: compute-1
  Level 1 - my-stonith-xvm-compute-1,fence-nova
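For reference, a rough pcs sketch of how a device and topology like this could be created (the commands are an assumption reconstructed from the listing above, not the reporter's actual history; the password is replaced with a placeholder):

  pcs stonith create fence-nova fence_compute \
    auth-url=http://10.0.0.103:5000/v2.0 login=admin passwd=<admin-password> \
    tenant-name=admin domain=localdomain record-only=1 no-shared-storage=False \
    action=off op monitor interval=60s meta provides=unfencing
  # Fencing topology: per-node power fencing first, then fence_compute
  pcs stonith level add 1 compute-0 my-stonith-xvm-compute-0,fence-nova
  pcs stonith level add 1 compute-1 my-stonith-xvm-compute-1,fence-nova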

Once a node is fenced, the nova-compute service is marked as down in the OpenStack service list, but it is not unmarked when the compute node is back online and operational. The compute node therefore cannot be used for scheduling instances until the service is manually unmarked as down.
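A minimal sketch of checking the state and of the manual workaround, assuming the nova CLI is available and using compute-0.localdomain as an example hostname:

  # The fenced compute shows up as down (forced down) here:
  nova service-list --binary nova-compute
  # Manual workaround until unfencing works: clear the forced-down flag
  nova service-force-down --unset compute-0.localdomain nova-compute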

Version-Release number of selected component (if applicable):
$ rpm -qa | grep pacemaker
pacemaker-1.1.15-11.el7_3.2.x86_64
pacemaker-remote-1.1.15-11.el7_3.2.x86_64
fence-agents-compute-4.0.11-47.el7_3.1.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure fence_compute stonith device on Openstack cluster
2. Reset the compute node (see the example after these steps)
3. Wait for the compute node to be online
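One way to perform step 2 (compute-0 is an assumed node name; any hard reset that triggers fencing should do):

  # On the compute node: immediate reboot, no clean shutdown
  echo b > /proc/sysrq-trigger
  # Or, from a cluster node, fence it explicitly:
  pcs stonith fence compute-0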

Actual results:
nova-compute service marked as down in "nova service-list"

Expected results:
nova-compute service marked as up in "nova service-list"

Additional info:

Comment 1 Andrew Beekhof 2016-11-14 03:41:29 UTC
Have you got some sosreports to go with this?
My installation is still misbehaving

Comment 3 Ken Gaillot 2017-01-16 23:46:50 UTC
This may already be fixed in the relevant agent, but leaving open until confirmed. Not needed for 7.4.

Comment 6 Ken Gaillot 2017-09-13 16:28:03 UTC
Unfencing of Pacemaker Remote nodes is fixed in the current upstream master branch.

Comment 8 Ken Gaillot 2017-10-10 17:06:48 UTC
QA: Test procedure:

1. Configure a cluster with at least one cluster node and one Pacemaker Remote node, using fence_scsi as the fencing device (see the sketch after these steps).

2. Cause the Pacemaker Remote node to be fenced.

3. Before the fix, the remote node will not be unfenced; after the fix, it will.
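For step 1, a minimal fence_scsi sketch with unfencing enabled (the node names cluster-1/remote-1 and the shared device /dev/sdb are assumptions):

  pcs stonith create scsi-fence fence_scsi pcmk_host_list="cluster-1 remote-1" \
    devices=/dev/sdb meta provides=unfencing
  # Step 2 can then be triggered by fencing the remote node manually:
  pcs stonith fence remote-1

After the fix, the remote node's SCSI registration should be restored (unfenced) once it rejoins the cluster.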

Comment 10 Steven J. Levine 2017-12-07 18:20:20 UTC
Adding a title to doc text description for Release Note format

Comment 13 errata-xmlrpc 2018-04-10 15:28:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860