Bug 1481141

Summary: disabling a fencing-device that has queued actions leads to stonithd receiving SIGABRT [rhel-7.4.z]
Product: Red Hat Enterprise Linux 7
Reporter: Oneata Mircea Teodor <toneata>
Component: pacemaker
Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 7.4
CC: abeekhof, aherr, cfeist, cluster-maint, jruemker, kgaillot, kwenning, mjuricek, mnovacek, nbarcet
Target Milestone: rc
Keywords: ZStream
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.16-12.el7_4.1
Doc Type: Bug Fix
Doc Text:
Previously, if a fencing device configured with the pcmk_delay_max setting was disabled while a fencing action was being delayed, Pacemaker's stonithd service attempted to free memory used for the action twice. As a consequence, Pacemaker terminated unexpectedly. With this update, stonithd has been fixed to free the memory only once, and as a result, the described problem no longer occurs.
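The Doc Text describes a double free: when the device was disabled, the delayed action was released by two code paths. The C sketch below models the single-ownership pattern that avoids this. It is an illustration under stated assumptions only: the structures and functions (`action_t`, `device_t`, `device_queue_action()`, `action_finish()`, `device_disable()`) are hypothetical and are not stonithd's actual code, and the real fix shipped in pacemaker-1.1.16-12.el7_4.1 may be structured differently.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a per-device queue of delayed fencing actions.
 * These are NOT stonithd's real structures or function names. */
typedef struct action {
    char target[64];           /* node to be fenced */
    struct action *next;
} action_t;

typedef struct device {
    const char *id;
    action_t *queue;           /* actions still waiting out their delay */
} device_t;

static int actions_freed = 0;  /* lets a caller check for exactly one free per action */
static int last_rc = 0;        /* last result "sent" to a requester */

/* Queue a new delayed action for `target`; returns NULL on allocation failure. */
action_t *device_queue_action(device_t *dev, const char *target)
{
    action_t *a = calloc(1, sizeof(*a));
    if (a == NULL) {
        return NULL;
    }
    snprintf(a->target, sizeof(a->target), "%s", target);
    a->next = dev->queue;
    dev->queue = a;
    return a;
}

/* Report the result and release the action. This function takes ownership:
 * callers must not touch or free the action after calling it. */
static void action_finish(action_t *a, int rc)
{
    last_rc = rc;              /* stand-in for replying to the requester */
    free(a);
    actions_freed++;
}

/* Disabling a device cancels every queued action. The buggy pattern would be
 *     action_finish(a, -ENODEV); free(a);
 * which releases the same block twice. Here each action is unlinked first
 * and then released exactly once, by action_finish(). */
int device_disable(device_t *dev)
{
    int cancelled = 0;

    while (dev->queue != NULL) {
        action_t *a = dev->queue;
        dev->queue = a->next;  /* unlink before releasing */
        /* A real event loop would also cancel the pending delay timer here,
         * so its callback can never run on the released action. */
        action_finish(a, -ENODEV);
        cancelled++;
    }
    return cancelled;
}
```

The design choice is that exactly one function releases an action, and every other path unlinks the action before handing it over, so no pointer to freed memory stays reachable from the device's queue.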
Clone Of: 1470262
Last Closed: 2017-09-05 11:31:54 UTC
Bug Depends On: 1470262    

Description Oneata Mircea Teodor 2017-08-14 08:26:06 UTC
This bug has been copied from bug #1470262 and has been proposed to be backported to 7.4 z-stream (EUS).

Comment 2 Ken Gaillot 2017-08-15 15:50:43 UTC
Testing procedure:

1. Configure a cluster of at least two nodes and a fencing device with a long pcmk_delay_max setting.

2. Initiate fencing of one node via pcs, then immediately disable the fence device.

Before the fix, stonithd will core dump, and pcs will time out. After the fix, stonithd will not core dump, and pcs will immediately return an error (because the device is no longer usable).

Because pcmk_delay_max uses a random delay, it may not be obvious whether the fence device was disabled before the delay expired. You can check the logs to see what order things happened in.
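The testing procedure can be scripted roughly as follows. This is a hedged C sketch using fork()/execvp(), not an official reproducer. It assumes that `pcs stonith fence <node>` starts the fencing and that `pcs resource disable <device>` disables the fence device on the installed pcs version; the function names (`spawn()`, `wait_rc()`, `reproduce()`), the node and device names, and the one-second pause before disabling are illustrative choices, not part of the original procedure. The fence device must already be configured with a pcmk_delay_max well above that pause.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start argv[0] with its arguments in a child process; returns the pid or -1. */
pid_t spawn(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);            /* exec failed */
    }
    return pid;
}

/* Wait for a child and return its exit status, or -1 if it did not exit normally. */
int wait_rc(pid_t pid)
{
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical reproduction helper: fence `node`, then disable `device`
 * while the random pcmk_delay_max delay is (probably) still running. */
int reproduce(char *node, char *device)
{
    char *fence[] = { "pcs", "stonith", "fence", node, NULL };
    /* Assumption: the fence device can be disabled as a resource. */
    char *disable[] = { "pcs", "resource", "disable", device, NULL };
    time_t start = time(NULL);

    pid_t fencer = spawn(fence);
    if (fencer < 0) {
        return -1;
    }
    sleep(1);                  /* let the fence request reach stonithd */
    int disable_rc = wait_rc(spawn(disable));
    int fence_rc = wait_rc(fencer);

    printf("disable rc=%d, fence rc=%d after %lds\n",
           disable_rc, fence_rc, (long)(time(NULL) - start));
    /* Fixed build: the fence command fails soon after the disable.
     * Unfixed build: the fence command times out and stonithd dumps core. */
    return fence_rc;
}
```

For example, calling `reproduce("node2", "fence-dev")` (hypothetical names) on a fixed build should return a non-zero status shortly after the disable. If the random delay happens to expire first, the node is really fenced, so the logs remain the way to confirm the order of events.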

Comment 6 errata-xmlrpc 2017-09-05 11:31:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2587