Bug 1470262
Summary: | disabling a fencing-device that has queued actions leads to stonithd receiving SIGABRT | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Klaus Wenninger <kwenning> | |
Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> | |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
Severity: | urgent |
Priority: | urgent | |||
Version: | 7.4 | CC: | abeekhof, aherr, cfeist, cluster-maint, jruemker, mnovacek, nbarcet, phagara | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | 7.5 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Fixed In Version: | pacemaker-1.1.18-1.el7 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, if a fencing device configured with the pcmk_delay_max setting was disabled while a fencing action was being delayed, Pacemaker's stonithd service attempted to free memory used for the action twice. As a consequence, Pacemaker terminated unexpectedly. With this update, stonithd has been fixed to free the memory only once, and as a result, the described problem no longer occurs.
|
Clones: | 1481141 |
Last Closed: | 2018-04-10 15:30:29 UTC | Type: | Bug | |
Bug Blocks: | 1481141 |
Description
Klaus Wenninger
2017-07-12 15:43:59 UTC
Tested as per comment 1:

* create a stonith resource with pcmk_delay_max set to 300 seconds
* start a fence operation ("pcs stonith fence ...")
* while the fence operation is being delayed, run "pcs stonith disable ..."

Before the fix (pacemaker-1.1.16-12.el7):

* the fence operation returns non-zero only after nearly twice pcmk_delay_max (e.g. 564 seconds), with the message:
  > Error: unable to fence 'virt-156'
  > Command failed: Timer expired
* the "pcs stonith disable ..." command hangs, apparently indefinitely
* stonithd receives SIGABRT on the node performing the fence operation
* no node gets fenced
* the node on which stonithd crashed transitions into the "UNCLEAN (Online)" cluster membership state
* the fence resource is marked as "FAILED (disabled)"

After the fix (pacemaker-1.1.18-1.el7):

* the fence operation returns non-zero immediately, with the message:
  > Error: unable to fence 'virt-164'
  > Command failed: No route to host
* the "pcs stonith disable ..." command completes successfully
* stonithd does not crash, and abrt detects no other crashes
* no node gets fenced
* all nodes are in a clean online cluster membership state
* the fence resource is marked as "Stopped (disabled)"

Marking verified.

To clarify, the first point under "After the fix" should read:

* the fence operation returns non-zero immediately __after "pcs stonith disable ..." completes__, with the message:

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860