Bug 745526
Summary: | pacemaker+cman fencing is unreliable | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Kortus <jkortus> |
Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 6.2 | CC: | cluster-maint |
Target Milestone: | rc | Keywords: | TechPreview |
Target Release: | 6.2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | pacemaker-1.1.6-3.el6 | Doc Type: | Technology Preview |
Doc Text: |
Prior to this update, an error in the interaction between Pacemaker and CMAN's fencing subsystem prevented reliable fencing operation. This update applies a patch that corrects this error so that such fencing operations are now reliable.
|
Last Closed: | 2011-12-06 16:50:49 UTC | Type: | --- |
Bug Blocks: | 748554 |
Description
Jaroslav Kortus
2011-10-12 15:25:48 UTC
Would you be able to run crm_report for the time period covered by the test? (Remember to quote the date/time string.)

I did check to see that fence_pcmk was running synchronously, apparently not hard enough. My apologies. A related patch has been committed upstream: https://github.com/ClusterLabs/pacemaker/commit/2d8fad5

In fact it is worse: the additional logging I added after testing actually prevents the agent from passing the request on to pacemaker. I have since tested the above patch and had it reviewed by Lon.

Without the patch, running:

    /usr/sbin/fence_pcmk -n east-01 < /dev/null

results in no additional logs from stonith-ng in /var/log/messages (because stonith_admin is not being invoked).

With the patch, at the very minimum, there should be a log similar to:

    Oct 18 20:00:48 east-03 stonith-ng: [18764]: info: initiate_remote_stonith_op: Initiating remote operation off for east-01: c5111dd8-8a1c-4b6a-aaf0-5a793dc2ed79

Additionally, when trying to fence an unknown node, the command can now be seen to (correctly) wait and receive the error:

    [root@east-03 ~]# /usr/sbin/fence_pcmk -n unknown-node < /dev/null
    Command failed: Operation timed out
    failed: unknown-node 248

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, an error in the interaction between Pacemaker and CMAN's fencing subsystem prevented reliable fencing operation. This update applies a patch that corrects this error so that such fencing operations are now reliable.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1669.html
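
The manual check described in the comments above (run fence_pcmk, then look in /var/log/messages for a stonith-ng entry showing that the request was passed on) could be scripted roughly as follows. This is only a sketch: the helper name `stonith_initiated_for` and its arguments are hypothetical and not part of pacemaker, and the match pattern is taken solely from the single sample log line quoted in this report, so other versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pacemaker): succeeds if LOGFILE
# contains a stonith-ng "Initiating remote operation" entry for NODE.
# The pattern is an assumption based on the one log line quoted in this bug.
stonith_initiated_for() {
    node="$1"
    logfile="${2:-/var/log/messages}"   # default log location from the report
    grep -F "stonith-ng" "$logfile" \
        | grep -F "initiate_remote_stonith_op: Initiating remote operation" \
        | grep -qF " for ${node}:"
}

# Intended use on a cluster node (left commented out, since it fences east-01):
#   /usr/sbin/fence_pcmk -n east-01 < /dev/null
#   stonith_initiated_for east-01 && echo "request reached stonith-ng"
```

A log that already holds older fencing entries for the same node would also match, so in practice the check is only meaningful against entries written after the fence_pcmk call.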