Bug 745526

Summary: pacemaker+cman fencing is unreliable
Product: Red Hat Enterprise Linux 6 Reporter: Jaroslav Kortus <jkortus>
Component: pacemaker Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.2 CC: cluster-maint
Target Milestone: rc Keywords: TechPreview
Target Release: 6.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.6-3.el6 Doc Type: Technology Preview
Doc Text:
Prior to this update, an error in the interaction between Pacemaker and CMAN's fencing subsystem prevented reliable fencing operation. This update applies a patch that corrects this error so that such fencing operations are now reliable.
Last Closed: 2011-12-06 16:50:49 UTC Type: ---
Bug Blocks: 748554    

Description Jaroslav Kortus 2011-10-12 15:25:48 UTC
Description of problem:
When pacemaker is configured together with cman, fencing does not work reliably.

The manual says that pacemaker should take over the fencing responsibility, and that the fence_pcmk fence device is provided for this purpose as a replacement that redirects fencing requests to pacemaker.

The problem is that fence_pcmk also fakes the response too early in the process: the fencing is acknowledged even before any attempt is made (!).

To illustrate the problem, create a pacemaker+cman cluster and mount a gfs2 filesystem. Then run pkill -9 corosync on one of the nodes and watch the recovery.

Relevant snips:
Oct 12 10:07:15 marathon-01 corosync[18840]:   [TOTEM ] A processor failed, forming new configuration.
Oct 12 10:07:27 marathon-01 corosync[18840]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 12 10:07:27 marathon-01 fenced[19023]: fencing node marathon-05
Oct 12 10:07:27 marathon-01 fence_pcmk: Requesting Pacemaker fence marathon-05 (reset)
Oct 12 10:07:27 marathon-01 fenced[19023]: fence marathon-05 success
Oct 12 10:07:28 marathon-01 stonith-ng: [19198]: info: make_args: reboot-ing node 'marathon-05' as 'port=5'
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Looking at journal...
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Acquiring the transaction lock...
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Replaying journal...
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Replayed 0 of 0 blocks
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Found 1 revoke tags
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Journal replayed in 1s
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Done
Oct 12 10:07:32 marathon-01 stonith-ng: [19198]: info: log_operation: Operation 'reboot' [19798] (call 0 from (null)) for host 'marathon-05' with device 'apc-fencing' returned: 0
Oct 12 10:07:32 marathon-01 stonith-ng: [19198]: info: log_operation: apc-fencing: Parse error: Ignoring unknown option 'nodename=marathon-05'
Oct 12 10:07:32 marathon-01 stonith-ng: [19198]: info: log_operation: apc-fencing: Success: Rebooted

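The log shows that recovery took place well before the surviving node could confirm that the failed node can no longer touch the device: fenced logs success at 10:07:27 and the GFS2 journal replay finishes at 10:07:28, but stonith-ng does not report the reboot as done until 10:07:32.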
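The ordering problem can be checked mechanically from the timestamps. The following is a minimal sketch, not part of the original report: it copies three of the log lines quoted above into a string and compares when fenced acknowledged the fence, when the journal replay finished, and when stonith-ng reported the reboot. The helper name first_time is hypothetical, and the year 2011 is supplied only because syslog lines carry no year.

```python
from datetime import datetime

# Three lines copied from the log excerpt in this report
LOG = """\
Oct 12 10:07:27 marathon-01 fenced[19023]: fence marathon-05 success
Oct 12 10:07:28 marathon-01 kernel: GFS2: fsid=marathon:vedder0.0: jid=4: Done
Oct 12 10:07:32 marathon-01 stonith-ng: [19198]: info: log_operation: apc-fencing: Success: Rebooted
"""


def first_time(log, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in log.splitlines():
        if needle in line:
            # The first 15 characters are the syslog timestamp; the year is
            # an assumption added here (month names assume an English locale).
            return datetime.strptime("2011 " + line[:15], "%Y %b %d %H:%M:%S")
    return None


acked = first_time(LOG, "fence marathon-05 success")
replayed = first_time(LOG, "jid=4: Done")
rebooted = first_time(LOG, "Success: Rebooted")

print("journal replayed before reboot completed:", replayed < rebooted)
print("seconds between ack and reboot:", (rebooted - acked).total_seconds())
```

In a correct sequence the acknowledgement and the journal replay would both come after the reboot is reported, not before it.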

Version-Release number of selected component (if applicable):
cman-3.0.12.1-23.el6.x86_64
pacemaker-1.1.6-2.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. set up cman+pacemaker
2. create and mount a gfs2 FS
3. pkill -9 corosync on one of the nodes and watch the recovery
  
Actual results:
- fencing is faked as successful via fence_pcmk
- recovery happens before the node is fenced
- fencing may fail, while the recovery would still be performed (journals replayed). This is VERY dangerous and should not happen.

Expected results:
one of:
- fencing is not faked and the reply is sent only after pacemaker finishes the fencing event
- fencing is disabled in pacemaker and cman handles it as it used to do (+doc fix to reflect this)
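
The first expected behaviour amounts to a redirect agent that blocks until the real fencing operation has finished and then passes its exit status back unchanged. The following is a rough sketch of that idea only, not the actual fence_pcmk code: the function name fence_and_wait and the timeout value are hypothetical, and the default command assumes that stonith_admin accepts --reboot followed by a node name.

```python
import subprocess
import sys


def fence_and_wait(node, command=("stonith_admin", "--reboot"), timeout=120):
    """Run the fencing command for node and block until it exits.

    The command's exit status is returned unchanged, so the caller only
    sees 0 once the fencing operation itself has reported success.
    """
    try:
        result = subprocess.run(list(command) + [node], timeout=timeout)
    except subprocess.TimeoutExpired:
        # A timeout is reported as failure, never as success
        print("fencing of %s timed out" % node, file=sys.stderr)
        return 1
    except OSError as err:
        # The command is missing or could not be executed
        print("cannot run fencing command: %s" % err, file=sys.stderr)
        return 1
    return result.returncode
```

The point of the design is that there is no code path that returns 0 without the underlying command having returned 0 first.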

Additional info:
cluster.conf:
<?xml version="1.0"?>
<cluster name="marathon" config_version="1">
  <cman>


  </cman>
  <fence_daemon post_join_delay="20" clean_start="0"/>
  <clusternodes>
    <clusternode name="marathon-01" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="marathon-02" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="marathon-03" votes="1" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-03"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="marathon-04" votes="1" nodeid="4">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-04"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="marathon-05" votes="1" nodeid="5">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-05"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>
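
A configuration like the one above can be sanity-checked with a short script. This is an illustrative sketch under stated assumptions rather than a supported tool: the function name check_redirect is hypothetical, and the embedded sample is a two-node excerpt of the cluster.conf from this report. It lists every clusternode that lacks a fence device backed by fence_pcmk whose port matches the node name.

```python
import xml.etree.ElementTree as ET

# Two-node excerpt of the cluster.conf quoted in this report
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="marathon" config_version="1">
  <clusternodes>
    <clusternode name="marathon-01" votes="1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="marathon-02" votes="1" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="marathon-02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>
"""


def check_redirect(conf_text):
    """Return the names of nodes without a matching fence_pcmk redirect."""
    root = ET.fromstring(conf_text)
    # Names of all fence devices that use the fence_pcmk agent
    pcmk_devices = {d.get("name") for d in root.iter("fencedevice")
                    if d.get("agent") == "fence_pcmk"}
    problems = []
    for node in root.iter("clusternode"):
        name = node.get("name")
        devices = node.findall("./fence/method/device")
        # The redirect must name a fence_pcmk device and pass the node's own name
        if not any(d.get("name") in pcmk_devices and d.get("port") == name
                   for d in devices):
            problems.append(name)
    return problems


print(check_redirect(CLUSTER_CONF))
```

An empty list means every node is redirected to pacemaker. This only validates the configuration and says nothing about whether the agent waits for the fencing result.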

Comment 2 Andrew Beekhof 2011-10-17 21:54:58 UTC
Would you be able to run crm_report for the time period covered by the test?  (Remember to quote the date/time string).

I did check to see that fence_pcmk was running synchronously, but apparently not hard enough. My apologies.

Comment 3 Andrew Beekhof 2011-10-19 00:24:03 UTC
A related patch has been committed upstream: https://github.com/ClusterLabs/pacemaker/commit/2d8fad5

Comment 4 Andrew Beekhof 2011-10-19 00:40:12 UTC
In fact it is worse: the additional logging I added after testing actually prevents the agent from passing the request on to pacemaker.

I have since tested the above patch and had it reviewed by Lon.

Without the patch, running:
  /usr/sbin/fence_pcmk -n east-01 < /dev/null
results in no additional logs from stonith-ng in /var/log/messages (because stonith_admin is not being invoked)

With the patch, at the very minimum, there should be a log similar to:

Oct 18 20:00:48 east-03 stonith-ng: [18764]: info: initiate_remote_stonith_op: Initiating remote operation off for east-01: c5111dd8-8a1c-4b6a-aaf0-5a793dc2ed79

Additionally, when trying to fence an unknown node the command can now be seen to (correctly) wait and receive the error:

[root@east-03 ~]# /usr/sbin/fence_pcmk -n unknown-node < /dev/null
Command failed: Operation timed out
failed: unknown-node 248
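
When verifying the fix, the presence of the stonith-ng log entry can be checked with a small script instead of by eye. This is a minimal sketch that assumes the message format matches the sample line quoted above. The function name remote_ops is hypothetical, and in practice the text would be read from /var/log/messages rather than from an embedded string.

```python
import re

# Sample line copied from the comment above
SAMPLE = ("Oct 18 20:00:48 east-03 stonith-ng: [18764]: info: "
          "initiate_remote_stonith_op: Initiating remote operation off "
          "for east-01: c5111dd8-8a1c-4b6a-aaf0-5a793dc2ed79")

# Captures the action, the target node, and the operation id
PATTERN = re.compile(
    r"initiate_remote_stonith_op: Initiating remote operation (\S+) for (\S+): (\S+)")


def remote_ops(log_text, node):
    """Return (action, operation id) pairs logged by stonith-ng for node."""
    return [(m.group(1), m.group(3))
            for m in PATTERN.finditer(log_text)
            if m.group(2) == node]


print(remote_ops(SAMPLE, "east-01"))
```

An empty result after running fence_pcmk against a node would suggest that the request never reached stonith-ng, which is the unpatched behaviour described above.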

Comment 10 Jaromir Hradilek 2011-10-26 09:32:19 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, an error in the interaction between Pacemaker and CMAN's fencing subsystem prevented reliable fencing operation. This update applies a patch that corrects this error so that such fencing operations are now reliable.

Comment 13 errata-xmlrpc 2011-12-06 16:50:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1669.html