Bug 846983
| Summary: | Incorrect cancelling of monitoring operation on M/S resource when record-pending="true" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Ales Nosek <ales.nosek> |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | cluster-maint, jkortus |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | http://bugs.clusterlabs.org/show_bug.cgi?id=5109 | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-1.1.8-4.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: "Cancel" operations were incorrectly stored in the cluster status. Consequence: The cluster detected the stale operations and tried to clean them up, causing additional logging and confusing GUIs. Fix: Exclude cancel operations from being stored in the cluster status. Result: There are no actions matching *_cancel_* stored in the cluster status. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-21 09:51:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
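A quick way to confirm the documented result on a running cluster is to check the installed package and grep the live CIB for stored cancel operations; a minimal sketch using the same tools referenced later in this report:

```
# Fix shipped in pacemaker-1.1.8-4.el6 or later.
rpm -q pacemaker

# With the fix applied, no *_cancel_* operations should be stored
# in the cluster status section of the CIB.
cibadmin -Q | grep _cancel_ || echo "no cancel operations recorded"
```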
We'd need a crm_report for the time period covered by the testcase in order to be able to comment.

Resolved upstream with patch: https://github.com/beekhof/pacemaker/commit/197275f

Will be included in 6.4.

I have not observed the described behaviour with pacemaker-1.1.8-7.el6.x86_64. `cibadmin -Q | grep cancel` shows no matches.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0375.html
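For reference, the crm_report requested above can be generated for the reproduction window roughly as follows (a sketch; the exact option set may vary between crm_report versions, and the time range should be adjusted to the actual test run shown below):

```
# Collect cluster logs, CIB history and policy engine inputs
# covering the reproduction window.
crm_report -f "2012-08-09 11:00:00" -t "2012-08-09 12:00:00" bz846983-report
```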
Version-Release number of selected component (if applicable): Pacemaker 1.1.7-6 from the RHEL 6.3 High-Availability Add-on

Steps to Reproduce:
1. Configure an M/S resource and set record-pending="true"
2. Put the node where the Master is running into standby mode
3. The cluster will move the Master to the second node
4. You'll see errors in the log file as the cluster tries to cancel the monitoring operation on the Slave resource

```
$ crm configure show
node dem_vx01_cas27a \
        attributes standby="off"
node dem_vx01_cas27b \
        attributes standby="on"
primitive disk0 ocf:linbit:drbd \
        params drbd_resource="disk0" \
        operations $id="disk0-operations" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" role="Master" timeout="20" \
        op monitor interval="20" role="Slave" timeout="20"
primitive stonith_dem_vx01_cas27a stonith:fence_ipmilan \
        operations $id="stonith_dem_vx01_cas27a-operations" \
        op monitor interval="15" timeout="15" start-delay="15" \
        op monitor interval="0" timeout="15" \
        params ipaddr="dem_vx01_cas27a_rsc" login="stonith" passwd="x" lanplus="true" pcmk_host_list="dem_vx01_cas27a"
primitive stonith_dem_vx01_cas27b stonith:fence_ipmilan \
        operations $id="stonith_dem_vx01_cas27b-operations" \
        op monitor interval="15" timeout="15" start-delay="15" \
        op monitor interval="0" timeout="15" \
        params ipaddr="dem_vx01_cas27b_rsc" login="stonith" passwd="x" lanplus="true" pcmk_host_list="dem_vx01_cas27b" \
        meta target-role="Started"
ms ms_disk0 disk0 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location stonith_dem_vx01_cas27a_loc stonith_dem_vx01_cas27a -inf: dem_vx01_cas27a
location stonith_dem_vx01_cas27b_loc stonith_dem_vx01_cas27b -inf: dem_vx01_cas27b
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        cluster-infrastructure="openais" \
        last-lrm-refresh="1344503667" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2" \
        startup-fencing="false"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="100" \
        failure-timeout="60"
op_defaults $id="op_defaults-options" \
        record-pending="true"
```
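With record-pending enabled as in the op_defaults section above, in-flight operations are written into the CIB status section with op-status="-1" (see the lrm_rsc_op snippet further down). A quick sketch, not part of the original report, for spotting such entries while the test runs:

```
# List operations currently recorded as pending in the live CIB
# (pending entries carry op-status="-1").
cibadmin -Q | grep 'op-status="-1"'
```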
```
$ crm_mon -rf1
============
Last updated: Thu Aug 9 11:16:07 2012
Last change: Thu Aug 9 11:14:27 2012
Stack: openais
Current DC: dem_vx01_cas27a - partition with quorum
Version: 1.1.7-6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ dem_vx01_cas27a dem_vx01_cas27b ]

Full list of resources:

 Master/Slave Set: ms_disk0 [disk0]
     Masters: [ dem_vx01_cas27b ]
     Slaves: [ dem_vx01_cas27a ]
 stonith_dem_vx01_cas27a (stonith:fence_ipmilan): Started dem_vx01_cas27b
 stonith_dem_vx01_cas27b (stonith:fence_ipmilan): Started dem_vx01_cas27a

Migration summary:
* Node dem_vx01_cas27b:
* Node dem_vx01_cas27a:
```

```
$ crm node standby dem_vx01_cas27b

$ tail -f /var/log/messages
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: te_rsc_command: Initiating action 12: monitor disk0:0_monitor_10000 on dem_vx01_cas27a (local)
Aug 9 11:17:56 dem_vx01_cas27a lrmd: [17520]: info: rsc:disk0:0:74: monitor
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: te_rsc_command: Recording pending op disk0:0_monitor_10000 in the CIB
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: create_operation_update: cib_action_update: Updating resouce disk0:0 after pending monitor op (interval=10000)
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: te_rsc_command: Initiating action 4: cancel disk0:0_cancel_20000 on dem_vx01_cas27a (local)
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: cancel_op: No pending op found for disk0:0:-1
Aug 9 11:17:56 dem_vx01_cas27a lrmd: [17520]: info: on_msg_cancel_op: no operation with id -1
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: te_rsc_command: Recording pending op disk0:0_cancel_20000 in the CIB
Aug 9 11:17:56 dem_vx01_cas27a cib[17518]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='dem_vx01_cas27a']//lrm_resource[@id='disk0:0']/lrm_rsc_op[@id='disk0:0_cancel_20000'] (origin=local/crmd/217, version=2.1428.6): ok (rc=0)
Aug 9 11:17:56 dem_vx01_cas27a crmd[17523]: info: create_operation_update: cib_action_update: Updating resouce disk0:0 after pending cancel op (interval=20000)
```

Actual results:
The following snippet remains in the cib.xml:

```
$ cibadmin -Q
<lrm_rsc_op id="disk0:0_cancel_20000" operation_key="disk0:0_cancel_20000" operation="cancel" crm-debug-origin="cib_action_update" crm_feature_set="3.0.6" transition-key="4:28:0:3c9a3839-a2cc-4a52-95c9-07da2f02f1e5" transition-magic="-1:14;4:28:0:3c9a3839-a2cc-4a52-95c9-07da2f02f1e5" call-id="-1" rc-code="14" op-status="-1" interval="20000" op-digest="10c845ea895312d505c0de998afa00f6"/>
</lrm_resource>
```

Additional info:
When setting record-pending=false, everything works perfectly and there is no disk0:0_cancel_20000 operation in the transition graph at all.
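As the report notes, disabling pending-operation recording avoids the problem. On a configuration like the one above, that means changing the existing op_defaults entry, roughly as follows (a sketch in crm shell syntax as used elsewhere in this report; depending on the crmsh version, editing the op_defaults-options section via `crm configure edit` may be needed instead):

```
# Workaround for affected versions: stop recording pending operations
# cluster-wide (reverses the op_defaults setting shown in the configuration).
crm configure op_defaults record-pending=false
```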