| Summary: | crm_resource reports incorrect data about resource location | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Kortus <jkortus> |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Priority: | urgent |
| Version: | 6.3 | CC: | cluster-maint, dvossel |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | pacemaker-1.1.7-3.el6 | Doc Type: | Bug Fix |
| Doc Text: | Cause: The logic for determining whether a resource was active was faulty. Consequence: Resources that were active but on an unclean node were ignored by tools that relied on this logic. Fix: Fix the check so that all tools agree. | | |
| Clone Of: | | Cloned As: | 816881 (view as bug list) |
| Bug Blocks: | 816881 | Last Closed: | 2012-06-20 13:48:50 UTC |
| Attachments: | cibadmin -Q output (attachment 567063) | | |
Can you include the output of `cibadmin -Q` when the cluster is in this state?

Created attachment 567063 [details]: cibadmin -Q output

`cibadmin -Q` output attached.
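Aside: a snapshot like this can be replayed against the command-line tools offline via the `CIB_file` environment variable, as done in the next comment. A minimal sketch, with an illustrative path:

```
# Capture the live CIB on any cluster node
cibadmin -Q > /tmp/cibadmin.xml

# Point a tool at the snapshot instead of the live cluster
CIB_file=/tmp/cibadmin.xml crm_resource -W -r webserver
```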
Strangely, crm_simulate shows the same as crm_resource:

```
# CIB_file=~/Downloads/cibadmin.xml tools/crm_simulate -L

Current cluster status:

Node node03: UNCLEAN (offline)
Online: [ node01 ]
OFFLINE: [ node02 ]

 virt-fencing   (stonith:fence_xvm):     Started node01
 Resource Group: group01
     webserver  (ocf::heartbeat:apache):  Started node03
     ClusterIP  (ocf::heartbeat:IPaddr2): Started node03
```
Hmm, not so strange: it appears to be telling the truth, and it is crm_mon that lies.

The last op for webserver on node03 was a successful start action:

```xml
<lrm_rsc_op id="webserver_last_0" operation_key="webserver_start_0" operation="start"
    crm-debug-origin="do_update_resource" crm_feature_set="3.0.6"
    transition-key="13:441:0:22406839-5bf5-4990-9755-00c39457b51e"
    transition-magic="0:0;13:441:0:22406839-5bf5-4990-9755-00c39457b51e"
    call-id="5" rc-code="0" op-status="0" interval="0"
    last-run="1330694649" last-rc-change="1330694649"
    exec-time="90" queue-time="0" op-digest="88eb8382443cc988d0e6ddee48ebac1a"/>
```
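In other words: because node03 is unclean rather than confirmed down, this successful start (rc-code="0") remains the last recorded state, so the resource's last known location is still node03. One way to pull such records out of the saved snapshot, assuming this pacemaker build's cibadmin supports `--xpath` and honors `CIB_file` (path illustrative):

```
# Show the recorded operation history for the webserver resource
CIB_file=/tmp/cibadmin.xml cibadmin -Q --xpath "//lrm_resource[@id='webserver']"
```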
A related patch has been committed upstream: https://github.com/beekhof/pacemaker/commit/31f6ca3 with subject:

Medium: PE: Bug rhbz#799070 - Report resources as active in crm_mon if they are located on an unclean node

Further details (if any):

Now it behaves consistently:

```
]$ crm_mon -1
============
Last updated: Thu Apr 5 10:22:13 2012
Last change: Thu Apr 5 10:06:46 2012 via crmd on m3c1-node01
Stack: cman
Current DC: m3c1-node01 - partition WITHOUT quorum
Version: 1.1.7-5.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, unknown expected votes
2 Resources configured.
============

Node m3c1-node03: UNCLEAN (offline)
Node m3c1-node02: UNCLEAN (offline)
Online: [ m3c1-node01 ]

 virt-fencing  (stonith:fence_xvm):     Started m3c1-node01
 webserver     (ocf::heartbeat:apache): Started m3c1-node02
```

However, there is still a difference compared to cman: when rgmanager loses quorum, no services are reported as running (they are not reported at all). I would like to see something similar here as well. Pacemaker should report the service as running only when it has quorum and the service is working properly; in all other states it should not say Started but some different state (failed, pending, unclean...). Would this be possible?

pacemaker-1.1.7-5.el6.x86_64

(In reply to comment #11)
> However, there is still a difference compared to cman: when rgmanager loses
> quorum, no services are reported as running (they are not reported at all).

Pacemaker is not rgmanager. Pacemaker reports resources as running if they are running, not based on whether quorum is true or false. Resources may or may not be allowed to run when quorum is lost; that's up to the admin.
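For reference, the admin-controlled behaviour mentioned here is the cluster-wide no-quorum-policy property; it governs what resources are allowed to do when quorum is lost, not what the tools report. Using the crm shell already shown in this report:

```
# Stop all resources in a partition that loses quorum (the default);
# "freeze" keeps them running but forbids new actions, "ignore" carries on
crm configure property no-quorum-policy=stop
```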
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: The logic for determining whether a resource was active was faulty.
Consequence: Resources that were active but on an unclean node were ignored by tools that relied on this logic.
Fix: Fix the check so that all tools agree.
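A quick way to verify that the tools agree after the fix is to point all three at the same saved CIB, reusing the CIB_file approach from earlier in this report (path illustrative):

```
# After the fix, all three tools should report webserver on the same node
CIB_file=/tmp/cibadmin.xml crm_mon -1
CIB_file=/tmp/cibadmin.xml crm_resource -W -r webserver
CIB_file=/tmp/cibadmin.xml crm_simulate -L
```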
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0846.html
Description of problem:

crm_resource -W -r <resource> reports the last known node even if the partition is without quorum. The other tools report it differently.

Version-Release number of selected component (if applicable):

pacemaker-1.1.7-1.el6.x86_64

How reproducible:

always

Steps to Reproduce:
1. Set up a resource in pacemaker.
2. Fail enough nodes to lose quorum.
3. Run crm_resource -W -r <resource>.

Actual results:

The last node on which the resource was running is reported.

Expected results:

The service is reported as failed, stopped, or not running anywhere.

Additional info:

```
[root@node01:/]$ crm_mon -1
============
Last updated: Thu Mar 1 11:25:38 2012
Last change: Thu Mar 1 11:15:32 2012 via crm_shadow on node01
Stack: cman
Current DC: node01 - partition WITHOUT quorum
Version: 1.1.7-1.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, unknown expected votes
3 Resources configured.
============

Node node03: UNCLEAN (offline)
Online: [ node01 ]
OFFLINE: [ node02 ]

 virt-fencing  (stonith:fence_xvm): Started node01

[root@node01:/]$ crm_resource -W -r webserver
resource webserver is running on: node03

[root@node01:/]$ crm configure show
node node01
node node02
node node03
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.100.11" cidr_netmask="32" \
    op monitor interval="30s"
primitive virt-fencing stonith:fence_xvm \
    params pcmk_host_check="static-list" pcmk_host_list="node01,node02,node03" action="reboot" debug="1"
primitive webserver ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" \
    op monitor interval="30s"
group group01 webserver ClusterIP
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-1.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
    cluster-infrastructure="cman"
```
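Note on step 2: the quorum loss has to come from unclean failures, since a clean shutdown records a stop for the resource and clears the stale location. A sketch of one way to drive this on the three-node cluster above, assuming console access to the nodes:

```
# On node02 and node03: hard-reset so pacemaker never records a clean stop
echo b > /proc/sysrq-trigger

# On node01, now alone in a partition WITHOUT quorum:
crm_resource -W -r webserver    # still prints the last known node
```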