Bug 799070 - crm_resource reports incorrect data about resource location
Summary: crm_resource reports incorrect data about resource location
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pacemaker
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 816881
 
Reported: 2012-03-01 17:28 UTC by Jaroslav Kortus
Modified: 2016-04-26 14:11 UTC (History)

Fixed In Version: pacemaker-1.1.7-3.el6
Doc Type: Bug Fix
Doc Text:
Cause: The logic for determining whether a resource was active was faulty.
Consequence: Resources that were active but on an unclean node were ignored by tools that relied on this logic.
Fix: Fix the check so that all tools agree.
Clone Of:
: 816881 (view as bug list)
Environment:
Last Closed: 2012-06-20 13:48:50 UTC
Target Upstream Version:


Attachments
cibadmin -Q output (8.66 KB, text/xml)
2012-03-02 13:28 UTC, Jaroslav Kortus


Links
System: Red Hat Product Errata
ID: RHBA-2012:0846
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: pacemaker bug fix and enhancement update
Last Updated: 2013-06-06 14:38:17 UTC

Description Jaroslav Kortus 2012-03-01 17:28:45 UTC
Description of problem:
crm_resource -W -r <resource> reports the last known node for the resource even when the partition has no quorum. Other tools, such as crm_mon, do not list the resource at all.


Version-Release number of selected component (if applicable):
pacemaker-1.1.7-1.el6.x86_64


How reproducible:
always

Steps to Reproduce:
1. Set up a resource in Pacemaker.
2. Fail enough nodes to lose quorum.
3. Run crm_resource -W -r <resource>.
  
Actual results:
The last node on which the resource was running is reported.

Expected results:
The service is reported as failed, stopped, or not running anywhere.

Additional info:
[root@node01:/]$ crm_mon -1
============
Last updated: Thu Mar  1 11:25:38 2012
Last change: Thu Mar  1 11:15:32 2012 via crm_shadow on node01
Stack: cman
Current DC: node01 - partition WITHOUT quorum
Version: 1.1.7-1.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, unknown expected votes
3 Resources configured.
============

Node node03: UNCLEAN (offline)
Online: [ node01 ]
OFFLINE: [ node02 ]

 virt-fencing   (stonith:fence_xvm):    Started node01

[root@node01:/]$ crm_resource -W -r webserver
resource webserver is running on: node03 

[root@node01:/]$ crm configure show
node node01
node node02
node node03
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.100.11" cidr_netmask="32" \
        op monitor interval="30s"
primitive virt-fencing stonith:fence_xvm \
        params pcmk_host_check="static-list" pcmk_host_list="node01,node02,node03" action="reboot" debug="1"
primitive webserver ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="30s"
group group01 webserver ClusterIP
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-1.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        cluster-infrastructure="cman"
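
The mismatch between the two tools can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the crm_mon and crm_resource output pasted above (copied in as sample strings) and compares the reported location of webserver. The helper names are hypothetical, and the regular expressions assume the output format of this Pacemaker 1.1.7 build.

```python
import re

# Sample text copied from the report above. A real check would capture the
# output of `crm_mon -1` and `crm_resource -W -r webserver` instead.
CRM_MON_OUTPUT = """\
Node node03: UNCLEAN (offline)
Online: [ node01 ]
OFFLINE: [ node02 ]

 virt-fencing   (stonith:fence_xvm):    Started node01
"""
CRM_RESOURCE_OUTPUT = "resource webserver is running on: node03 \n"


def resources_in_crm_mon(text):
    """Map resource name -> node for every 'Started' line in crm_mon output."""
    pattern = re.compile(
        r"^[ \t]+(\S+)[ \t]+\(\S+\):[ \t]+Started[ \t]+(\S+)[ \t]*$", re.M
    )
    return {name: node for name, node in pattern.findall(text)}


def node_from_crm_resource(text):
    """Return the node named in 'resource X is running on: NODE', or None."""
    match = re.search(r"is running on:\s*(\S+)", text)
    return match.group(1) if match else None


def tools_agree(resource, mon_text, resource_text):
    """True when both tools report the same location (or both report none)."""
    mon_node = resources_in_crm_mon(mon_text).get(resource)
    return mon_node == node_from_crm_resource(resource_text)


if __name__ == "__main__":
    print("crm_mon:     ", resources_in_crm_mon(CRM_MON_OUTPUT).get("webserver"))
    print("crm_resource:", node_from_crm_resource(CRM_RESOURCE_OUTPUT))
    print("agree:       ", tools_agree("webserver", CRM_MON_OUTPUT, CRM_RESOURCE_OUTPUT))
```

With the output from this report, crm_mon yields no location for webserver while crm_resource names node03, so the comparison fails.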

Comment 2 Andrew Beekhof 2012-03-02 08:46:08 UTC
Can you include the output of cibadmin -Q when the cluster is in this state?

Comment 3 Jaroslav Kortus 2012-03-02 13:28:35 UTC
Created attachment 567063
cibadmin -Q output

cibadmin -Q attached.

Comment 4 Andrew Beekhof 2012-03-05 03:30:37 UTC
Strangely crm_simulate shows the same as crm_resource:

# CIB_file=~/Downloads/cibadmin.xml tools/crm_simulate -L

Current cluster status:
Node node03: UNCLEAN (offline)
Online: [ node01 ]
OFFLINE: [ node02 ]

 virt-fencing	(stonith:fence_xvm):	Started node01
 Resource Group: group01
     webserver	(ocf::heartbeat:apache):	Started node03
     ClusterIP	(ocf::heartbeat:IPaddr2):	Started node03

Comment 5 Andrew Beekhof 2012-03-05 03:33:42 UTC
Hmm, not so strange, it appears to be telling the truth and it is crm_mon that lies:

The last op for webserver on node03 was a successful start action:

            <lrm_rsc_op id="webserver_last_0" operation_key="webserver_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.6" transition-key="13:441:0:22406839-5bf5-4990-9755-00c39457b51e" transition-magic="0:0;13:441:0:22406839-5bf5-4990-9755-00c39457b51e" call-id="5" rc-code="0" op-status="0" interval="0" last-run="1330694649" last-rc-change="1330694649" exec-time="90" queue-time="0" op-digest="88eb8382443cc988d0e6ddee48ebac1a"/>
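
As a rough illustration of how that entry can be read, the sketch below parses a trimmed copy of the lrm_rsc_op element and checks the operation name together with rc-code and op-status. It assumes that rc-code="0" and op-status="0" mean the operation completed successfully, which matches the reading given in this comment. The function name is hypothetical, and this is not the code Pacemaker itself uses.

```python
import xml.etree.ElementTree as ET

# The <lrm_rsc_op> element quoted above, trimmed to the attributes used here.
LAST_OP = (
    '<lrm_rsc_op id="webserver_last_0" operation_key="webserver_start_0" '
    'operation="start" call-id="5" rc-code="0" op-status="0" interval="0"/>'
)


def describe_last_op(xml_text):
    """Return (operation, succeeded) for a single lrm_rsc_op element.

    Assumption: rc-code "0" together with op-status "0" means the
    operation completed successfully.
    """
    op = ET.fromstring(xml_text)
    succeeded = op.get("rc-code") == "0" and op.get("op-status") == "0"
    return op.get("operation"), succeeded


if __name__ == "__main__":
    operation, succeeded = describe_last_op(LAST_OP)
    print(f"last operation: {operation}, succeeded: {succeeded}")
```

A last operation of start that succeeded, with no later stop recorded, is why the resource still counts as active on node03.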

Comment 6 Andrew Beekhof 2012-03-13 12:04:20 UTC
A related patch has been committed upstream:
  https://github.com/beekhof/pacemaker/commit/31f6ca3

with subject:

   Medium: PE: Bug rhbz#799070 - Report resources as active in crm_mon if they are located on an unclean node

Further details (if any):

Comment 11 Jaroslav Kortus 2012-04-05 15:24:48 UTC
Now it behaves consistently
]$ crm_mon -1
============
Last updated: Thu Apr  5 10:22:13 2012
Last change: Thu Apr  5 10:06:46 2012 via crmd on m3c1-node01
Stack: cman
Current DC: m3c1-node01 - partition WITHOUT quorum
Version: 1.1.7-5.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, unknown expected votes
2 Resources configured.
============

Node m3c1-node03: UNCLEAN (offline)
Node m3c1-node02: UNCLEAN (offline)
Online: [ m3c1-node01 ]

 virt-fencing   (stonith:fence_xvm):    Started m3c1-node01
 webserver      (ocf::heartbeat:apache):        Started m3c1-node02


However, there is still a difference compared to cman: when rgmanager loses quorum, no services are reported as running (they are not reported at all).

I would like to see something similar here as well. Pacemaker should report the service as running only when it has quorum and the service is working properly. In all other states it should report a different state instead of Started (failed, pending, unclean...).

Would this be possible?

pacemaker-1.1.7-5.el6.x86_64

Comment 12 Andrew Beekhof 2012-04-10 09:46:09 UTC
(In reply to comment #11)

> However, it is still a difference compared to cman. When rgmanager loses
> quorum no services are reported as running (they are not reported at all).

Pacemaker is not rgmanager.
Pacemaker reports resources as running if they are running, not based on whether the partition has quorum. Resources may or may not be allowed to run when quorum is lost; that's up to the admin.
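
The admin-controlled behavior mentioned here is presumably the no-quorum-policy cluster property (an assumption; the comment does not name it). The sketch below looks that property up in a CIB document such as the one produced by cibadmin -Q. The sample XML is a hypothetical fragment modeled on the configuration shown in the description, not the attached file, and the fallback value of "stop" is assumed from Pacemaker's documented default.

```python
import xml.etree.ElementTree as ET

# Assumption: "stop" is the documented default when the property is unset.
DEFAULT_NO_QUORUM_POLICY = "stop"

# Hypothetical CIB fragment modeled on the `crm configure show` output in
# the description; it is not the attached cibadmin -Q file.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
                name="cluster-infrastructure" value="cman"/>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""


def no_quorum_policy(cib_xml):
    """Return the configured no-quorum-policy, or the assumed default."""
    root = ET.fromstring(cib_xml)
    # Only look at cluster-wide properties, not resource parameters.
    for crm_config in root.iter("crm_config"):
        for nvpair in crm_config.iter("nvpair"):
            if nvpair.get("name") == "no-quorum-policy":
                return nvpair.get("value")
    return DEFAULT_NO_QUORUM_POLICY


if __name__ == "__main__":
    print("no-quorum-policy:", no_quorum_policy(SAMPLE_CIB))
```

Whatever the policy is, it governs what the cluster does with resources after quorum is lost; it does not change how the tools report a resource that is still recorded as active.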

Comment 16 Andrew Beekhof 2012-05-08 11:42:34 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The logic for determining whether a resource was active was faulty
Consequence: Resources that were active but on an unclean node were ignored by tools that relied on this logic
Fix: Fix the check so that all tools agree.

Comment 18 errata-xmlrpc 2012-06-20 13:48:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0846.html

