Bug 831737

Summary: Node list should be fetched from corosync at first start
Product: Red Hat Enterprise Linux 7 Reporter: Jaroslav Kortus <jkortus>
Component: pacemaker Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: high    
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.8-3.el7 Doc Type: Enhancement
Doc Text:
Feature: Query the list of configured nodes from corosync at startup. Reason: When the cluster starts, Pacemaker may not know about all possible peers. As a result, those peers may not be fenced appropriately, even though they are known to corosync. Result (if any): All nodes listed in corosync.conf are recorded in the Pacemaker configuration, and any that have not been seen by the time the cluster obtains quorum will be fenced.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-16 06:35:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaroslav Kortus 2012-06-13 16:39:18 UTC
Description of problem:
Pacemaker registers only the nodes it has already seen. It should instead fetch the node list from the corosync nodelist (if present).

The situation is as follows:
Corosync is started on all 15 nodes. Pacemaker has never been started there. Say I start it on nodes 1-3. On these nodes the cib database gets populated with nodes 1-3. Now I turn pacemaker off and on again. Pacemaker comes up with 3 nodes and does not miss any of the others, even though corosync says there should be 15.

If you repeat the same scenario with no cib present, but start pacemaker on all nodes instead of just 3 in the first step, then turn it off on all nodes and back on only on nodes 1-3, the rest of the nodes will get fenced (they will be recognized as missing, or more precisely UNCLEAN).

What pacemaker should IMHO do is fetch the nodelist from corosync on each start and determine the missing nodes based on that list.
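
The comparison being asked for can be sketched in a few lines of Python. This is only an illustration, not Pacemaker's actual C code: the function name missing_nodes and the use of plain lists of node names are assumptions, and the r7-nodeNN names follow the pattern of the test cluster in this report.

```python
def missing_nodes(configured, seen):
    """Return the configured nodes that have not been seen, sorted by name.

    'configured' stands for the corosync nodelist, 'seen' for the nodes
    Pacemaker already knows about. Both are hypothetical inputs.
    """
    return sorted(set(configured) - set(seen))


# 15 nodes known to corosync, Pacemaker started on the first three only.
corosync_nodelist = [f"r7-node{i:02d}" for i in range(1, 16)]
started = ["r7-node01", "r7-node02", "r7-node03"]

unseen = missing_nodes(corosync_nodelist, started)
print(len(unseen), unseen[0], unseen[-1])  # prints "12 r7-node04 r7-node15"
```

With the nodelist as the reference, the 12 unseen nodes are identified on the very first start, independent of what an earlier cib happened to contain.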

Version-Release number of selected component (if applicable):
pacemaker-1.1.7-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. see above
  
Actual results:
The corosync nodelist and the pacemaker node list differ depending on where pacemaker has been started before.

Expected results:
The corosync nodelist and the pacemaker node list are in sync at all times, from the very first start.

Additional info:

Comment 1 Andrew Beekhof 2012-06-13 23:26:00 UTC
I'm confused; we already obtain the node list from corosync's membership API.

Comment 2 Jaroslav Kortus 2012-06-14 09:04:59 UTC
A bit more info on what I mean:

$ corosync-quorumtool -s
Quorum information
------------------
Date:             Thu Jun 14 04:01:45 2012
Quorum provider:  corosync_votequorum
Nodes:            15

Ring ID:          952
Quorate:          Yes

Votequorum information
----------------------
Node ID:          1
Node state:       Member
Node votes:       1
Expected votes:   15
Highest expected: 15
Total votes:      15
Quorum:           8  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 r7-node01
         2          1 r7-node02
         3          1 r7-node03
         4          1 r7-node04
         5          1 r7-node05
         6          1 r7-node06
         7          1 r7-node07
         8          1 r7-node08
         9          1 r7-node09
        10          1 r7-node10
        11          1 r7-node11
        12          1 r7-node12
        13          1 r7-node13
        14          1 r7-node14
        15          1 r7-node15


$ service pacemaker start
$ crm_mon -1
============
Last updated: Thu Jun 14 04:03:49 2012
Last change: Thu Jun 14 04:02:18 2012 via crmd on r7-node01
Stack: corosync
Current DC: r7-node01 (1) - partition with quorum
Version: 1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff
1 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ r7-node01 ]

$ cibadmin -Q
<cib epoch="3" num_updates="8" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="r7-node01" update-client="crmd" cib-last-written="Thu Jun 14 04:02:18 2012" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="r7-node01" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status>
    <node_state id="1" uname="r7-node01" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
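
The mismatch visible in the two outputs above can be checked mechanically. The Python sketch below is an illustration only: the helper names are made up, the parser assumes the three-column "Nodeid Votes Name" table layout printed by this version of corosync-quorumtool, and the samples are trimmed copies of the output above rather than live command results.

```python
import xml.etree.ElementTree as ET


def cib_node_names(cib_xml):
    """Collect the uname of every <node> under <configuration><nodes>."""
    root = ET.fromstring(cib_xml)
    return {n.get("uname") for n in root.findall("./configuration/nodes/node")}


def quorumtool_member_names(output):
    """Collect node names from the 'Nodeid Votes Name' membership table."""
    names = set()
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:3] == ["Nodeid", "Votes", "Name"]:
            in_table = True  # table rows follow this header line
            continue
        if in_table and len(fields) == 3 and fields[0].isdigit():
            names.add(fields[2])
    return names


# Trimmed samples in the format shown above.
CIB_SAMPLE = """<cib epoch="3">
  <configuration>
    <nodes>
      <node id="1" uname="r7-node01" type="normal"/>
    </nodes>
  </configuration>
</cib>"""

QUORUMTOOL_SAMPLE = """Membership information
----------------------
    Nodeid      Votes Name
         1          1 r7-node01
         2          1 r7-node02
         3          1 r7-node03
"""

only_in_corosync = sorted(
    quorumtool_member_names(QUORUMTOOL_SAMPLE) - cib_node_names(CIB_SAMPLE)
)
print(only_in_corosync)  # prints ['r7-node02', 'r7-node03']
```

In practice the two strings would come from running cibadmin -Q and corosync-quorumtool -s on the node being checked.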

Comment 3 Andrew Beekhof 2012-07-17 03:56:32 UTC
So there were two problems here...

1 - the node list we obtained from corosync was limited to active members.

As suggested, I have now taught pacemaker to obtain the contents of 'nodelist {}' if present.

2 - corosync doesn't normally contain node names

This meant that Pacemaker didn't have enough information to create an entry under '<nodes/>'.  Nor would fencing have enough information to complete (since it too is name based).

Things have been changed to improve our behaviour here:

a - obtain the value of nodelist.node.$N.ring0_addr from the CMAP API as it may contain a node name, otherwise

b - copy the quorumtool's logic (use the CFG API to look up the interface address and look that up in DNS), otherwise

c - support a name being set at nodelist.node.$N.name if for some reason 'a' and 'b' are not possible/working
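
The a/b/c fallback order above can be sketched as follows. This is a Python illustration under stated assumptions, not the Pacemaker implementation: the cmap argument is a plain dict standing in for the CMAP API, interface_addr stands in for the address the CFG API would return, and treating "ring0_addr is not an IP literal" as "ring0_addr contains a node name" is a guess at how step 'a' decides.

```python
import ipaddress
import socket


def _is_ip(value):
    """True if value parses as an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def resolve_node_name(cmap, index, interface_addr=None, reverse_lookup=None):
    """Pick a node name using the a -> b -> c order described above."""
    if reverse_lookup is None:
        # Default to a real reverse DNS lookup; callers may inject a stub.
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    prefix = f"nodelist.node.{index}."

    # a: ring0_addr may already hold a node name rather than an address.
    ring0 = cmap.get(prefix + "ring0_addr")
    if ring0 and not _is_ip(ring0):
        return ring0

    # b: look the interface address up in DNS, as quorumtool does.
    if interface_addr:
        try:
            return reverse_lookup(interface_addr)
        except OSError:
            pass  # no usable DNS entry, fall through to 'c'

    # c: an explicit name key, or None if nothing is configured.
    return cmap.get(prefix + "name")


def no_dns(addr):
    """Stub resolver that simulates a failed reverse lookup."""
    raise OSError("no PTR record")


by_name = {"nodelist.node.0.ring0_addr": "r7-node01"}
by_addr = {
    "nodelist.node.0.ring0_addr": "192.168.1.1",
    "nodelist.node.0.name": "r7-node01",
}
print(resolve_node_name(by_name, 0))  # prints r7-node01
print(resolve_node_name(by_addr, 0, "192.168.1.1", no_dns))  # prints r7-node01
```

Passing the resolver in as a parameter keeps the sketch testable without a working DNS setup.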

I'll include these in a build later this week.

Comment 4 Andrew Beekhof 2012-08-16 03:50:24 UTC
Fixes included in pacemaker-1.1.8-2.el7