831737 – Node list should be fetched from corosync at first start

Summary: Node list should be fetched from corosync at first start

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pacemaker
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Andrew Beekhof
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-06-13 16:39 UTC by Jaroslav Kortus
Modified:	2014-06-16 06:35 UTC
CC List:	0 users
Fixed In Version:	pacemaker-1.1.8-3.el7
Doc Type:	Enhancement
Doc Text:	Feature: Query the list of configured nodes from corosync at startup. Reason: When the cluster starts, Pacemaker may not know about all possible peers. As a result, those peers may not be fenced appropriately, even though they are known to corosync. Result (if any): All nodes listed in corosync.conf are recorded in the Pacemaker configuration, and any that have not been seen by the time the cluster obtains quorum will be fenced.
Clone Of:
Environment:
Last Closed:	2014-06-16 06:35:16 UTC
Target Upstream Version:
Embargoed:

Description Jaroslav Kortus 2012-06-13 16:39:18 UTC

Description of problem:
Pacemaker registers a node only once it has seen that node. Instead, it should fetch the nodes from the corosync nodelist (if present).

The situation is as follows:
Corosync is started on all 15 nodes, and Pacemaker has never been started on any of them. Say I start Pacemaker on nodes 1-3. On these nodes, the CIB gets populated with nodes 1-3. Now I stop Pacemaker and start it again. Pacemaker comes up with 3 nodes and does not consider any other node missing, even though corosync says there should be 15.

If you repeat the scenario with no CIB present, but in the first step start Pacemaker on all nodes instead of just three, then stop it on all nodes and start it again on nodes 1-3, the remaining nodes get fenced (they are recognized as missing, or more precisely, UNCLEAN).

IMHO, Pacemaker should fetch the nodelist from corosync on each start and determine which nodes are missing based on that list.
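The proposal above reduces to a set difference: every node in corosync's nodelist that has not been seen once the partition is quorate counts as missing (UNCLEAN) and becomes a fencing candidate, which is also how the Doc Text on this bug describes the eventual behaviour. The following is a minimal Python sketch, assuming the nodelist and the set of seen nodes are already available as plain collections of node names; `nodes_to_fence` is a hypothetical helper, not Pacemaker's API.

```python
from typing import Iterable


def nodes_to_fence(corosync_nodelist: Iterable[str],
                   seen_nodes: Iterable[str],
                   have_quorum: bool) -> list[str]:
    """Hypothetical helper: configured nodes never seen once quorum is reached.

    Nodes are only reported as missing after the partition is quorate;
    before that, the function returns an empty list.
    """
    if not have_quorum:
        return []
    seen = set(seen_nodes)
    # Keep nodelist order so the result is deterministic.
    return [name for name in corosync_nodelist if name not in seen]


if __name__ == "__main__":
    # Scenario from this report: corosync runs on all 15 nodes (so the
    # partition is quorate), but Pacemaker was restarted only on nodes 1-3.
    nodelist = [f"r7-node{i:02d}" for i in range(1, 16)]
    seen = ["r7-node01", "r7-node02", "r7-node03"]
    print(nodes_to_fence(nodelist, seen, have_quorum=True))
```

With the report's numbers this yields r7-node04 through r7-node15, which matches the UNCLEAN outcome described for the case where the CIB already knew about all 15 nodes.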

Version-Release number of selected component (if applicable):
pacemaker-1.1.7-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. see above
  
Actual results:
The corosync nodelist and the Pacemaker node list differ depending on which nodes Pacemaker has previously been started on.

Expected results:
The corosync nodelist and the Pacemaker node list are always in sync, beginning with the very first start.

Additional info:

Comment 1 Andrew Beekhof 2012-06-13 23:26:00 UTC

I'm confused; we already obtain the node list from corosync's membership API.

Comment 2 Jaroslav Kortus 2012-06-14 09:04:59 UTC

A bit more information on what I mean:

$ corosync-quorumtool -s
Quorum information
------------------
Date:             Thu Jun 14 04:01:45 2012
Quorum provider:  corosync_votequorum
Nodes:            15

Ring ID:          952
Quorate:          Yes

Votequorum information
----------------------
Node ID:          1
Node state:       Member
Node votes:       1
Expected votes:   15
Highest expected: 15
Total votes:      15
Quorum:           8  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 r7-node01
         2          1 r7-node02
         3          1 r7-node03
         4          1 r7-node04
         5          1 r7-node05
         6          1 r7-node06
         7          1 r7-node07
         8          1 r7-node08
         9          1 r7-node09
        10          1 r7-node10
        11          1 r7-node11
        12          1 r7-node12
        13          1 r7-node13
        14          1 r7-node14
        15          1 r7-node15


$ service pacemaker start
$ crm_mon -1
============
Last updated: Thu Jun 14 04:03:49 2012
Last change: Thu Jun 14 04:02:18 2012 via crmd on r7-node01
Stack: corosync
Current DC: r7-node01 (1) - partition with quorum
Version: 1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff
1 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ r7-node01 ]

$ cibadmin -Q
<cib epoch="3" num_updates="8" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="r7-node01" update-client="crmd" cib-last-written="Thu Jun 14 04:02:18 2012" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="r7-node01" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status>
    <node_state id="1" uname="r7-node01" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
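The transcript above shows the mismatch directly: corosync-quorumtool lists 15 members, while the CIB's <nodes> section holds a single entry. The following is a minimal Python sketch that compares the two, assuming the quorumtool output is captured as text in the layout shown above and the CIB XML is available as a string. The helper names are hypothetical, and the table parsing relies only on the column layout visible in this comment, not on any documented output format.

```python
import xml.etree.ElementTree as ET


def membership_names(quorumtool_output: str) -> list[str]:
    """Hypothetical helper: names from the 'Membership information' table."""
    names = []
    in_table = False
    for line in quorumtool_output.splitlines():
        fields = line.split()
        if fields == ["Nodeid", "Votes", "Name"]:
            in_table = True  # Header row of the membership table.
            continue
        # Rows in the layout shown above: "<nodeid> <votes> <name>".
        if in_table and len(fields) == 3 and fields[0].isdigit():
            names.append(fields[2])
    return names


def cib_node_names(cib_xml: str) -> list[str]:
    """Hypothetical helper: uname of each <node> under configuration/nodes."""
    root = ET.fromstring(cib_xml)
    return [node.get("uname") for node in root.findall("configuration/nodes/node")]


def missing_from_cib(quorumtool_output: str, cib_xml: str) -> list[str]:
    """Corosync members with no matching entry in the CIB's <nodes> section."""
    in_cib = set(cib_node_names(cib_xml))
    return [name for name in membership_names(quorumtool_output) if name not in in_cib]


# Trimmed samples in the same layout as the transcript above.
SAMPLE_MEMBERSHIP = """\
Membership information
----------------------
    Nodeid      Votes Name
         1          1 r7-node01
         2          1 r7-node02
         3          1 r7-node03
"""

SAMPLE_CIB = """<cib><configuration><nodes>
  <node id="1" uname="r7-node01" type="normal"/>
</nodes></configuration></cib>"""

if __name__ == "__main__":
    print(missing_from_cib(SAMPLE_MEMBERSHIP, SAMPLE_CIB))
```

Run against the full transcript, the same comparison would report r7-node02 through r7-node15 as absent from the CIB.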

Comment 3 Andrew Beekhof 2012-07-17 03:56:32 UTC

So there were two problems here...

1 - the node list we obtained from corosync was limited to active members.

As suggested, I have now taught Pacemaker to obtain the contents of 'nodelist {}', if present.
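Reading 'nodelist {}' instead of the active membership means walking the nodelist.node.$N.* keys that corosync exposes through CMAP, starting at N = 0. The following is a minimal Python sketch, assuming the CMAP keys have already been fetched into a flat dict mapping key names to string values; the real code uses corosync's CMAP C API, which this sketch does not model. `read_nodelist` and `NodelistEntry` are hypothetical, and stopping at the first index without a ring0_addr key is an assumption about how the list ends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodelistEntry:
    index: int              # $N in nodelist.node.$N.*
    nodeid: Optional[int]   # nodelist.node.$N.nodeid, if set
    ring0_addr: str         # nodelist.node.$N.ring0_addr


def read_nodelist(cmap: dict[str, str]) -> list[NodelistEntry]:
    """Hypothetical helper: walk nodelist.node.$N.* from N = 0.

    Stops at the first index with no ring0_addr key (an assumed terminator).
    Only configured entries are read, so nodes that are not currently
    active members are still returned.
    """
    entries = []
    n = 0
    while True:
        prefix = f"nodelist.node.{n}."
        addr = cmap.get(prefix + "ring0_addr")
        if addr is None:
            break
        raw_id = cmap.get(prefix + "nodeid")
        nodeid = int(raw_id) if raw_id is not None else None
        entries.append(NodelistEntry(n, nodeid, addr))
        n += 1
    return entries


if __name__ == "__main__":
    cmap = {
        "nodelist.node.0.ring0_addr": "r7-node01", "nodelist.node.0.nodeid": "1",
        "nodelist.node.1.ring0_addr": "r7-node02", "nodelist.node.1.nodeid": "2",
        "nodelist.node.2.ring0_addr": "r7-node03", "nodelist.node.2.nodeid": "3",
    }
    print([e.ring0_addr for e in read_nodelist(cmap)])
```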

2 - corosync doesn't normally contain node names

This meant that Pacemaker didn't have enough information to create an entry under '<nodes/>', nor would fencing have enough information to complete (since it, too, is name-based).

Things have been changed to improve our behaviour here:

a - obtain the value of nodelist.node.$N.ring0_addr from the CMAP API as it may contain a node name, otherwise

b - copy the quorumtool's logic (use the CFG API to look up the interface address and look that up in DNS), otherwise

c - support a name being set at nodelist.node.$N.name if for some reason 'a' and 'b' are not possible/working
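The a/b/c fallback chain above can be sketched as follows. This is a minimal Python illustration, not Pacemaker's code: CMAP is modelled as a flat dict, step (b) reverse-resolves ring0_addr with `socket.gethostbyaddr` as a stand-in for the CFG API interface lookup, and "ring0_addr already holds a name" is approximated by checking whether the value fails to parse as an IP address. The order follows this comment; `node_name` is a hypothetical helper, and the 192.0.2.x addresses in the usage are documentation addresses chosen for illustration.

```python
import ipaddress
import socket
from typing import Callable, Optional


def _default_reverse_dns(addr: str) -> Optional[str]:
    """Reverse-resolve an address; None if the lookup fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # socket.herror and socket.gaierror derive from OSError
        return None


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def node_name(cmap: dict[str, str], n: int,
              reverse_dns: Callable[[str], Optional[str]] = _default_reverse_dns
              ) -> Optional[str]:
    """Hypothetical helper: resolve a name for nodelist.node.$N."""
    prefix = f"nodelist.node.{n}."
    addr = cmap.get(prefix + "ring0_addr")
    # (a) ring0_addr may already contain a node name rather than an address.
    if addr and not _is_ip(addr):
        return addr
    # (b) Stand-in for the CFG API + DNS lookup: reverse-resolve the address.
    if addr:
        name = reverse_dns(addr)
        if name:
            return name
    # (c) Explicit nodelist.node.$N.name, if set.
    return cmap.get(prefix + "name")


if __name__ == "__main__":
    fake_dns = {"192.0.2.12": "r7-node02"}.get  # stub resolver, no network
    cmap = {
        "nodelist.node.0.ring0_addr": "r7-node01",
        "nodelist.node.1.ring0_addr": "192.0.2.12",
        "nodelist.node.2.ring0_addr": "192.0.2.13",
        "nodelist.node.2.name": "r7-node03",
    }
    print([node_name(cmap, i, fake_dns) for i in range(3)])
```

Injecting the resolver keeps the sketch testable without network access; node 2 falls through to (c) because the stub has no entry for its address.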

I'll include these in a build later this week.

Comment 4 Andrew Beekhof 2012-08-16 03:50:24 UTC

Fixes included in pacemaker-1.1.8-2.el7
