Bug 831737
| Summary: | Node list should be fetched from corosync at first start | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jaroslav Kortus <jkortus> |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.0 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-1.1.8-3.el7 | Doc Type: | Enhancement |
| Doc Text: |
Feature:
Query the list of configured nodes from corosync at startup.
Reason:
When the cluster starts, Pacemaker may not know about all possible peers. As a result, those peers may not be fenced appropriately, even though they are known to corosync.
Result (if any):
All nodes listed in corosync.conf are recorded in the Pacemaker configuration and any that have not been seen by the time the cluster obtains quorum will be fenced.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-06-16 06:35:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Jaroslav Kortus
2012-06-13 16:39:18 UTC
I'm confused; we already obtain the node list from corosync's membership API. A bit more info on what I mean:
$ corosync-quorumtool -s
Quorum information
------------------
Date: Thu Jun 14 04:01:45 2012
Quorum provider: corosync_votequorum
Nodes: 15
Ring ID: 952
Quorate: Yes
Votequorum information
----------------------
Node ID: 1
Node state: Member
Node votes: 1
Expected votes: 15
Highest expected: 15
Total votes: 15
Quorum: 8
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
1 1 r7-node01
2 1 r7-node02
3 1 r7-node03
4 1 r7-node04
5 1 r7-node05
6 1 r7-node06
7 1 r7-node07
8 1 r7-node08
9 1 r7-node09
10 1 r7-node10
11 1 r7-node11
12 1 r7-node12
13 1 r7-node13
14 1 r7-node14
15 1 r7-node15
$ service pacemaker start
$ crm_mon -1
============
Last updated: Thu Jun 14 04:03:49 2012
Last change: Thu Jun 14 04:02:18 2012 via crmd on r7-node01
Stack: corosync
Current DC: r7-node01 (1) - partition with quorum
Version: 1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff
1 Nodes configured, unknown expected votes
0 Resources configured.
============
Online: [ r7-node01 ]
$ cibadmin -Q
<cib epoch="3" num_updates="8" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="r7-node01" update-client="crmd" cib-last-written="Thu Jun 14 04:02:18 2012" have-quorum="1" dc-uuid="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1" uname="r7-node01" type="normal"/>
</nodes>
<resources/>
<constraints/>
</configuration>
<status>
<node_state id="1" uname="r7-node01" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
<lrm id="1">
<lrm_resources/>
</lrm>
<transient_attributes id="1">
<instance_attributes id="status-1">
<nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>
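To make the mismatch above easy to check, here is a minimal Python sketch (not part of Pacemaker) that lists the unames recorded under `<nodes>` in a `cibadmin -Q` dump. The function name `configured_nodes` and the trimmed `SAMPLE_CIB` string are illustrative assumptions; only the element and attribute names come from the output above.

```python
import xml.etree.ElementTree as ET


def configured_nodes(cib_xml):
    """Return the uname of every <node> under <configuration>/<nodes>."""
    root = ET.fromstring(cib_xml)  # root element is <cib>
    return [node.get("uname")
            for node in root.findall("./configuration/nodes/node")]


# Trimmed-down copy of the CIB shown above (assumption: only the parts
# needed for this check are kept).
SAMPLE_CIB = """<cib epoch="3" have-quorum="1" dc-uuid="1">
  <configuration>
    <nodes>
      <node id="1" uname="r7-node01" type="normal"/>
    </nodes>
  </configuration>
</cib>"""

if __name__ == "__main__":
    print(configured_nodes(SAMPLE_CIB))
```

Against the dump above this yields a single entry, while corosync-quorumtool reports 15 members.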
So there were two problems here...
1 - the node list we obtained from corosync was limited to active members.
As suggested, I have now taught pacemaker to obtain the contents of 'nodelist {}' if present.
2 - corosync doesn't normally contain node names
This meant that Pacemaker didn't have enough information to create an entry under '<nodes/>'. Nor would fencing have enough information to complete (since it too is name based).
Things have been changed to improve our behaviour here:
a - obtain the value of nodelist.node.$N.ring0_addr from the CMAP API as it may contain a node name, otherwise
b - copy the quorumtool's logic (use the CFG API to look up the interface address and look that up in DNS), otherwise
c - support a name being set at nodelist.node.$N.name if for some reason 'a' and 'b' are not possible/working
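The fallback order a, b, c can be sketched roughly as follows. This is a Python illustration only, not the actual Pacemaker code: the CMAP contents are modelled as a plain dict, the CFG/DNS step is replaced by a caller-supplied `dns_name_for` callable, and "ring0_addr contains a node name" is approximated as "ring0_addr does not parse as an IP address". All of these, and the function names, are assumptions made for the example.

```python
import ipaddress


def looks_like_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def node_name(cmap, index, dns_name_for=None):
    """Pick a name for nodelist.node.<index> using the a/b/c order.

    cmap         -- dict standing in for the CMAP key/value store (assumption)
    dns_name_for -- optional callable(addr) -> name or None, standing in
                    for the CFG interface lookup plus DNS query (assumption)
    """
    prefix = "nodelist.node.%d." % index
    addr = cmap.get(prefix + "ring0_addr")

    # a: ring0_addr may already hold a host name rather than an address
    if addr and not looks_like_ip(addr):
        return addr

    # b: otherwise try to resolve the address to a name
    if addr and dns_name_for is not None:
        name = dns_name_for(addr)
        if name:
            return name

    # c: fall back to an explicitly configured name, if any
    return cmap.get(prefix + "name")
```

For example, with `{"nodelist.node.1.ring0_addr": "192.168.1.2", "nodelist.node.1.name": "r7-node02"}` and no resolver, `node_name(cmap, 1)` falls through to step c and returns `"r7-node02"`.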
I'll include these in a build later this week.
Fixes included in pacemaker-1.1.8-2.el7