Bug 831737
Summary: | Node list should be fetched from corosync at first start | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jaroslav Kortus <jkortus> |
Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 | ||
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | pacemaker-1.1.8-3.el7 | Doc Type: | Enhancement |
Doc Text: |
Feature:
Query the list of configured nodes from corosync at startup.
Reason:
When the cluster starts, Pacemaker may not know about all possible peers. As a result, those peers may not be fenced appropriately, even though they are known to corosync.
Result (if any):
All nodes listed in corosync.conf are recorded in the Pacemaker configuration and any that have not been seen by the time the cluster obtains quorum will be fenced.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2014-06-16 06:35:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
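The Doc Text's Result is a small policy: every node listed in corosync.conf is known to Pacemaker, and any that have not been seen by the time the cluster obtains quorum become fencing candidates. The sketch below models that policy under stated assumptions. It is not Pacemaker's code, and `quorum_threshold`, `is_quorate` and `nodes_to_fence` are hypothetical names. It assumes one vote per node and votequorum's default majority rule, with no two_node, wait_for_all or last_man_standing options. Under that rule, 15 expected votes give a threshold of 8, which matches the "Quorum: 8" line in the corosync-quorumtool output in the description.

```python
# Hypothetical sketch of the startup fencing policy described in the Doc Text.
# Assumptions: one vote per node, and votequorum's default majority rule
# (no two_node, wait_for_all or last_man_standing options in play).


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True once the partition holds at least the quorum threshold."""
    return total_votes >= quorum_threshold(expected_votes)


def nodes_to_fence(configured: set[str], seen: set[str], quorate: bool) -> set[str]:
    """Configured nodes that have not been seen, once the cluster has quorum.

    Before quorum is obtained nothing is fenced; afterwards every node listed
    in corosync.conf that has not been seen becomes a fencing candidate.
    """
    if not quorate:
        return set()
    return configured - seen


if __name__ == "__main__":
    configured = {f"r7-node{i:02d}" for i in range(1, 16)}
    seen = {"r7-node01"}
    quorate = is_quorate(total_votes=15, expected_votes=15)
    print(quorum_threshold(15), sorted(nodes_to_fence(configured, seen, quorate)))
```

Keeping "who is configured" separate from "who has been seen" is the point of the change. Before it, Pacemaker only knew the nodes it had seen, so the set difference was always empty.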
Description
Jaroslav Kortus
2012-06-13 16:39:18 UTC
I'm confused; we already obtain the node list from corosync's membership API.

A bit more info on what I mean:

$ corosync-quorumtool -s
Quorum information
------------------
Date: Thu Jun 14 04:01:45 2012
Quorum provider: corosync_votequorum
Nodes: 15
Ring ID: 952
Quorate: Yes

Votequorum information
----------------------
Node ID: 1
Node state: Member
Node votes: 1
Expected votes: 15
Highest expected: 15
Total votes: 15
Quorum: 8
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
1 1 r7-node01
2 1 r7-node02
3 1 r7-node03
4 1 r7-node04
5 1 r7-node05
6 1 r7-node06
7 1 r7-node07
8 1 r7-node08
9 1 r7-node09
10 1 r7-node10
11 1 r7-node11
12 1 r7-node12
13 1 r7-node13
14 1 r7-node14
15 1 r7-node15

$ service pacemaker start
$ crm_mon -1
============
Last updated: Thu Jun 14 04:03:49 2012
Last change: Thu Jun 14 04:02:18 2012 via crmd on r7-node01
Stack: corosync
Current DC: r7-node01 (1) - partition with quorum
Version: 1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff
1 Nodes configured, unknown expected votes
0 Resources configured.
============

Online: [ r7-node01 ]

$ cibadmin -Q
<cib epoch="3" num_updates="8" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="r7-node01" update-client="crmd" cib-last-written="Thu Jun 14 04:02:18 2012" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.7-2.el7-ee0730e13d124c3d58f00016c3376a1de5323cff"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="r7-node01" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status>
    <node_state id="1" uname="r7-node01" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>

So there were two problems here:

1 - The node list we obtained from corosync was limited to active members. As suggested, I have now taught Pacemaker to obtain the contents of 'nodelist {}' if present.

2 - corosync doesn't normally contain node names. This meant that Pacemaker didn't have enough information to create an entry under '<nodes/>', nor would fencing have enough information to complete, since it too is name-based.
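Problem 1 comes down to reading the configured node list rather than the active membership. Pacemaker reads the nodelist.node.$N.* keys through the CMAP API. The sketch below instead parses a text dump of those keys, assuming one "key (type) = value" entry per line, which is the layout `corosync-cmapctl` prints. `parse_nodelist` is a hypothetical helper, and the exact dump format is an assumption.

```python
import re
from collections import defaultdict

# Assumed line format: "nodelist.node.<N>.<key> (<type>) = <value>"
_NODELIST_LINE = re.compile(r"^nodelist\.node\.(\d+)\.(\w+)\s+\(\w+\)\s*=\s*(.*)$")


def parse_nodelist(dump: str) -> dict[int, dict[str, str]]:
    """Group nodelist.node.$N.* keys by $N, covering every configured node.

    Unlike the membership list, this includes nodes that are not running yet.
    Keys outside nodelist.node.* (runtime.*, totem.*, ...) are ignored.
    """
    nodes: defaultdict[int, dict[str, str]] = defaultdict(dict)
    for line in dump.splitlines():
        match = _NODELIST_LINE.match(line.strip())
        if match:
            index, key, value = match.groups()
            nodes[int(index)][key] = value
    return dict(nodes)
```

Here is a hypothetical dump in which one node's ring0_addr is a name and another's is an address:

    nodelist.node.0.nodeid (u32) = 1
    nodelist.node.0.ring0_addr (str) = r7-node01
    nodelist.node.1.ring0_addr (str) = 192.168.1.2

It yields one dictionary per configured node, keyed by $N. The second entry shows problem 2: there is no name to put under '<nodes/>'.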
Things have been changed to improve our behaviour here:

a - obtain the value of nodelist.node.$N.ring0_addr from the CMAP API, as it may contain a node name; otherwise
b - copy the quorumtool's logic (use the CFG API to look up the interface address and look that up in DNS); otherwise
c - support a name being set at nodelist.node.$N.name, if for some reason 'a' and 'b' are not possible or not working.

I'll include these in a build later this week.

Fixes included in pacemaker-1.1.8-2.el7 |
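The a/b/c fallback order can be sketched as a single name-resolution function. This is an illustration, not the Pacemaker implementation, and it makes two simplifications. First, it uses ring0_addr itself as the address to reverse-resolve, where Pacemaker asks the CFG API for the interface address. Second, the DNS lookup is injected as a callable, so it can be replaced in tests. `resolve_node_name`, `is_ip_address` and `dns_reverse_lookup` are hypothetical names. The node-entry dictionary has the same shape as the `nodelist.node.$N.*` keys.

```python
import ipaddress
import socket
from typing import Callable, Optional


def is_ip_address(value: str) -> bool:
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def dns_reverse_lookup(addr: str) -> Optional[str]:
    """Reverse-resolve addr via the system resolver; None on failure."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


def resolve_node_name(
    entry: dict[str, str],
    reverse_lookup: Callable[[str], Optional[str]] = dns_reverse_lookup,
) -> Optional[str]:
    """Pick a node name using the a/b/c fallback order."""
    addr = entry.get("ring0_addr")
    # (a) ring0_addr may already contain a node name rather than an address
    if addr and not is_ip_address(addr):
        return addr
    # (b) otherwise look the address up in DNS
    if addr:
        name = reverse_lookup(addr)
        if name:
            return name
    # (c) finally fall back to an explicit nodelist.node.$N.name
    return entry.get("name")
```

A `None` result corresponds to the original failure mode. With no name, no '<nodes/>' entry can be created, and fencing, which is name-based, cannot target that node.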