Bug 1126998
| Summary: | pacemaker uses 'uname -n' if no nodename defined, which is confusing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | John Ruemker <jruemker> |
| Component: | pcs | Assignee: | Chris Feist <cfeist> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | cfeist, cluster-maint, dvossel, jkortus, rsteiger, tojeline, wagh1.ravi |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | pcs-0.9.134-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 09:20:26 UTC | Type: | Bug |
Unless I misunderstand something, this is expected and by design. IP addresses are not valid node names and are not at all interchangeable with shortname or shortname.domain.name. Even shortname, othershortname and shortname.domain.name are not interchangeable just because they map to the same IP in DNS. I would agree that the names under 'PCSD Status' should match the 'Online' section though. Do you have an opinion there, Chris?

I talked with Andrew and I think the solution to this is in the PCSD Status section. I'll include the output from (uname -n) on all the nodes as well as the node name in the corosync section (with the IP address in parentheses after). I'll also include a warning message if the corosync & pacemaker nodes are not identical.

Why are IPs not valid node names? I'm asking because we've run into similar issues when using ad-hoc rings that are not in DNS. The guess pacemaker makes to fill in the node name has caused more trouble for me than it brought benefits ;).

Patch upstream here: https://github.com/feist/pcs/commit/b7e0144fe84e953fab198bef376a952fbfcdcad5

Before Fix:

    [root@rh70-node1:~]# rpm -q pcs
    pcs-0.9.115-32.el7.x86_64
    [root@rh70-node1:~]# pcs cluster auth 192.168.122.101 192.168.122.102
    Username: hacluster
    Password:
    192.168.122.101: Authorized
    192.168.122.102: Authorized
    [root@rh70-node1:~]# pcs cluster setup 192.168.122.101 192.168.122.102 --name mycluster
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    192.168.122.101: Succeeded
    192.168.122.102: Succeeded
    [root@rh70-node1:~]# pcs cluster start --all
    192.168.122.101: Starting Cluster...
    192.168.122.102: Starting Cluster...
    [root@rh70-node1:~]# pcs status
    Cluster name: mycluster
    WARNING: no stonith devices and stonith-enabled is not false
    Last updated: Tue Oct 21 14:25:11 2014
    Last change: Tue Oct 21 14:21:36 2014
    Stack: corosync
    Current DC: rh70-node1 (1) - partition with quorum
    Version: 1.1.12-a14efad
    2 Nodes configured
    0 Resources configured

    Online: [ rh70-node1 rh70-node2 ]

    Full list of resources:

    PCSD Status:
      192.168.122.101: Online
      192.168.122.102: Online

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

After Fix:

    [root@rh70-node1:~]# rpm -q pcs
    pcs-0.9.134-1.el7.x86_64
    [root@rh70-node1:~]# pcs cluster auth 192.168.122.101 192.168.122.102
    Username: hacluster
    Password:
    192.168.122.101: Authorized
    192.168.122.102: Authorized
    [root@rh70-node1:~]# pcs cluster setup 192.168.122.101 192.168.122.102 --name mycluster
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    192.168.122.101: Succeeded
    192.168.122.102: Succeeded
    [root@rh70-node1:~]# pcs cluster start --all
    192.168.122.101: Starting Cluster...
    192.168.122.102: Starting Cluster...
    [root@rh70-node1:~]# pcs status
    Cluster name: mycluster
    WARNING: no stonith devices and stonith-enabled is not false
    WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
    Last updated: Tue Oct 21 16:09:36 2014
    Last change: Tue Oct 21 15:52:26 2014
    Stack: corosync
    Current DC: rh70-node1 (1) - partition with quorum
    Version: 1.1.12-a14efad
    2 Nodes configured
    0 Resources configured

    Online: [ rh70-node1 rh70-node2 ]

    Full list of resources:

    PCSD Status:
      rh70-node1 (192.168.122.101): Online
      rh70-node2 (192.168.122.102): Online

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Note the "WARNING: corosync and pacemaker node names do not match (IPs used in setup?)" line.

Created pacemaker follow-up as https://bugzilla.redhat.com/show_bug.cgi?id=1183103

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0415.html

I just changed ring0_addr from IP to hostname. That solved the issue.
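The behavior shown in the "After Fix" output comes down to two small steps: compare the corosync node list with the pacemaker node list, and label each PCSD Status entry with the pacemaker name followed by the corosync address. The Python sketch below illustrates that idea only. It is not the code from the upstream patch, and the function names `node_name_warning` and `pcsd_status_labels` are hypothetical.

```python
def node_name_warning(corosync_nodes, pacemaker_nodes):
    """Return the warning text if the two sets of names differ, else None.

    Both arguments are plain lists of strings; how they are collected
    from the cluster is outside this sketch.
    """
    if set(corosync_nodes) != set(pacemaker_nodes):
        return ("WARNING: corosync and pacemaker node names do not match "
                "(IPs used in setup?)")
    return None


def pcsd_status_labels(name_by_address):
    """Build 'name (address)' labels for a PCSD Status style listing.

    name_by_address maps each corosync address to the name that node
    reports (assumed here to be its `uname -n` output).
    """
    labels = []
    for address, name in sorted(name_by_address.items()):
        # Only add the address in parentheses when it differs from the name.
        labels.append(name if name == address else f"{name} ({address})")
    return labels
```

With the addresses and hostnames from the transcript above, `node_name_warning` returns the warning string and `pcsd_status_labels` yields entries such as `rh70-node1 (192.168.122.101)`.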
Description of problem:

If you configure a cluster using IP addresses, such as if you did:

    # pcs cluster setup --name rhel7-cluster 192.168.143.71 192.168.143.72

or if you defined the cluster via IPs in corosync.conf:

    nodelist {
      node {
        ring0_addr: 192.168.143.71
        nodeid: 1
      }
      node {
        ring0_addr: 192.168.143.72
        nodeid: 2
      }
    }

then pacemaker will automatically guess the nodename to be 'uname -n' and put that in the CIB when it's created:

    # pcs status
    Cluster name: rhel7-cluster
    WARNING: no stonith devices and stonith-enabled is not false
    Last updated: Tue Aug 5 15:37:41 2014
    Last change: Tue Aug 5 15:37:13 2014 via crmd on jrummy7-2.usersys.redhat.com
    Stack: corosync
    Current DC: jrummy7-2.usersys.redhat.com (2) - partition with quorum
    Version: 1.1.10-23.el7-368c726
    2 Nodes configured
    0 Resources configured

    Online: [ jrummy7-1.usersys.redhat.com jrummy7-2.usersys.redhat.com ]

    Full list of resources:

    PCSD Status:
      192.168.143.71: Online
      192.168.143.72: Online

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

This can be confusing for users in a few ways:

- If you then try to specify a node by that same IP address in pcs commands, it will not recognize the node:

      [root@jrummy7-1 ~]# pcs resource create vip IPaddr2 ip=192.168.143.99 cidr_netmask=24
      [root@jrummy7-1 ~]# pcs resource move vip 192.168.143.72
      Error: error moving/banning/clearing resource
      Error performing operation: node '192.168.143.72' is unknown
      Error performing operation: No such device or address

- If you try to use a hostname that's mapped to that same IP in /etc/hosts in pcs commands, it will fail:

      [root@jrummy7-1 ~]# cat /etc/hosts
      127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
      ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
      192.168.143.71 rhel7-node1.example.com rhel7-node1 jrummy7-1-clust
      192.168.243.71 rhel7-node1-alt jrummy7-1-alt
      192.168.143.72 rhel7-node2.example.com rhel7-node2 jrummy7-2-clust
      192.168.243.72 rhel7-node2-alt jrummy7-2-alt
      [root@jrummy7-1 ~]# pcs resource move vip rhel7-node2.example.com
      Error: error moving/banning/clearing resource
      Error performing operation: node 'rhel7-node2.example.com' is unknown
      Error performing operation: No such device or address

- The node name displayed by pcs and in the CIB may imply to some users that the network which maps to that hostname is being used for communications.

Despite the fact that all three of these points directly relate to pcs' handling of the situation, it seems like changing pacemaker to not guess the node name to be uname -n and to not put that in the CIB would be a more complete solution than changing pcs, so I've filed this against pacemaker for now. It seems we should just let pacemaker put the IPs in the CIB as the node uname on startup, or at least look for mapped names for the IPs coming from corosync and put those in the CIB, rather than use a name which has as much chance of being wrong as it does right.

Version-Release number of selected component (if applicable):
pacemaker-1.1.10-23.el7.x86_64

How reproducible:
Any time corosync is configured to use IP addresses as node ring addrs

Steps to Reproduce:
1. Set up a cluster using IPs

       # pcs cluster setup --name rhel7-cluster 192.168.143.71 192.168.143.72 --start

2. Use 'pcs status' to check cluster membership and try to deduce what interface is being used
3. Run a pcs command that requires a node name, like 'pcs resource move', using the IPs used in #1, or a name that maps to them.

Actual results:
2. 'pcs status' shows a hostname that doesn't correspond to the configured IPs
3. pcs commands don't accept the configured IPs, or names mapped to them

Expected results:
pcs status shows either the IP that was configured in step #1, or a name that maps to it. pcs commands accept at least the IP in place of a node name, or possibly also hostnames that map to h

Additional info:
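As an illustration of the condition described in this report, the Python sketch below scans corosync.conf text for `ring0_addr` entries and reports the ones that are literal IP addresses, which are the entries for which pacemaker falls back to `uname -n` according to the description. It is a rough check under stated assumptions, not part of pcs or pacemaker: the helper names are hypothetical, and the regular expression is a simplification rather than a full corosync.conf parser.

```python
import ipaddress
import re


def ring0_addrs(corosync_conf_text):
    """Return every ring0_addr value found in the given corosync.conf text."""
    return re.findall(r"ring0_addr:\s*(\S+)", corosync_conf_text)


def is_ip_address(value):
    """True if value parses as an IPv4 or IPv6 address, False otherwise."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def ip_ring_addrs(corosync_conf_text):
    """List the ring0_addr values that are IP addresses rather than names."""
    return [addr for addr in ring0_addrs(corosync_conf_text)
            if is_ip_address(addr)]


# Example input: the nodelist from the description of this bug.
EXAMPLE_CONF = """
nodelist {
  node {
    ring0_addr: 192.168.143.71
    nodeid: 1
  }
  node {
    ring0_addr: 192.168.143.72
    nodeid: 2
  }
}
"""

if __name__ == "__main__":
    for addr in ip_ring_addrs(EXAMPLE_CONF):
        print(f"ring0_addr {addr} is an IP address, not a node name")
```

A configuration that uses hostnames for `ring0_addr`, as in the last comment on this bug, would produce an empty list from `ip_ring_addrs`.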