Bug 1126998

Summary:	pacemaker uses 'uname -n' if no nodename defined, which is confusing
Product:	Red Hat Enterprise Linux 7	Reporter:	John Ruemker <jruemker>
Component:	pcs	Assignee:	Chris Feist <cfeist>
Status:	CLOSED ERRATA	QA Contact:	cluster-qe <cluster-qe>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7.0	CC:	cfeist, cluster-maint, dvossel, jkortus, rsteiger, tojeline, wagh1.ravi
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	pcs-0.9.134-1.el7	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-03-05 09:20:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description John Ruemker 2014-08-05 20:11:07 UTC

Description of problem: If you configure a cluster using IP addresses, for example with:

  # pcs cluster setup --name rhel7-cluster 192.168.143.71 192.168.143.72

or if you defined the cluster via IPs in corosync.conf:

  nodelist {
    node {
          ring0_addr: 192.168.143.71
          nodeid: 1
         }
    node {
          ring0_addr: 192.168.143.72
          nodeid: 2
         } 
  }

then pacemaker will automatically guess the node name to be the output of 'uname -n' and put that in the CIB when it's created:
  
  # pcs status
  Cluster name: rhel7-cluster
  WARNING: no stonith devices and stonith-enabled is not false
  Last updated: Tue Aug  5 15:37:41 2014
  Last change: Tue Aug  5 15:37:13 2014 via crmd on jrummy7-2.usersys.redhat.com
  Stack: corosync
  Current DC: jrummy7-2.usersys.redhat.com (2) - partition with quorum
  Version: 1.1.10-23.el7-368c726
  2 Nodes configured
  0 Resources configured


  Online: [ jrummy7-1.usersys.redhat.com jrummy7-2.usersys.redhat.com ]
 
  Full list of resources:


  PCSD Status:
    192.168.143.71: Online
    192.168.143.72: Online

  Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled
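
The behavior described above can be sketched roughly as follows. This is only an illustration, not pacemaker's actual code: the function names are hypothetical, and the first two precedence steps (an explicit `name` entry, then a non-IP `ring0_addr`) are an assumption about how a name might be chosen. The report itself only confirms the last step, the fallback to 'uname -n' when the nodelist contains IPs.

```python
import ipaddress


def is_ip_address(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def guess_node_name(node: dict, local_uname: str) -> str:
    """Pick a node name for one nodelist entry (hypothetical helper)."""
    # Assumption: an explicit "name" key in the node entry wins.
    if node.get("name"):
        return node["name"]
    # Assumption: a ring0_addr that is a hostname can serve as the name.
    addr = node.get("ring0_addr", "")
    if addr and not is_ip_address(addr):
        return addr
    # From the report: with only an IP configured, 'uname -n' is used.
    return local_uname
```

With the entry from this report, `guess_node_name({"ring0_addr": "192.168.143.71"}, "jrummy7-1.usersys.redhat.com")` falls through to the uname value, which is why the IP never appears as a node name in 'pcs status'.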


This can be confusing for users in a few ways:

- If you then try to specify a node by that same IP address in pcs commands, it will not recognize the node:

  [root@jrummy7-1 ~]# pcs resource create vip IPaddr2 ip=192.168.143.99 cidr_netmask=24 
  [root@jrummy7-1 ~]# pcs resource move vip 192.168.143.72
  Error: error moving/banning/clearing resource
  Error performing operation: node '192.168.143.72' is unknown
  Error performing operation: No such device or address

- If you try to use a hostname that's mapped to that same IP in /etc/hosts in pcs commands, it will fail:

  [root@jrummy7-1 ~]# cat /etc/hosts
  127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
  ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

  192.168.143.71 rhel7-node1.example.com rhel7-node1 jrummy7-1-clust
  192.168.243.71 rhel7-node1-alt jrummy7-1-alt

  192.168.143.72 rhel7-node2.example.com rhel7-node2 jrummy7-2-clust
  192.168.243.72 rhel7-node2-alt jrummy7-2-alt

  [root@jrummy7-1 ~]# pcs resource move vip rhel7-node2.example.com 
  Error: error moving/banning/clearing resource
  Error performing operation: node 'rhel7-node2.example.com' is unknown
  Error performing operation: No such device or address

- The node name displayed by pcs and in the CIB may imply to some users that the network which maps to that hostname is being used for cluster communications.

Although all three of these points relate directly to how pcs handles the situation, changing pacemaker so that it does not guess the node name from uname -n and put it in the CIB seems like a more complete solution than changing pcs, so I've filed this against pacemaker for now.  It seems we should just let pacemaker put the IPs in the CIB as the node uname on startup, or at least look up the names mapped to the IPs coming from corosync and put those in the CIB, rather than use a name which is as likely to be wrong as right.
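
The "look for mapped names" idea could be sketched like this. It is a minimal illustration, not something pacemaker or pcs does: `names_for_ip` is a hypothetical helper, and it only reads hosts-file text, ignoring DNS and any other name service.

```python
from typing import List

# Sample taken from the /etc/hosts output shown in this report.
HOSTS_SAMPLE = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.143.71 rhel7-node1.example.com rhel7-node1 jrummy7-1-clust
192.168.243.71 rhel7-node1-alt jrummy7-1-alt

192.168.143.72 rhel7-node2.example.com rhel7-node2 jrummy7-2-clust
192.168.243.72 rhel7-node2-alt jrummy7-2-alt
"""


def names_for_ip(hosts_text: str, ip: str) -> List[str]:
    """Return every name that a hosts file maps to the given IP."""
    names: List[str] = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address, the rest are its names.
        if fields[0] == ip:
            names.extend(fields[1:])
    return names
```

For the sample above, `names_for_ip(HOSTS_SAMPLE, "192.168.143.72")` returns `['rhel7-node2.example.com', 'rhel7-node2', 'jrummy7-2-clust']`. Several candidates come back for one IP, so a caller would still have to decide which one becomes the node name.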

Version-Release number of selected component (if applicable): pacemaker-1.1.10-23.el7.x86_64

How reproducible: Any time corosync is configured to use IP addresses as node ring addrs

Steps to Reproduce:
1. Set up a cluster using IPs

  # pcs cluster setup --name rhel7-cluster 192.168.143.71 192.168.143.72 --start
 
2. Use 'pcs status' to check cluster membership and try to deduce what interface is being used

3. Run a pcs command that requires a node name, like 'pcs resource move', using the IPs used in #1, or a name that maps to them.

Actual results: 
2. 'pcs status' shows hostname that doesn't correspond to configured IPs
3. pcs commands don't accept the configured IPs, or names mapped to them


Expected results: pcs status shows either the IP that was configured in step #1, or a name that maps to it.  pcs commands accept at least the IP in place of a node name, or possibly also hostnames that map to it.


Additional info:

Comment 2 Andrew Beekhof 2014-08-07 04:02:35 UTC

Unless I misunderstand something, this is expected and by design.

IP addresses are not valid node names and are not at all interchangeable with shortname or shortname.domain.name

Even shortname, othershortname and shortname.domain.name are not interchangeable just because they map to the same IP in DNS.

I would agree that the names under 'PCSD Status' should match the 'Online' section though.  Do you have an opinion there Chris?

Comment 3 Chris Feist 2014-08-07 23:00:32 UTC

I talked with Andrew, and I think the solution is to change the PCSD Status section: I'll include the output of 'uname -n' from each node as well as the node name in the corosync section (with the IP address in parentheses after it).

I'll also include a warning message if the corosync and pacemaker node names are not identical.
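
A rough sketch of the two changes proposed in this comment. It is not the actual pcs patch: the function names and the way the node lists are passed in are assumptions made for illustration. The warning text and the "name (address): status" layout are copied from the 'pcs status' output quoted elsewhere in this bug.

```python
from typing import Iterable, Optional

MISMATCH_WARNING = (
    "WARNING: corosync and pacemaker node names do not match "
    "(IPs used in setup?)"
)


def node_name_warning(corosync_nodes: Iterable[str],
                      pacemaker_nodes: Iterable[str]) -> Optional[str]:
    """Return the warning if the two node name sets differ, else None."""
    # Order does not matter, so compare as sets.
    if set(corosync_nodes) != set(pacemaker_nodes):
        return MISMATCH_WARNING
    return None


def pcsd_status_line(uname: str, corosync_addr: str, status: str) -> str:
    """Format one PCSD Status entry as 'uname (corosync address): status'."""
    return f"{uname} ({corosync_addr}): {status}"
```

For example, `node_name_warning(["192.168.122.101", "192.168.122.102"], ["rh70-node1", "rh70-node2"])` returns the warning string, while two lists holding the same names in any order return `None`.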

Comment 4 Jaroslav Kortus 2014-08-11 12:09:30 UTC

Why are IPs not valid node names?
I'm asking because we've run into similar issues when using ad-hoc rings that are not in DNS. The guess pacemaker makes to fill in the node name has caused more trouble for me than it has brought benefits ;).

Comment 6 Chris Feist 2014-10-17 21:01:04 UTC

Patch upstream here:

https://github.com/feist/pcs/commit/b7e0144fe84e953fab198bef376a952fbfcdcad5

Comment 7 Tomas Jelinek 2014-10-21 14:11:52 UTC

Before Fix:
[root@rh70-node1:~]# rpm -q pcs
pcs-0.9.115-32.el7.x86_64
[root@rh70-node1:~]# pcs cluster auth 192.168.122.101 192.168.122.102
Username: hacluster
Password:
192.168.122.101: Authorized
192.168.122.102: Authorized
[root@rh70-node1:~]# pcs cluster setup 192.168.122.101 192.168.122.102 --name mycluster
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
192.168.122.101: Succeeded
192.168.122.102: Succeeded
[root@rh70-node1:~]# pcs cluster start --all
192.168.122.101: Starting Cluster...
192.168.122.102: Starting Cluster...
[root@rh70-node1:~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Oct 21 14:25:11 2014
Last change: Tue Oct 21 14:21:36 2014
Stack: corosync
Current DC: rh70-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured


Online: [ rh70-node1 rh70-node2 ]

Full list of resources:


PCSD Status:
  192.168.122.101: Online
  192.168.122.102: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



After Fix:
[root@rh70-node1:~]# rpm -q pcs
pcs-0.9.134-1.el7.x86_64
[root@rh70-node1:~]# pcs cluster auth 192.168.122.101 192.168.122.102
Username: hacluster
Password:
192.168.122.101: Authorized
192.168.122.102: Authorized
[root@rh70-node1:~]# pcs cluster setup 192.168.122.101 192.168.122.102 --name mycluster
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
192.168.122.101: Succeeded
192.168.122.102: Succeeded
[root@rh70-node1:~]# pcs cluster start --all
192.168.122.101: Starting Cluster...
192.168.122.102: Starting Cluster...
[root@rh70-node1:~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Tue Oct 21 16:09:36 2014
Last change: Tue Oct 21 15:52:26 2014
Stack: corosync
Current DC: rh70-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured


Online: [ rh70-node1 rh70-node2 ]

Full list of resources:


PCSD Status:
  rh70-node1 (192.168.122.101): Online
  rh70-node2 (192.168.122.102): Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


Note the "WARNING: corosync and pacemaker node names do not match (IPs used in setup?)" line.

Comment 10 Jaroslav Kortus 2015-01-16 17:05:29 UTC

Created pacemaker follow up as https://bugzilla.redhat.com/show_bug.cgi?id=1183103

Comment 12 errata-xmlrpc 2015-03-05 09:20:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html

Comment 13 Ravikumar Wagh 2020-08-20 12:54:14 UTC

I just changed ring0_addr from an IP to a hostname.
That solved the issue.
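
To find the entries that would need this change, a small check along these lines may help. It is a simplified sketch: `ip_ring0_addrs` is a hypothetical helper, and the regular expression only looks for `ring0_addr:` lines rather than parsing corosync.conf properly.

```python
import ipaddress
import re
from typing import List

# Matches lines such as "    ring0_addr: 192.168.143.71".
RING0_RE = re.compile(r"^\s*ring0_addr:\s*(\S+)\s*$", re.MULTILINE)


def ip_ring0_addrs(conf_text: str) -> List[str]:
    """Return the ring0_addr values that are IP addresses, not hostnames."""
    found: List[str] = []
    for addr in RING0_RE.findall(conf_text):
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            continue  # a hostname, nothing to report
        found.append(addr)
    return found
```

Run against the nodelist from the original description, this would report both 192.168.143.71 and 192.168.143.72. After switching to hostnames it returns an empty list.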