Bug 1659269 - Detect need for ip_version in corosync.conf
Summary: Detect need for ip_version in corosync.conf
Keywords:
Status: CLOSED DUPLICATE of bug 1659389
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-13 23:25 UTC by Ken Gaillot
Modified: 2019-01-07 12:17 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-07 12:15:56 UTC
Type: Bug
Target Upstream Version:


Description Ken Gaillot 2018-12-13 23:25:13 UTC
Description of problem: Due to name resolution changes in RHEL 8, a user can easily end up unable to start corosync, with no clear indication of why.

Version-Release number of selected component (if applicable): 0.10.1-2.el8


How reproducible: reliably


Steps to Reproduce:
1. Configure two hosts to have a short hostname (e.g. node1 instead of node1.example.com) and an IPv4 network for corosync.
2. Ensure each host has at least one IPv6 address configured (e.g. a link-local address). The IPv6 address does not need to be in /etc/hosts or DNS.
3. Add the short hostnames to /etc/hosts with the IPv4 address only.
4. Run "pcs cluster setup" such that ring0_addr in corosync.conf uses the short hostnames.
5. Run "pcs cluster start".


Actual results:
pcs cluster setup succeeds, but pcs cluster start fails because corosync fails to start, with the log message "parse error in config: Nodes for link 0 have different IP families".


Expected results: pcs cluster setup warns the user about an invalid configuration, or automatically configures corosync.conf such that corosync uses the IPv4 address specified in /etc/hosts.
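A minimal sketch of the kind of check the expected result describes, assuming the address corosync would pick for each node on a link is already known (how it gets resolved is outside this sketch). The function name `check_link_families` and the extra detail in the message are hypothetical; only the "Nodes for link 0 have different IP families" wording comes from this report.

```python
import ipaddress
from typing import Dict, List, Optional


def check_link_families(link_addrs: Dict[str, str], link: int = 0) -> Optional[str]:
    """Return a warning if the nodes on one link resolved to mixed IP families.

    link_addrs maps node name -> the address corosync would use on that link.
    Returns None when all addresses share one family (or the mapping is empty).
    """
    # Group node names by the IP version (4 or 6) of their resolved address.
    by_family: Dict[int, List[str]] = {}
    for node, addr in sorted(link_addrs.items()):
        version = ipaddress.ip_address(addr).version
        by_family.setdefault(version, []).append(node)

    if len(by_family) <= 1:
        return None

    # Mirror corosync's own complaint, and say which node landed in which family.
    details = "; ".join(
        f"IPv{version}: {', '.join(nodes)}"
        for version, nodes in sorted(by_family.items())
    )
    return f"Nodes for link {link} have different IP families ({details})"
```

For the reproducer, `check_link_families({"node1": "fe80::5054:ff:fe12:3456", "node2": "192.168.122.12"})` would flag node1 as the odd one out, which is the warning a user could act on before running "pcs cluster start".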


Additional info:
Any of these workarounds will avoid the issue:
1. Specify "ip_version:" as "ipv4" or "ipv4-6" in corosync.conf's totem{} section.
2. Specify "ring0_addr:" as the IPv4 address.
3. Specify the full hostname as the local hostname (ring0_addr can still use the short hostname).
4. Remove "myhostname" from nsswitch.conf.
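Workarounds 1 and 2 only change corosync.conf. A rough sketch of generating the affected sections, assuming each node's IPv4 address is already known; `render_fragment` is a hypothetical helper, not part of pcs, and its output is abbreviated (a real corosync.conf generated by pcs contains more settings).

```python
from typing import Dict, Optional


def render_fragment(nodes: Dict[str, str], ip_version: Optional[str] = None,
                    use_ip_literals: bool = False) -> str:
    """Render an abbreviated totem{} and nodelist{} for corosync.conf.

    nodes: short node name -> IPv4 address (assumed to be known already).
    ip_version: workaround 1, e.g. "ipv4" or "ipv4-6"; None leaves it unset.
    use_ip_literals: workaround 2, put the IPv4 address itself in ring0_addr.
    """
    lines = ["totem {", "    version: 2"]
    if ip_version is not None:
        lines.append(f"    ip_version: {ip_version}")
    lines += ["}", "", "nodelist {"]
    # Node IDs are assigned in sorted name order purely for this illustration.
    for nodeid, (name, ipv4) in enumerate(sorted(nodes.items()), start=1):
        addr = ipv4 if use_ip_literals else name
        lines += [
            "    node {",
            f"        ring0_addr: {addr}",
            f"        name: {name}",
            f"        nodeid: {nodeid}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

With `use_ip_literals=True`, name resolution no longer decides the address family at all, which is why this variant sidesteps the myhostname behavior entirely.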

This will affect anyone who uses a short local hostname and otherwise uses a default setup. The cause is difficult to determine, so detecting and warning about the condition (or even better, setting ip_version appropriately) would avoid support calls.

The problem arises from a combination of two behaviors. First, without ip_version, corosync queries name resolution for an IPv6 address first and queries for an IPv4 address only if that returns nothing. Second, nsswitch.conf now contains systemd's "myhostname" plugin, which returns the local host's IPv6 address for its own hostname even if that address is not configured in /etc/hosts, DNS, or anywhere else. As a result, corosync finds only IPv4 addresses for all other nodes but finds an IPv6 address for the local node.
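The failure mode can be simulated without a cluster. A minimal sketch, assuming the lookup order described above (IPv6 first, IPv4 only as a fallback, unless ip_version says otherwise); the "ipv6-4" spelling for the default order is an assumption, and the fake resolver is a purely illustrative stand-in for /etc/hosts plus the myhostname plugin. On a real host, a resolver could instead be built on `socket.getaddrinfo(name, None, family)`.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

# A resolver takes (name, address family) and returns the addresses it found.
Resolver = Callable[[str, int], List[str]]

# Family lookup order per ip_version, following the description in this report.
ORDER: Dict[Optional[str], Tuple[int, ...]] = {
    None: (socket.AF_INET6, socket.AF_INET),      # unset: IPv6 first
    "ipv6-4": (socket.AF_INET6, socket.AF_INET),  # assumed name of the default
    "ipv4-6": (socket.AF_INET, socket.AF_INET6),
    "ipv4": (socket.AF_INET,),
    "ipv6": (socket.AF_INET6,),
}


def pick_address(name: str, resolver: Resolver,
                 ip_version: Optional[str] = None) -> Optional[str]:
    """Return the first address found, trying families in ip_version order."""
    for family in ORDER[ip_version]:
        addrs = resolver(name, family)
        if addrs:
            return addrs[0]
    return None


def make_fake_resolver(hosts: Dict[str, str], local_name: str,
                       local_ipv6: str) -> Resolver:
    """Illustrative stand-in: /etc/hosts holds IPv4 only, while a
    myhostname-like source answers IPv6 queries for the local hostname."""
    def resolve(name: str, family: int) -> List[str]:
        if family == socket.AF_INET6:
            return [local_ipv6] if name == local_name else []
        return [hosts[name]] if name in hosts else []
    return resolve
```

With `hosts = {"node1": "192.168.122.11", "node2": "192.168.122.12"}` and node1 as the local host, `pick_address("node1", res)` yields the link-local IPv6 address while node2 yields IPv4, i.e. the mixed-family link corosync rejects; passing `"ipv4"` or `"ipv4-6"` picks IPv4 for both.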

Comment 3 Ken Gaillot 2018-12-14 15:07:32 UTC
(In reply to Radek Steiger from comment #2)
> Ok so the missing information here is a specific corosync version that's
> needed for a reproducer. Specifically corosync-2.99.5-2.el8.x86_64.

Ah, I didn't realize that. I should also have realized that a particular version of systemd is necessary, i.e. one that provides the myhostname NSS plugin, and that myhostname must be configured for hosts in /etc/nsswitch.conf (which it is by default).
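Whether myhostname takes part in host lookups can be read from the hosts: line of /etc/nsswitch.conf. A small sketch, assuming the usual "database: source [ACTION] source ..." layout; `hosts_sources` is a hypothetical helper, and the sample lines in the usage note are illustrative, not copied from a RHEL 8 default file.

```python
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the source names on the hosts: line, skipping [ACTION] items and comments."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("hosts:"):
            continue
        sources: List[str] = []
        in_action = False
        for token in line[len("hosts:"):].split():
            # Action specifiers like [NOTFOUND=return !UNAVAIL=continue] may span tokens.
            if token.startswith("["):
                in_action = True
            if not in_action:
                sources.append(token)
            if token.endswith("]"):
                in_action = False
        return sources
    return []
```

For example, `"myhostname" in hosts_sources("hosts: files dns myhostname\n")` is True, while the same check on a hosts: line without myhostname (workaround 4) is False.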

> Also, custom node names need to be used because our beaker lab does have
> records for both IPv4 and IPv6 for the same hostname and therefore corosync
> binds on the IPv6 address automatically.

Good point: the basic scenario is that only IPv4 name/address mappings are configured anywhere for the short node names.

> Two issues here:
>  - sadly, it binds on IPv6 also if ip_version=ipv4 is specified so this
> workaround doesn't seem to be working

I should have mentioned that the patch hasn't been released yet. :) That will be the preferred solution; Honza has a working patch, but it still has to make its way through upstream etc.

>  - pcs doesn't support arbitrary values for corosync options so the 'ipv4-6'
> needs to be added to pcs code to make it recognized.

Good to know, will clone this once the patch is finalized
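The limitation quoted above amounts to an allow-list check on option values. A hypothetical sketch of such validation (not pcs's actual code), assuming the accepted set should be the ip_version values documented for corosync 3: ipv4, ipv6, ipv4-6 and ipv6-4.

```python
from typing import List

# Hypothetical allow-list; assumed to match corosync 3's documented ip_version values.
ALLOWED_IP_VERSIONS = ("ipv4", "ipv6", "ipv4-6", "ipv6-4")


def validate_ip_version(value: str) -> List[str]:
    """Return a list of error messages (empty if the value is accepted)."""
    if value in ALLOWED_IP_VERSIONS:
        return []
    allowed = ", ".join(f"'{v}'" for v in ALLOWED_IP_VERSIONS)
    return [f"'{value}' is not a valid ip_version value, use {allowed}"]
```

The point of the quoted remark is that a value missing from such a list is rejected by the tool even when corosync itself would accept it, so "ipv4-6" has to be added explicitly.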

Comment 4 Ken Gaillot 2018-12-14 15:09:17 UTC
> > Also, custom node names need to be used because our beaker lab does have
> > records for both IPv4 and IPv6 for the same hostname and therefore corosync
> > binds on the IPv6 address automatically.
> 
> Good point: the basic scenario is that only IPv4 name/address mappings are
> configured anywhere for the short node names.

And of course, the local hostname (uname -n) must be used as the local node name.
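A rough sketch of checking whether a node fits this scenario: the local hostname (the value `uname -n` prints) is one of the cluster node names and is a short, dotless name. `is_affected_local_node` is hypothetical and deliberately narrow; it does not look at nsswitch.conf, the corosync version, or the configured addresses.

```python
import os
from typing import Iterable, Optional


def is_affected_local_node(node_names: Iterable[str],
                           hostname: Optional[str] = None) -> bool:
    """True if the local hostname is used as a node name and is a short (dotless) name."""
    if hostname is None:
        # Same value that `uname -n` prints (Unix only).
        hostname = os.uname().nodename
    return hostname in set(node_names) and "." not in hostname
```

This also reflects workaround 3: with a fully qualified local hostname and short names in the nodelist, the local hostname no longer matches a node name, so the check returns False.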
 
> >  - pcs doesn't support arbitrary values for corosync options so the 'ipv4-6'
> > needs to be added to pcs code to make it recognized.
> 
> Good to know, will clone this once the patch is finalized

Still having my morning coffee; of course that can be done with this bz.

Comment 5 Ken Gaillot 2018-12-14 15:16:02 UTC
Per Honza, the relevant patch is already in upstream version 3.0.0, and a build with it has been added to the errata, so it should be in nightlies soon.

However, he believes workaround #2 (specifying the IP address in ring0_addr) is the superior approach, which would make this bz a duplicate of Bug 1659389.

Comment 6 Tomas Jelinek 2019-01-07 12:15:56 UTC
Closing per comment 5.

*** This bug has been marked as a duplicate of bug 1659389 ***

