Bug 1988751

Summary: Inconsistent behavior in the order of IPv6 addresses at boot vs. when restarting NetworkManager
Product: Red Hat Enterprise Linux 8 Reporter: Marcel Härri <mharri>
Component: NetworkManager Assignee: Thomas Haller <thaller>
Status: CLOSED ERRATA QA Contact: David Jaša <djasa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.5 CC: atragler, bgalvani, bstinson, djasa, fge, fpokryvk, jwboyer, lrintel, mharri, rkhan, sukulkar, thaller, till, vbenes, wenliang
Target Milestone: beta Keywords: Triaged
Target Release: 8.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-09 19:30:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marcel Härri 2021-07-31 18:52:07 UTC
Description of problem:

Let's assume the following network config:

# cat /etc/sysconfig/network-scripts/ifcfg-ens3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=10.94.0.5
PREFIX=24
GATEWAY=10.94.0.1
DNS1=192.168.1.42
DEFROUTE=yes
DHCP_VENDOR_CLASS_IDENTIFIER=anaconda-Linux
IPV4_FAILURE_FATAL=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2a02:168:f00d:caab::2/64
IPV6ADDR_SECONDARIES="2a02:168:f00d:caab::99/64"
IPV6_DEFAULTGW=fe80::1
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=yes
NAME=ens3
UUID=0b4f730f-6894-43bc-942d-a23b2fd4dfea
DEVICE=ens3
ONBOOT=yes
MULTI_CONNECT=1
DEVTIMEOUT=60

The assumption would be that the TCP stack uses 2a02:168:f00d:caab::2/64 as the primary IPv6 address (the source address for outgoing connections).

However, when you boot a system with this configuration, the primary address is 2a02:168:f00d:caab::99/64.

BUT if you restart NetworkManager, it becomes 2a02:168:f00d:caab::2/64.

This means the behavior is neither consistent nor persistent. I would at least expect it to be consistent.


Version-Release number of selected component (if applicable):

# rpm -qi NetworkManager
Name        : NetworkManager
Epoch       : 1
Version     : 1.32.4
Release     : 1.el8
Architecture: x86_64
Install Date: Fri Jul 30 06:08:37 2021
Group       : System Environment/Base
Size        : 7408241
License     : GPLv2+ and LGPLv2+
Signature   : RSA/SHA256, Wed Jul 28 14:17:26 2021, Key ID 05b555b38483c65d
Source RPM  : NetworkManager-1.32.4-1.el8.src.rpm
Build Date  : Wed Jul 28 05:09:08 2021
Build Host  : x86-02.mbox.centos.org
Relocations : (not relocatable)
Packager    : CentOS Buildsys <bugs>
Vendor      : CentOS
URL         : https://networkmanager.dev/
Summary     : Network connection manager and user applications


How reproducible:

Above config

# uptime
 18:46:47 up 0 min,  1 user,  load average: 2.82, 0.67, 0.22

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:be:fe:88 brd ff:ff:ff:ff:ff:ff
    inet 10.94.0.5/24 brd 10.94.0.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 10.94.0.6/24 brd 10.94.0.255 scope global secondary noprefixroute ens3:1
       valid_lft forever preferred_lft forever
    inet6 2a02:168:f00d:caab::99/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 2a02:168:f00d:caab::2/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:febe:fe88/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
# curl -6 icanhazip.com
2a02:168:f00d:caab::99

# systemctl restart NetworkManager
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:be:fe:88 brd ff:ff:ff:ff:ff:ff
    inet 10.94.0.5/24 brd 10.94.0.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet 10.94.0.6/24 brd 10.94.0.255 scope global secondary noprefixroute ens3:1
       valid_lft forever preferred_lft forever
    inet6 2a02:168:f00d:caab::2/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 2a02:168:f00d:caab::99/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:febe:fe88/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
# curl -6 icanhazip.com
2a02:168:f00d:caab::2

Actual results:

IP ordering is neither predictable nor persistent.

Expected results:

Predictable or at least persistent behavior.
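
The check above depends on an external service (icanhazip.com). As a rough offline alternative, the first global IPv6 address in the `ip -6 addr show` listing can be extracted locally. Per comment 8 of this report, that is the address the kernel prefers when its selection rules result in a tie. This is only a sketch: the function name `first_global_inet6` is hypothetical, and it does not replace a real source address lookup for a specific destination.

```shell
# Hypothetical helper: read `ip -6 addr show` output on stdin and print the
# first address with global scope, without its prefix length.
first_global_inet6() {
    awk '$1 == "inet6" && /scope global/ {
        sub(/\/.*/, "", $2)   # strip "/64" and similar
        print $2
        exit
    }'
}

# Usage (assumes the interface name from this report):
#   ip -6 addr show dev ens3 | first_global_inet6
```

Running it once after boot and once after restarting NetworkManager should show whether the order changed, without any network access.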

Comment 1 Thomas Haller 2021-08-02 11:48:18 UTC
When reporting a bug against NetworkManager, please make an effort to provide level=TRACE logs.

See https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/contrib/fedora/rpm/NetworkManager.conf#L27 for hints about logging.

Thank you.

Comment 2 Marcel Härri 2021-08-04 09:15:19 UTC
So I enabled trace and rebooted; these are the steps I took:

# uptime
 09:04:10 up 0 min,  1 user,  load average: 1.74, 0.43, 0.14
# cat /etc/sysconfig/network-scripts/ifcfg-ens3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=10.94.0.5
PREFIX=24
GATEWAY=10.94.0.1
DNS1=192.168.1.42
DEFROUTE=yes
DHCP_VENDOR_CLASS_IDENTIFIER=anaconda-Linux
IPV4_FAILURE_FATAL=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR_SECONDARIES="2a02:168:f00d:caab::2/64"
IPV6ADDR=2a02:168:f00d:caab:0:1::2/64
IPV6_DEFAULTGW=fe80::1
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=yes
NAME=ens3
UUID=0b4f730f-6894-43bc-942d-a23b2fd4dfea
DEVICE=ens3
ONBOOT=yes
MULTI_CONNECT=1
DEVTIMEOUT=60
# curl -6 icanhazip.com
2a02:168:f00d:caab::2
# systemctl restart NetworkManager
# date
Wed Aug  4 09:04:37 UTC 2021
# curl -6 icanhazip.com
2a02:168:f00d:caab:0:1:0:2
# journalctl --boot -u NetworkManager > NetworkManager.log

As we can see, my secondary IP is the primary at first (which also seems wrong), but that changes once I restart NM.

Attaching the log.

Comment 4 Marcel Härri 2021-08-04 09:42:42 UTC
An additional data point: if I swap the two IP addresses, meaning making 2a02:168:f00d:caab:0:1:0:2 the IPV6ADDR, it is chosen as the primary from the beginning and it also stays the primary when restarting NM.

But in any case I would expect the address in IPV6ADDR to be the primary, not the last one in IPV6ADDR_SECONDARIES (the name suggests something different). I would also expect the current primary to stay the primary when restarting NetworkManager, given that there have been no config changes...

But it seems that there are 2 different ordering mechanisms in place. One at boot time and one when restarting NetworkManager.

Comment 5 Thomas Haller 2021-08-04 11:34:00 UTC
Thanks for the logs!!


The address order is wrong during the first activation.

Easy to reproduce with

  DEVICE=eth0
  nmcli connection add type ethernet con-name a autoconnect no ifname "$DEVICE" ipv4.method disabled ipv6.method manual ipv6.addresses "2a02:168:f00d:caab:0:1:0:2/64, 2a02:168:f00d:caab::2/64"
  nmcli connection up a
  ip -6 addr show dev "$DEVICE"


Curiously, after restarting, NetworkManager gets the order right. But other than that, the restart is not relevant here.
After another `nmcli connection up a` it's wrong again.

Btw, restarting NetworkManager is usually the wrong thing to do. Don't do that, unless you have good reasons (there are few good reasons).

Comment 6 Marcel Härri 2021-08-04 14:20:19 UTC
> Btw, restarting NetworkManager is usually the wrong thing to do. Don't do
> that, unless you have good reasons (there are few good reasons).

Unrelated, but for the record: NetworkManager got restarted because the rpm got updated, and then another script kicked in, which restarted the service because libs of a running process had vanished/been updated and the service was detected as responsible. This is when services started connecting from another IP, and thus the IP change was detected. If a reboot is more appropriate in that case, this can be adapted, but as mentioned this is unrelated and meant more as background info.

Comment 7 Thomas Haller 2021-08-04 14:24:22 UTC
(In reply to Marcel Härri from comment #6)
> > Btw, restarting NetworkManager is usually the wrong thing to do. Don't do
> > that, unless you have good reasons (there are few good reasons).
> 
> Unrelated, but for the record: NetworkManager got restarted as the rpm got
> updated and then another script kicked in, which restarted the service due
> to libs of a running process vanished/got updated and the service was
> detected responsible. This is when services started connecting with another
> ip and thus the ip change was detected. If a reboot is more appropriate in
> that case, this can be adapted, but as mentioned unrelated and more as
> background info.

thanks for clarifying.

Package update is actually one of the few good reasons to restart NetworkManager!

Comment 8 Thomas Haller 2021-08-16 19:44:27 UTC
I am so confused by this bug...



Address selection in kernel
===========================

Kernel selects IPv6 addresses according to https://access.redhat.com/solutions/189153

In case there is a tie, it chooses the address added *last*.
Which means, it's the *first* in `ip -6 addr show` output.

(of course, for IPv4 that's entirely different).
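
The tie-break described above can be pictured with a toy model. This is not kernel code, only a sketch of the ordering as stated here: addresses are passed in the order they were added, and the output is the order `ip -6 addr show` would list them, newest first. The name `show_order` is hypothetical.

```shell
# Toy model: given addresses in the order they were added, print them in the
# order `ip -6 addr show` lists them (most recently added first).
# The function body is a subshell so its variables stay local.
show_order() (
    out=""
    for addr in "$@"; do
        out="$addr${out:+ $out}"   # prepend each newly added address
    done
    printf '%s\n' "$out"
)

# Example: ::2 is added first, ::99 second.
show_order 2a02:168:f00d:caab::2/64 2a02:168:f00d:caab::99/64
```

Under this model the first word of the output is the address that wins a tie.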



Initscripts
===========

Initscripts comment:

 #  IPV6ADDR=<IPv6 address>[/<prefix length>]: specify primary static IPv6 address
 #  IPV6ADDR_SECONDARIES="<IPv6 address>[/<prefix length>] ..." (optional)

but then ifup-ipv6 does:

# Setup IPv6 address on specified interface
if [ -n "$IPV6ADDR" ]; then
    ipv6_add_addr_on_device $DEVICE $IPV6ADDR || exit 1
fi
...
# Setup additional IPv6 addresses from list, if given
if [ -n "$IPV6ADDR_SECONDARIES" ]; then
    for ipv6addr in $IPV6ADDR_SECONDARIES; do
        ipv6_add_addr_on_device $DEVICE $ipv6addr
    done
fi

meaning, it will add them in the order they appear, and -- see previous point -- the
address added last will be preferred.

Hence, the last of the IPV6ADDR_SECONDARIES will be used.
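
As a sketch of that conclusion, the following function mimics the add order of the ifup-ipv6 excerpt and reports which address ends up preferred. It assumes all addresses tie under the kernel's selection rules (as they do in this report, where they share one /64), and the name `initscripts_primary` is hypothetical.

```shell
# Hypothetical helper: $1 is IPV6ADDR, $2 is IPV6ADDR_SECONDARIES.
# Prints the address that is added last, which is the one preferred on a tie.
initscripts_primary() (
    primary=$1
    # Unquoted on purpose: the list is whitespace-separated, as in ifup-ipv6.
    for addr in $2; do
        primary=$addr
    done
    printf '%s\n' "$primary"
)

# Configuration from comment 0:
initscripts_primary "2a02:168:f00d:caab::2/64" "2a02:168:f00d:caab::99/64"
```

For the configuration in comment 0 this prints the ::99 address, which matches what `curl -6` reported right after boot.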


NetworkManager
==============

For compatibility with initscripts, it also inverts the order.
If you list the addresses with

  nmcli -f ipv6.addresses connection show "$PROFILE"

then the *last* address should be the primary one. Which is of course unintuitive *sigh*.
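
To make that rule concrete, here is a small sketch that picks the intended primary out of an `ipv6.addresses` value. It assumes the value is a comma-separated list in the form used in the reproducer of comment 5; the name `nm_profile_primary` is hypothetical, and extracting the value from nmcli output is left out.

```shell
# Hypothetical helper: $1 is a comma-separated ipv6.addresses value.
# Prints the last entry, which should be the primary address.
nm_profile_primary() (
    list=$1
    last=${list##*,}                 # text after the final comma
    last=${last#"${last%%[! ]*}"}    # trim leading spaces
    printf '%s\n' "$last"
)

# Value from the reproducer in comment 5:
nm_profile_primary "2a02:168:f00d:caab:0:1:0:2/64, 2a02:168:f00d:caab::2/64"
```

For the reproducer profile this prints 2a02:168:f00d:caab::2/64, so that address should be listed first by `ip -6 addr show` when the order is applied correctly.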



Summary1
=========

In comment 0 we see:

  IPV6ADDR=2a02:168:f00d:caab::2/64
  IPV6ADDR_SECONDARIES="2a02:168:f00d:caab::99/64"

and:

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    [...]
    inet6 2a02:168:f00d:caab::99/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 2a02:168:f00d:caab::2/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:febe:fe88/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
# curl -6 icanhazip.com
2a02:168:f00d:caab::99


so, this part works as intended, although unexpected.



What does not work, is that during a `systemctl restart NetworkManager`, the order gets messed up.
That's the bug...

Comment 9 Marcel Härri 2021-08-16 20:24:23 UTC
Yes, I am working around the behavior so far by adding the expected primary address as the last of the secondaries (which is how it used to work since forever) in the kickstart provisioning templates. Which works but is not intuitive.

BUT now, when NetworkManager gets restarted, it messes them up, which is why I called the behavior inconsistent. Changing the behavior of the initscripts to the intuitive one (that IPV6ADDR is the primary one) would probably be way too much of a change for RHEL 8 (and 9)...

Comment 14 errata-xmlrpc 2021-11-09 19:30:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361