Bug 1901517 - RHCOS 4.6.1 uses a single NetworkManager connection for multiple NICs when using default DHCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Dusty Mabe
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1954039
Reported: 2020-11-25 13:48 UTC by Sebastian Jug
Modified: 2021-04-27 13:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: By default when using DHCP, a single NetworkManager connection was created which matched all interfaces. Consequence: This led to a confusing user experience when querying and modifying connection settings via NetworkManager. Fix: By default when using DHCP, we now let NetworkManager create a separate connection for each interface. Result: The user experience when querying and modifying connection settings via NetworkManager is less confusing.
Clone Of:
: 1954039 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:35:50 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github coreos fedora-coreos-config pull 773 0 None closed coreos-teardown-initramfs: don't propagate purely default network configs 2021-01-26 19:46:12 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:36:19 UTC
freedesktop.org Gitlab NetworkManager NetworkManager issues 592 0 None None None 2020-11-25 16:13:05 UTC

Description Sebastian Jug 2020-11-25 13:48:50 UTC
Description of problem:
When booting with the new live RHCOS installer ISO the generated NetworkManager configuration is invalid when there are multiple NICs that are connected to the system.


Version-Release number of selected component (if applicable):
rhcos-4.6.1-x86_64-live.x86_64.iso

How reproducible:
Every boot

Steps to Reproduce:
1. Two NICs connected to different networks (eno1 & eno2) both with DHCP
2. Boot the RHCOS 4.6.1 live iso
3. Check the NetworkManager configuration with `sudo nmcli con show`
4. Notice that there is one "Wired Connection" that shares one UUID across two devices.

Actual results:
$ sudo nmcli con show
NAME                UUID                                  TYPE       DEVICE
Wired Connection    989068c2-3ab2-45c4-9e5f-46a4c31590bb  ethernet   eno2
Wired Connection    989068c2-3ab2-45c4-9e5f-46a4c31590bb  ethernet   eno1

Expected results:
$ sudo nmcli con show
NAME                UUID                                  TYPE       DEVICE
Wired connection 1  2f8fcc2c-b98b-3217-8461-d56aa5d0674c  ethernet   eno2
Wired connection 2  31b03d97-2fb9-3385-9a9d-319df3ca8166  ethernet   eno1


Additional info:
If the user manually deletes the "Wired Connection" aka the initial configuration, then the expected results (as shown) are automatically generated.
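
The symptom above can be spotted mechanically. The following is a minimal sketch, assuming the terse output of `nmcli -t -f UUID,DEVICE connection show --active` (one `UUID:DEVICE` pair per line). The helper name `shared_uuids` is hypothetical and not part of any tool mentioned in this bug.

```shell
# Print every connection UUID that is attached to more than one device.
# Reads "UUID:DEVICE" lines on stdin (hypothetical helper, illustration only).
shared_uuids() {
    awk -F: '
        $2 != "" { count[$1]++ }                       # count devices per UUID
        END { for (u in count) if (count[u] > 1) print u }
    '
}
```

On a live system it could be used as `nmcli -t -f UUID,DEVICE connection show --active | shared_uuids`; an empty result would mean every active profile is bound to a single device.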

Comment 1 Jonathan Lebon 2020-11-25 16:08:44 UTC
Here's what I think is going on here:
- in the initramfs, NetworkManager generates connections based on network kargs
- the default network kargs are ip=dhcp,dhcp6
- based on those kargs, NM generates a single connection which matches all devices:

$ /usr/libexec/nm-initrd-generator -s -- rd.neednet=1 ip=dhcp,dhcp6

*** Connection 'default_connection' ***

[connection]
id=Wired Connection
uuid=b6e839f1-3bb2-43f8-8c73-7122a7842a02
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

- at the end of the initrd, coreos-teardown-network.service propagates any generated NM connections into the real root if none were provided via Ignition
- in the real root, NM finds the connection and since it matches all devices, it doesn't try to generate separate connections per device
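
To make the "matches all devices" point concrete, here is a rough heuristic sketch that flags a keyfile as catch-all when its `[connection]` section sets `multi-connect=3` and has no `interface-name=` key, as in the generated `default_connection` above. The function name `is_catch_all` is hypothetical, and this is only an illustration, not necessarily how NetworkManager or any fix decides what counts as a default config.

```shell
# Return 0 if the keyfile in $1 looks like a catch-all profile:
# [connection] has multi-connect=3 and no non-empty interface-name.
# Hypothetical helper; a heuristic, not NetworkManager's own matching logic.
is_catch_all() {
    awk -F= '
        /^\[/ { in_conn = ($0 == "[connection]"); next }   # track current section
        in_conn && $1 == "interface-name" && $2 != "" { pinned = 1 }
        in_conn && $1 == "multi-connect" && $2 == "3"  { multi = 1 }
        END { if (multi && !pinned) exit 0; exit 1 }
    ' "$1"
}
```

For example, `is_catch_all /run/NetworkManager/system-connections/default_connection.nmconnection && echo catch-all` would be one way to check the profile generated in the initrd.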

If you specify kargs which describe different setups for each device, you get the expected behaviour, e.g.

$ /usr/libexec/nm-initrd-generator -s -- ip=10.10.10.10::10.10.10.1:255.255.255.0:myhost:enp1s0:none:8.8.8.8 ip=10.10.10.11::10.10.10.1:255.255.255.0:myhost:enp2s0:none:8.8.8.8


*** Connection 'enp1s0' ***

[connection]
id=enp1s0
uuid=e24371cd-6b3d-4672-9049-0e59dfe62262
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
address1=10.10.10.10/24,10.10.10.1
dhcp-hostname=myhost
dns=8.8.8.8;
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=myhost
dns-search=
method=disabled

[proxy]

*** Connection 'enp2s0' ***

[connection]
id=enp2s0
uuid=afac5eb7-0eb3-46fc-920a-1c3fe771edac
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
address1=10.10.10.11/24,10.10.10.1
dhcp-hostname=myhost
dns=8.8.8.8;
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=myhost
dns-search=
method=disabled

[proxy]
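
For readers unfamiliar with the `ip=` syntax used above: the IPv4 form follows the dracut field order `ip=<client-ip>:[<peer>]:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>[:<dns>]`. The sketch below simply splits one such argument into labelled fields; the function name `describe_ip_karg` is hypothetical, and it assumes a plain IPv4 karg (IPv6 addresses, which contain colons, are not handled).

```shell
# Split an IPv4 "ip=" kernel argument into labelled fields.
# Hypothetical helper for illustration; does not handle bracketed IPv6 forms.
describe_ip_karg() {
    # Strip the leading "ip=" and split the remainder on ":".
    printf '%s\n' "${1#ip=}" | awk -F: '{
        printf "address=%s gateway=%s netmask=%s hostname=%s interface=%s autoconf=%s dns=%s\n",
               $1, $3, $4, $5, $6, $7, $8
    }'
}
```

Calling it on the first karg above yields `interface=enp1s0` and `autoconf=none`, which lines up with `interface-name=enp1s0` and `method=manual` in the generated profile.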

The confusing part I think is that the NM behaviour is different between the initrd and the real root. By default, NM in the real root will also DHCP on all interfaces, but it will generate a separate connection per device.

Now, technically everything still works fine. As far as NM knows, you do want DHCP on all interfaces, and it has no problem using the same connection to apply to multiple devices. But it makes for a deteriorated UX because now you can't e.g. bring down a single device or modify it without first cloning the connection so there is one per device.

I think the cleanest way to fix this is for nm-initrd-generator to mimic the behaviour of real root NM and generate a separate connection per device. Will file an upstream issue to see what the NM folks think about this.

(To be clear though, users should be able to specify custom settings per device either via kargs or NM connections in Ignition and NM will respect that -- this is more about what the default behaviour should be.)

Comment 2 Dusty Mabe 2020-11-30 04:58:55 UTC
Hey Sebastian,

This is just a single connection profile that matches multiple NICs, which is valid. What problem are you seeing because of this behavior?

Comment 3 Sebastian Jug 2020-11-30 13:01:40 UTC
Hey Dusty,

Depends what you mean by valid, I guess. Two NICs using DHCP is valid, yes. I don't think that a single configuration for both NICs is valid. When would someone want to configure multiple networks at the same time with one configuration? If you have multiple NICs, either you want to do some bonding or redundancy, or they're on different networks and you want to connect to them in different ways (as I did).

This default behavior happens both on the live ISO and on the node after `coreos-installer` if the `--copy-network` flag isn't specified, which immediately gives the installed node the wrong default network configuration.

In my config I had two NICs on two different networks, one on a publicly accessible network, one on a private NIC that hosted the physical cluster network. Even though the cluster network was entirely on the private NIC the node IP of the host once it was added was the public NIC which caused issues as you can imagine.. AFAICT it was due to this issue.

Comment 4 Jonathan Lebon 2020-11-30 14:32:31 UTC
This bug IMO is simply about what the default behaviour of NM should be in the initrd, and having it match what it does in the real root (that's why I filed https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592). In the installed system, this UX issue shouldn't affect users in any real sense because any wanted configuration should've already been provided up front. In the live ISO, it does make configuring networking via e.g. nmcli/nmtui slightly more awkward because you first have to delete the old NM connection which matched both devices before being able to customize them.

Comment 5 Sebastian Jug 2020-11-30 14:36:14 UTC
Hey Jonathan,

Could we just remove the extra step where the user has to delete the initial NM connection, and immediately start with the "expected" of having one connection per NIC as usual?

Comment 6 Jonathan Lebon 2020-11-30 15:38:37 UTC
(In reply to Sebastian Jug from comment #3)
> In my config I had two NICs on two different networks, one on a publicly
> accessible network, one on a private NIC that hosted the physical cluster
> network. Even though the cluster network was entirely on the private NIC the
> node IP of the host once it was added was the public NIC which caused issues
> as you can imagine.. AFAICT it was due to this issue.

Can you elaborate on this? It's not clear to me how this relates to the NM connection UX issue. Is the kubelet somehow sensitive to the difference between one NM connection describing DHCP on two devices vs two separate NM connections? The end state of the interfaces is the same regardless, right? So the only way the kubelet could tell the difference is if it's actually talking to NM/looking at NM configs, which would surprise me.

Does that kubelet bug go away when you configure the interfaces using two separate connections, each with DHCP (without changing any other configuration settings)?

Comment 7 Sebastian Jug 2020-12-01 20:07:25 UTC
Hey Jonathan,

> Is the kubelet somehow sensitive to the difference between one NM connection describing DHCP on two devices vs two separate NM connections?

Not quite, no. That's quite the elaborate hypothetical scenario there.
This same issue happens on both: the RHCOS live iso, as well as the node upon installation.

In order to get my network configuration to propagate to installation:

1) Delete the single catch all NM connection.
2) After deletion the default behavior of one connection per NIC occurs, allowing me/the user to configure each NIC individually.
3) Running coreos-installer with --copy-network workaround. (https://bugzilla.redhat.com/show_bug.cgi?id=1895979)
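
The three steps above could be scripted roughly as follows. This is a sketch only: the `run` and `apply_workaround` helpers and the `DRY_RUN` variable are hypothetical, the target disk is an assumption supplied by the caller, and the per-NIC configuration between steps 2 and 3 is left to the user. The real commands need root privileges.

```shell
# Run a command, or just print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: UUID of the catch-all connection; $2: install target disk (assumption).
apply_workaround() {
    run nmcli connection delete uuid "$1"               # step 1: drop the catch-all profile
    run nmcli connection show                           # step 2: expect one profile per NIC now
    # ...configure each NIC here (nmcli/nmtui) before installing...
    run coreos-installer install --copy-network "$2"    # step 3: carry the config over
}
```

Running `DRY_RUN=1; apply_workaround <uuid> <disk>` first prints the commands without touching the system, which is a cheap way to review them before running for real.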

So the problematic instance of this bug was more so on the default installation network rather than the live ISO one, but it occurs on both AFAICT.
How does NM determine which is the primary NIC in this single configuration mode? Is it different than having two configs?

As I asked in my previous post, what does a single NM configuration add over the typical 1:1 (nic:config) setup?

Comment 8 Jonathan Lebon 2020-12-01 20:47:24 UTC
(In reply to Sebastian Jug from comment #7)
> So the problematic instance of this bug was more so on the default
> installation network rather than the live ISO one, but it occurs on both
> AFAICT.

Gotcha, thanks for clarifying. So this still comes down to a bad/confusing UX.

> How does NM determine which is the primary NIC in this single configuration
> mode? Is it different than having two configs?

I'm not completely sure, but I don't think there even is a "primary" NIC in this situation. They're just two separate NICs both configured for DHCP.

> As I've asked in my previous post, what does a singular NM configuration add
> over the typical 1:1 (nic:config) add?

It's not actually intended behaviour, it's just a subtle side-effect of the way our code interacts with NM (see discussions in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592). While it's valid, I agree with you it makes for a bad experience. :)

Dusty had an idea in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592#note_711889 which is promising. We'll see if we can give that a try and report back.

Comment 9 Micah Abbott 2020-12-01 21:23:53 UTC
Targeting for 4.7 with low priority as this appears to be just bad UX with a single reported example of this; if we get additional reports we can adjust priority

Comment 10 Jonathan Lebon 2020-12-01 21:27:45 UTC
Re-assigning to Dusty, who's going to try the approach in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592#note_711889.

Comment 11 Dusty Mabe 2020-12-04 22:44:21 UTC
This bug has not been selected for work in the current sprint.

Comment 12 Dusty Mabe 2020-12-11 04:05:07 UTC
Upstream PR: https://github.com/coreos/fedora-coreos-config/pull/773

Comment 13 Dusty Mabe 2020-12-11 21:21:04 UTC
Downstream PR: https://github.com/openshift/os/pull/467

Comment 15 Micah Abbott 2020-12-14 14:52:00 UTC
The changes landed in RHCOS 47.83.202012140447-0; moving to MODIFIED

Comment 22 Michael Nguyen 2021-01-06 16:48:36 UTC
Default case:

Initramfs
--------------------

switch_root:/# ls /run/NetworkManager/system-connections/default_connection.nmcon
/run/NetworkManager/system-connections/default_connection.nmconnection
switch_root:/# ls /run/NetworkManager/system-connections/
default_connection.nmconnection
switch_root:/# cat /run/NetworkManager/system-connections/default_connection.nmconnection
[connection]
id=Wired Connection
uuid=43cd707e-1136-4c10-ad9d-c0cf03c5e671
type=ethernet
multi-connect=3
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]






Real root
-------------------------------
[core@localhost ~]$ journalctl -t coreos-teardown-initramfs | grep 'info: skipping propagation of default networking configs'
Jan 06 16:43:52 localhost coreos-teardown-initramfs[1072]: info: skipping propagation of default networking configs
Jan 06 16:43:54 localhost coreos-teardown-initramfs[1072]: info: skipping propagation of default networking configs
[core@localhost ~]$ ls -A /etc/NetworkManager/system-connections/
[core@localhost ~]$
[core@localhost ~]$ sudo nmcli con show
NAME                UUID                                  TYPE      DEVICE 
Wired connection 1  b7f0dc3e-ea30-33d3-aeff-1beb2475b556  ethernet  enp1s0 
Wired connection 2  96f1bf61-3c27-3cce-b49f-2191e12e6ba2  ethernet  enp2s0 


[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://f19ba4d9abfde5fbc8e957e06e28b0150412d0f6b20d327e015434521f36924e
                   Version: 47.83.202101060443-0 (2021-01-06T04:46:21Z)

Comment 23 Michael Nguyen 2021-01-06 17:25:49 UTC
virt-install -n twonics --vcpus 2 -r 2048 --os-type=rhel8.0  --network network=default --network network=default2 --cdrom=/rhcos-47.83.202101060443-0-live.x86_64.iso --disk size=20 --nographics

append kargs `console=tty console=ttyS0 rd.break rd.neednet=1 ip=enp1s0:dhcp ip=enp2s0:dhcp` and boot

initrd
-------------------------
switch_root:/# cat /run/NetworkManager/
devices/
initrd/
internal-9129c3f4-462d-4a67-9fb4-5e3b80a592eb-enp2s0.lease
internal-c75b9586-4f02-4760-8229-3e7afdbed4eb-enp1s0.lease
no-stub-resolv.conf
resolv.conf
system-connections/
switch_root:/# cat /run/NetworkManager/system-connections/
cat: /run/NetworkManager/system-connections/: Is a directory
switch_root:/# ls /run/NetworkManager/system-connections/ 
enp1s0.nmconnection  enp2s0.nmconnection
switch_root:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:94:33:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.17/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 3575sec preferred_lft 3575sec
    inet6 fe80::5054:ff:fe94:334b/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:a9:1b:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.129.178/24 brd 192.168.129.255 scope global dynamic noprefixroute enp2s0
       valid_lft 3576sec preferred_lft 3576sec
    inet6 fe80::5054:ff:fea9:1b0d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
switch_root:/# cd /run/NetworkManager/system-connections/
switch_root:/run/NetworkManager/system-connections# cat enp1s0.nmconnection 
[connection]
id=enp1s0
uuid=c75b9586-4f02-4760-8229-3e7afdbed4eb
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
switch_root:/run/NetworkManager/system-connections# cat enp2s0.nmconnection 
[connection]
id=enp2s0
uuid=9129c3f4-462d-4a67-9fb4-5e3b80a592eb
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]



real root
------------------------------
[core@localhost ~]$ journalctl -t coreos-teardown-initramfs | grep 'info: propagating initramfs networking config to the real root'
Jan 06 17:20:29 localhost coreos-teardown-initramfs[1065]: info: propagating initramfs networking config to the real root

[core@localhost ~]$ nmcli con show
NAME    UUID                                  TYPE      DEVICE 
enp1s0  c75b9586-4f02-4760-8229-3e7afdbed4eb  ethernet  enp1s0 
enp2s0  9129c3f4-462d-4a67-9fb4-5e3b80a592eb  ethernet  enp2s0 
[core@localhost ~]$ cd /run/NetworkManager/       
[core@localhost NetworkManager]$ ls
devices  no-stub-resolv.conf  resolv.conf
[core@localhost NetworkManager]$ cd /etc/NetworkManager/system-connections/
[core@localhost system-connections]$ ls
enp1s0.nmconnection  enp2s0.nmconnection
[core@localhost system-connections]$ sudo cat enp1s0.nmconnection 
[connection]
id=enp1s0
uuid=c75b9586-4f02-4760-8229-3e7afdbed4eb
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
[core@localhost system-connections]$ sudo cat enp2s0.nmconnection 
[connection]
id=enp2s0
uuid=9129c3f4-462d-4a67-9fb4-5e3b80a592eb
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
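
The two verification runs above differ in which message `coreos-teardown-initramfs` logs. A small sketch that classifies a boot from those journal lines follows; the helper name `propagation_outcome` is hypothetical, and the matched strings are the ones quoted in comments 22 and 23, which may differ in other RHCOS versions.

```shell
# Classify a boot from coreos-teardown-initramfs journal lines read on stdin.
# Prints "skipped", "propagated", or "unknown" (hypothetical helper).
propagation_outcome() {
    awk '
        /info: skipping propagation of default networking configs/       { r = "skipped" }
        /info: propagating initramfs networking config to the real root/ { r = "propagated" }
        END { print (r == "" ? "unknown" : r) }
    '
}
```

Usage would be along the lines of `journalctl -t coreos-teardown-initramfs | propagation_outcome`: the default DHCP case is expected to report `skipped`, and the explicit `ip=<iface>:dhcp` case `propagated`.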

Comment 26 errata-xmlrpc 2021-02-24 15:35:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

