Bug 1901517
Summary: | RHCOS 4.6.1 uses a single NetworkManager connection for multiple NICs when using default DHCP | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sebastian Jug <sejug> | |
Component: | RHCOS | Assignee: | Dusty Mabe <dustymabe> | |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
Severity: | low | Docs Contact: | ||
Priority: | low | |||
Version: | 4.6 | CC: | bbreard, bfuru, dustymabe, imcleod, jlebon, jligon, miabbott, nstielau | |
Target Milestone: | --- | |||
Target Release: | 4.7.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: By default when using DHCP, a single NetworkManager connection was created which matched all interfaces.
Consequence: This led to a confusing user experience when querying and modifying connection settings via NetworkManager.
Fix: By default when using DHCP, we now let NetworkManager create a separate connection for each interface.
Result: The user experience when querying and modifying connection settings via NetworkManager is less confusing.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1954039 (view as bug list) | Environment: | ||
Last Closed: | 2021-02-24 15:35:50 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1954039 |
Description
Sebastian Jug
2020-11-25 13:48:50 UTC
Here's what I think is going on here:

- in the initramfs, NetworkManager generates connections based on network kargs
- the default network kargs are ip=dhcp,dhcp6
- based on those kargs, NM generates a single connection which matches all devices:

$ /usr/libexec/nm-initrd-generator -s -- rd.neednet=1 ip=dhcp,dhcp6

*** Connection 'default_connection' ***

[connection]
id=Wired Connection
uuid=b6e839f1-3bb2-43f8-8c73-7122a7842a02
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

- at the end of the initrd, coreos-teardown-network.service propagates any generated NM connections into the real root if none were provided via Ignition
- in the real root, NM finds the connection and since it matches all devices, it doesn't try to generate separate connections per device

If you specify kargs which describe different setups for each device, you get the expected behaviour, e.g.

$ /usr/libexec/nm-initrd-generator -s -- ip=10.10.10.10::10.10.10.1:255.255.255.0:myhost:enp1s0:none:8.8.8.8 ip=10.10.10.11::10.10.10.1:255.255.255.0:myhost:enp2s0:none:8.8.8.8

*** Connection 'enp1s0' ***

[connection]
id=enp1s0
uuid=e24371cd-6b3d-4672-9049-0e59dfe62262
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
address1=10.10.10.10/24,10.10.10.1
dhcp-hostname=myhost
dns=8.8.8.8;
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=myhost
dns-search=
method=disabled

[proxy]

*** Connection 'enp2s0' ***

[connection]
id=enp2s0
uuid=afac5eb7-0eb3-46fc-920a-1c3fe771edac
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
address1=10.10.10.11/24,10.10.10.1
dhcp-hostname=myhost
dns=8.8.8.8;
dns-search=
may-fail=false
method=manual

[ipv6]
addr-gen-mode=eui64
dhcp-hostname=myhost
dns-search=
method=disabled

[proxy]

The confusing part I think is that the NM
behaviour is different between the initrd and the real root. By default, NM in the real root will also DHCP on all interfaces, but it will generate a separate connection per device.

Now, technically everything still works fine. As far as NM knows, you do want DHCP on all interfaces, and it has no problem using the same connection to apply to multiple devices. But it makes for a deteriorated UX because now you can't e.g. bring down a single device or modify it without first cloning the connection so there is one per device.

I think the cleanest way to fix this is for nm-initrd-generator to mimic the behaviour of real root NM and generate a separate connection per device. Will file an upstream issue to see what the NM folks think about this.

(To be clear though, users should be able to specify custom settings per device either via kargs or NM connections in Ignition and NM will respect that -- this is more about what the default behaviour should be.)

Hey Sebastian,

This is just a single connection profile that matches multiple NICs, which is valid. What problem are you seeing because of this behavior?

Hey Dusty,

Depends what you mean by valid I guess. Two NICs using DHCP is valid, yes. I don't think that a single configuration for both NICs is valid. At what point do you want to configure multiple networks at the same time with one configuration? Essentially, when would someone want this? If you have multiple NICs, either you want to do some bonding or redundancy, or they're different networks and you want to connect to them in different ways (as I did).

This default behavior happens both on the live iso as well as the node after `coreos-installer` if the `--copy-network` flag isn't specified, immediately giving the node the wrong default network configuration after installation.

In my config I had two NICs on two different networks, one on a publicly accessible network, one on a private NIC that hosted the physical cluster network.
Even though the cluster network was entirely on the private NIC the node IP of the host once it was added was the public NIC which caused issues as you can imagine.. AFAICT it was due to this issue.

This bug IMO is simply about what the default behaviour of NM should be in the initrd, and having it match between the initrd and the real root (that's why I filed https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592).

In the installed system, this UX issue shouldn't affect users in any real sense because any wanted configuration should've already been provided up front. In the live ISO, it does make configuring networking via e.g. nmcli/nmtui slightly more awkward because you first have to delete the old NM connection which matched both devices before being able to customize them.

Hey Jonathan,

Could we just remove the extra step where the user has to delete the initial NM connection, and immediately start with the "expected" behaviour of having one connection per NIC as usual?

(In reply to Sebastian Jug from comment #3)
> In my config I had two NICs on two different networks, one on a publicly
> accessible network, one on a private NIC that hosted the physical cluster
> network. Even though the cluster network was entirely on the private NIC the
> node IP of the host once it was added was the public NIC which caused issues
> as you can imagine.. AFAICT it was due to this issue.

Can you elaborate on this? It's not clear to me how this relates to the NM connection UX issue. Is the kubelet somehow sensitive to the difference between one NM connection describing DHCP on two devices vs two separate NM connections? The end state of the interfaces is the same regardless, right? So the only way the kubelet could tell the difference is if it's actually talking to NM/looking at NM configs, which would surprise me.
Does that kubelet bug go away when you configure the interfaces using two separate connections, each with DHCP (without changing any other configuration settings)?

Hey Jonathan,

> Is the kubelet somehow sensitive to the difference between one NM connection describing DHCP on two devices vs two separate NM connections?

Not quite, no. That's quite the elaborate hypothetical scenario there.

This same issue happens on both: the RHCOS live iso, as well as the node upon installation. In order to get my network configuration to propagate to installation:
1) Delete the single catch all NM connection.
2) After deletion the default behavior of one connection per NIC occurs, allowing me/the user to configure each NIC individually.
3) Running coreos-installer with --copy-network workaround. (https://bugzilla.redhat.com/show_bug.cgi?id=1895979)

So the problematic instance of this bug was more so on the default installation network rather than the live ISO one, but it occurs on both AFAICT.

How does NM determine which is the primary NIC in this single configuration mode? Is it different than having two configs?

As I've asked in my previous post, what does a singular NM configuration add over the typical 1:1 (nic:config)?

(In reply to Sebastian Jug from comment #7)
> So the problematic instance of this bug was more so on the default
> installation network rather than the live ISO one, but it occurs on both
> AFAICT.

Gotcha, thanks for clarifying. So this still comes down to a bad/confusing UX.

> How does NM determine which is the primary NIC in this single configuration
> mode? Is it different than having two configs?

I'm not completely sure, but I don't think there even is a "primary" NIC in this situation. They're just two separate NICs both configured for DHCP.

> As I've asked in my previous post, what does a singular NM configuration add
> over the typical 1:1 (nic:config)?
It's not actually intended behaviour, it's just a subtle side-effect of the way our code interacts with NM (see discussions in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592). While it's valid, I agree with you it makes for a bad experience. :)

Dusty had an idea in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592#note_711889 which is promising. We'll see if we can give that a try and report back.

Targeting for 4.7 with low priority as this appears to be just bad UX with a single reported example of this; if we get additional reports we can adjust priority.

Re-assigning to Dusty, who's going to try the approach in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/592#note_711889.

This bug has not been selected for work in the current sprint.

Downstream PR: https://github.com/openshift/os/pull/467

The changes landed in RHCOS 47.83.202012140447-0; moving to MODIFIED

Default case:

Initramfs
--------------------
switch_root:/# ls /run/NetworkManager/system-connections/default_connection.nmcon
/run/NetworkManager/system-connections/default_connection.nmconnection
switch_root:/# ls /run/NetworkManager/system-connections/
default_connection.nmconnection
switch_root:/# cat /run/NetworkManager/system-connections/default_connection.nmconnection
[connection]
id=Wired Connection
uuid=43cd707e-1136-4c10-ad9d-c0cf03c5e671
type=ethernet
multi-connect=3
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

Real root
-------------------------------
[core@localhost ~]$ journalctl -t coreos-teardown-initramfs | grep 'info: skipping propagation of default networking configs'
Jan 06 16:43:52 localhost coreos-teardown-initramfs[1072]: info: skipping propagation of default networking configs
Jan 06 16:43:54 localhost coreos-teardown-initramfs[1072]: info: skipping propagation of default networking configs
[core@localhost ~]$ ls -A /etc/NetworkManager/system-connections/
[core@localhost ~]$
[core@localhost ~]$ sudo nmcli con show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  b7f0dc3e-ea30-33d3-aeff-1beb2475b556  ethernet  enp1s0
Wired connection 2  96f1bf61-3c27-3cce-b49f-2191e12e6ba2  ethernet  enp2s0
[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://f19ba4d9abfde5fbc8e957e06e28b0150412d0f6b20d327e015434521f36924e
    Version: 47.83.202101060443-0 (2021-01-06T04:46:21Z)

virt-install -n twonics --vcpus 2 -r 2048 --os-type=rhel8.0 --network network=default --network network=default2 --cdrom=/rhcos-47.83.202101060443-0-live.x86_64.iso --disk size=20 --nographics

append kargs `console=tty console=ttyS0 rd.break rd.neednet=1 ip=enp1s0:dhcp ip=enp2s0:dhcp` and boot

initrd
-------------------------
switch_root:/# cat /run/NetworkManager/
devices/
initrd/
internal-9129c3f4-462d-4a67-9fb4-5e3b80a592eb-enp2s0.lease
internal-c75b9586-4f02-4760-8229-3e7afdbed4eb-enp1s0.lease
no-stub-resolv.conf
resolv.conf
system-connections/
switch_root:/# cat /run/NetworkManager/system-connections/
cat: /run/NetworkManager/system-connections/: Is a directory
switch_root:/# ls /run/NetworkManager/system-connections/
enp1s0.nmconnection  enp2s0.nmconnection
switch_root:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:94:33:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.17/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 3575sec preferred_lft 3575sec
    inet6 fe80::5054:ff:fe94:334b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
fq_codel state UP group default qlen 1000
    link/ether 52:54:00:a9:1b:0d brd ff:ff:ff:ff:ff:ff
    inet 192.168.129.178/24 brd 192.168.129.255 scope global dynamic noprefixroute enp2s0
       valid_lft 3576sec preferred_lft 3576sec
    inet6 fe80::5054:ff:fea9:1b0d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
switch_root:/# cd /run/NetworkManager/system-connections/
switch_root:/run/NetworkManager/system-connections# cat enp1s0.nmconnection
[connection]
id=enp1s0
uuid=c75b9586-4f02-4760-8229-3e7afdbed4eb
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
switch_root:/run/NetworkManager/system-connections# cat enp2s0.nmconnection
[connection]
id=enp2s0
uuid=9129c3f4-462d-4a67-9fb4-5e3b80a592eb
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

real root
------------------------------
[core@localhost ~]$ journalctl -t coreos-teardown-initramfs | grep 'info: propagating initramfs networking config to the real root'
Jan 06 17:20:29 localhost coreos-teardown-initramfs[1065]: info: propagating initramfs networking config to the real root
[core@localhost ~]$ nmcli con show
NAME    UUID                                  TYPE      DEVICE
enp1s0  c75b9586-4f02-4760-8229-3e7afdbed4eb  ethernet  enp1s0
enp2s0  9129c3f4-462d-4a67-9fb4-5e3b80a592eb  ethernet  enp2s0
[core@localhost ~]$ cd /run/NetworkManager/
[core@localhost NetworkManager]$ ls
devices  no-stub-resolv.conf  resolv.conf
[core@localhost NetworkManager]$ cd /etc/NetworkManager/system-connections/
[core@localhost system-connections]$ ls
enp1s0.nmconnection  enp2s0.nmconnection
[core@localhost system-connections]$ sudo cat enp1s0.nmconnection
[connection]
id=enp1s0
uuid=c75b9586-4f02-4760-8229-3e7afdbed4eb
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
[core@localhost system-connections]$ sudo cat enp2s0.nmconnection
[connection]
id=enp2s0
uuid=9129c3f4-462d-4a67-9fb4-5e3b80a592eb
type=ethernet
interface-name=enp2s0
multi-connect=1
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
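
The two keyfile shapes quoted in this report differ in two visible ways: the catch-all profile has no interface-name and sets multi-connect=3, while the per-device profiles pin interface-name and set multi-connect=1. The following is a minimal Python sketch of a check based on that observation. The function name `is_catch_all` and the detection rule are assumptions made for illustration; this is not how NetworkManager itself matches profiles to devices, and the abbreviated per-device keyfile below is trimmed from the output in the report.

```python
import configparser

# Value of multi-connect seen in the catch-all keyfile quoted in this report.
MULTI_CONNECT_MULTIPLE = "3"


def is_catch_all(keyfile_text: str) -> bool:
    """Heuristic (assumption): a profile is 'catch-all' when it names no
    interface and allows activation on multiple devices at once."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(keyfile_text)
    conn = parser["connection"]
    has_ifname = bool(conn.get("interface-name", fallback="").strip())
    multi = conn.get("multi-connect", fallback="").strip()
    return (not has_ifname) and multi == MULTI_CONNECT_MULTIPLE


# Keyfile generated from the default kargs (ip=dhcp,dhcp6), as quoted above.
DEFAULT_CONNECTION = """\
[connection]
id=Wired Connection
uuid=b6e839f1-3bb2-43f8-8c73-7122a7842a02
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto
"""

# Per-device keyfile from the verification run, abbreviated.
PER_DEVICE_CONNECTION = """\
[connection]
id=enp1s0
uuid=c75b9586-4f02-4760-8229-3e7afdbed4eb
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=

[ipv4]
dns-search=
may-fail=false
method=auto

[proxy]
"""

if __name__ == "__main__":
    print(is_catch_all(DEFAULT_CONNECTION))     # True
    print(is_catch_all(PER_DEVICE_CONNECTION))  # False
```

On a real node the text would come from a file under /etc/NetworkManager/system-connections/ or /run/NetworkManager/system-connections/, which typically requires root to read.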