Description of problem:

The file /etc/NetworkManager/system-connections/default_connection.nmconnection is incompatible with the SR-IOV operator.

I assume that /etc/NetworkManager/system-connections/default_connection.nmconnection is pushed by the IPI installer. Contents:
~~~
[connection]
id=Wired Connection
uuid=5f123cb7-3abb-477a-8c29-55fa809f19b4
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
~~~

The problem with this configuration is that it blindly assumes that every interface on the host should run DHCP. That is not the case, and it creates issues with the SR-IOV operator; I'd guess that it causes issues with many other things as well.

In the example of the SR-IOV operator: if a VF is bound to the kernel driver, then assigned to a pod, and then unbound from the pod, NetworkManager will manage the VF and run DHCP on it. If the VF belongs to the machine network, the worker node will now have the same subnet twice: once on the PF and once on the VF.

I do not believe that this is an issue with the SR-IOV operator or the device plugin. Instead, I think that telling NetworkManager to get DHCP leases on all interfaces is the wrong approach for any day 2 operation. Granted, this approach makes sense for the installation stage. But once the node has been provisioned, we should be able to figure out which interface is the machine network interface, and default_connection.nmconnection should be removed and replaced with a more specific configuration.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
LATEST COMMENT FROM CUSTOMER - JOHN WONG:

Alright, so I ran a subset of your commands that I think applied to this comment, before rebooting or deleting the /etc/NetworkManager/system-connections/default_connection.nmconnection file. I think there weren't as many as in your analysis because ~22 hours have passed.

[core@worker-0 ~]$ sudo nmcli con show --active
NAME              UUID                                  TYPE           DEVICE
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v12   <--- I think this is the problem VF.
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0

[core@worker-0 ~]$ ip -d address | less | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.144.175.175/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v12
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

I deleted /etc/NetworkManager/system-connections/default_connection.nmconnection, then rebooted. After waiting for about 15 minutes, it eventually got back into the same state.
[core@worker-0 ~]$ sudo nmcli con show --active
NAME                UUID                                  TYPE           DEVICE
Wired connection 3  e3c41ffa-b95d-33a9-8c01-581070bbbac1  ethernet       ens1f1
Wired connection 4  64eb35e9-4974-3925-baee-d8984a8d6d3a  ethernet       ens7f0
Wired connection 5  548649c0-0828-3bed-9a78-a9d7a6591ade  ethernet       ens7f1
Wired connection 6  b4a780af-c1ff-3e6d-9a98-8a0f688473c8  ethernet       ens1f0v9   <----- Same issue but new VF and different UUID
ovs-if-br-ex        5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex
Wired connection 2  b04ab6fd-e38c-3d96-8dd2-62d56aa82e37  ethernet       eno2np1
br-ex               45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex
ovs-if-phys0        7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0
ovs-port-br-ex      214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex
ovs-port-phys0      9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0

[core@worker-0 ~]$ ip -d address | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.144.175.176/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v9

I restored /etc/NetworkManager/system-connections/default_connection.nmconnection and made the following changes from your previous comment.

(...)
[ipv4]
dns-search=
method=disabled

[ipv6]
addr-gen-mode=eui64
dns-search=
method=disabled
(...)
Then rebooted.

[core@worker-0 ~]$ sudo nmcli con show --active
NAME              UUID                                  TYPE           DEVICE
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f0
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v6   <--- Still seeing this but I don't see an IP on the VF (maybe I got lucky on this lottery?)
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f1

[core@worker-0 ~]$ ip -d address | grep -i 0v
54: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
81: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
83: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

Something I did notice is that I wasn't able to ssh into worker-0 on another interface (this interface was used during install, but I've kept it there for situations like this where I lose connectivity on the baremetal network):

[kni@provisioner ~]$ ssh core.0.29
ssh: connect to host 172.22.0.29 port 22: No route to host
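The check done by eye above (looking for a VF in the DEVICE column of `nmcli con show --active`) could be scripted roughly as follows. This is only a sketch: the function name `list_vf_connections` is made up for illustration, and it assumes that VFs are named `<pf>v<index>` as on this node (ens1f0v12, ens1f0v9, ...), which will not hold for every naming scheme.

```shell
# Hypothetical helper (not part of any tool): reads terse nmcli output
# (NAME:UUID:DEVICE, one active connection per line) on stdin and prints
# the entries whose device name looks like a VF of the given PF.
# Assumption: VFs are named <pf>v<index>, as on worker-0 in this report.
list_vf_connections() {
    pf="$1"
    awk -F: -v pf="$pf" '
        # $NF is the DEVICE column; matches e.g. ens1f0v12 for pf=ens1f0
        $NF ~ ("^" pf "v[0-9]+$") { print $NF ": " $1 " (" $(NF-1) ")" }
    '
}

# On the node this would be fed from nmcli, for example:
#   nmcli -t -f NAME,UUID,DEVICE con show --active | list_vf_connections ens1f0
```

Any line printed by the helper means NetworkManager has activated a profile on a VF, which is the state the report treats as the problem.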
One approach would be to create, through a MachineConfig, the following NetworkManager configuration (say in /etc/NetworkManager/conf.d/99-nodefault.conf):

[main]
no-auto-default=*

As per the NetworkManager documentation, this option does the following:

Specify devices for which NetworkManager shouldn't create default wired connection (Auto eth0). By default, NetworkManager creates a temporary wired connection for any Ethernet device that is managed and doesn't have a connection configured. List a device in this option to inhibit creating the default connection for the device. May have the special value * to apply to all devices. When the default wired connection is deleted or saved to a new persistent connection by a plugin, the device is added to a list in the file /var/lib/NetworkManager/no-auto-default.state to prevent creating the default connection for that device again.
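For a quick manual test on a single node, before wrapping the file into a MachineConfig, the drop-in could be written like this. It is a minimal sketch: the function name and its optional directory argument are illustrative only, and NetworkManager still has to re-read its configuration afterwards (for example through a reboot).

```shell
# Sketch: write the drop-in described above.
# The optional first argument exists only so the function can be pointed
# at a scratch directory; without it, the real conf.d path is used.
write_nodefault_conf() {
    conf_dir="${1:-/etc/NetworkManager/conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/99-nodefault.conf" <<'EOF'
[main]
no-auto-default=*
EOF
}

# On the node (as root):
#   write_nodefault_conf
```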
We tried 3 options to work around this for the time being:

i) Deleting /etc/NetworkManager/system-connections/default_connection.nmconnection

This option does not work. The VFs still get an IP address lease from the DHCP server after they move from the pod's namespace back to the host's namespace.

ii) Setting method=disabled in all sections of /etc/NetworkManager/system-connections/default_connection.nmconnection

This option **does** work as a workaround. When VFs move from the pod's namespace to the host's namespace, they do not obtain a DHCP lease.

iii) udev workaround with method=auto in /etc/NetworkManager/system-connections/default_connection.nmconnection
~~~
cat <<'EOF' > /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
EOF
udevadm control --reload-rules && udevadm trigger
~~~
This does not work at the moment, even though we see the correct NM_UNMANAGED property on the device:
~~~
[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
(...)
E: NM_UNMANAGED=1
(...)
~~~
I could get this to work in a lab of mine with Intel cards, though. So maybe it's something specific to the Mellanox cards.

=========================================

I'm not sure where we have to route this BZ: to RHCOS, to the SR-IOV operator devs, or to IPI installation. In theory, we "only" have to find a way to tell NetworkManager to keep its fingers away from managing any VFs.
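Workaround ii) amounts to a small in-place edit of the keyfile. A minimal sketch, assuming the file looks like the one quoted in the description (where `method=auto` appears only in the `[ipv4]` and `[ipv6]` sections); the function name and its optional path argument are made up for illustration:

```shell
# Sketch of workaround ii): switch every "method=auto" line in a keyfile
# to "method=disabled". In the file shown in this report those lines only
# occur in the [ipv4] and [ipv6] sections.
disable_ip_methods() {
    keyfile="${1:-/etc/NetworkManager/system-connections/default_connection.nmconnection}"
    tmp=$(mktemp) || return 1
    sed 's/^method=auto$/method=disabled/' "$keyfile" > "$tmp" || return 1
    # Write back through the existing file so its ownership and mode are kept.
    cat "$tmp" > "$keyfile"
    rm -f "$tmp"
}

# On the node (as root), followed by a reboot as in the customer's test:
#   disable_ip_methods
```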
Thanks Karim, we'll try that!
> I assume that /etc/NetworkManager/system-connections/default_connection.nmconnection is pushed by the IPI installer

It's not created by the installer; IIUC it's a default connection created by NetworkManager.

As Karim mentioned, it may be possible to override this behavior by injecting some additional config: either a MachineConfig that adjusts the NM config can be applied post-install as part of the SR-IOV configuration, or it can be provided at install time via the installer `create manifests` step.

Reassigning to the CNF team to decide how to proceed.
CNF Platform Validation is the component for the cnf-tests suite, so it's probably not the right component to move this BZ to.

In any case, the SR-IOV operator already has the logic in place for creating the udev rule:
https://github.com/openshift/sriov-network-operator/blob/5cab948617a7fefaa58e10280b1c6a0b3872ab89/pkg%2Fdaemon%2Fdaemon.go#L729

It only adds rules for supported devices though, as per
https://github.com/openshift/sriov-network-operator/blob/b08f8433bfc7fbdb9a9175ee6ec8a95c12b791f8/api%2Fv1%2Fhelper.go#L35

@Andreas, mind checking the content of /host/etc/udev/rules.d/10-nm-unmanaged.rules on the host to see whether the sriov-operator created it (and also checking whether the card is among those listed above)?

Moving to the SR-IOV operator in the meantime. If we find that the udev rules are not working, we'll move the BZ again.
There's a udev rule pushed by the SR-IOV operator, as well as the one that I pushed, on the customer system. And the VF device is listed as NM_UNMANAGED:

[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules
ACTION=="add|change", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"

[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID

[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1 | grep NM
E: NM_UNMANAGED=1

[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=ens1f0v1
E: ID_NET_NAME_MAC=(redacted)
E: ID_NET_NAME_PATH=enp59s0f0v1
E: ID_NET_NAME_SLOT=ens1f0v1
E: ID_PATH=pci-0000:3b:00.3
E: ID_PATH_TAG=pci-0000_3b_00_3
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=57
E: INTERFACE=ens1f0v1
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v1 /sys/subsystem/net/devices/ens1f0v1
E: TAGS=:systemd:
E: USEC_INITIALIZED=175104174

Yet NetworkManager still manages the VFs.
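To repeat the NM_UNMANAGED check across all VFs instead of one device at a time, something along these lines might help. It is a sketch only: `has_nm_unmanaged` is a made-up name, and the sysfs glob in the usage comment assumes the usual `virtfn*/net/<ifname>` layout, which only lists VFs currently present in the host network namespace.

```shell
# Hypothetical helper: reads `udevadm info` output on stdin and succeeds
# only if the device carries the property NM_UNMANAGED=1.
has_nm_unmanaged() {
    grep -q '^E: NM_UNMANAGED=1$'
}

# On the node, looping over the VFs of the PF from this report:
#   for dev in /sys/class/net/ens1f0/device/virtfn*/net/*; do
#       if udevadm info --path="$dev" | has_nm_unmanaged; then
#           echo "${dev##*/}: NM_UNMANAGED=1"
#       else
#           echo "${dev##*/}: not flagged"
#       fi
#   done
```

Comparing this list with the DEVICE column of `nmcli con show --active` shows which VFs are flagged in udev but managed by NetworkManager anyway.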
The following also does not help:
~~~
[core@worker-0 ~]$ sudo cat /etc/NetworkManager/conf.d/99-nodefault.conf
[main]
no-auto-default=*
~~~
(see the aforementioned private comment)

-----------

#c8 actually clarified things for me a bit more. I'll go into a remote session with the customer to see what's up with NetworkManager and why it ignored NM_UNMANAGED.
## Issue

When scaling up and down a deployment with SR-IOV interfaces attached to its pods, the worker node at some point ends up running DHCP on the VFs when they move back to netns 1.
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
       valid_lft 690795sec preferred_lft 690795sec
    inet6 ... scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@worker-0 ~]# nmcli conn
NAME              UUID                                  TYPE           DEVICE
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v21
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f1
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f0
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0
~~~
This is particularly problematic if the VFs belong to the PF of the machine network interface, as they will get an additional lease on the machine network and thus break node networking.

## Test setup

### Base setup

Use the setup from the following issue description:

OpenShift 4.6. The following directory is used: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/candidate-4.6/

A Mellanox MT27800 is installed on the worker nodes. One of the Mellanox ports is used as the baremetal network during install.

I am able to cause connectivity issues when scaling a deployment up to 3 and down to 0 repeatedly, about 3 or 4 times.
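The scale-up/scale-down cycle can be scripted for repeated runs. The following is a minimal sketch: the deployment name `example` and the namespace `user-dev` come from the manifests in this report, while the `OC` and `PAUSE` variables, the function name, and the 60 second pause are assumptions added here so that the loop can be dry-run.

```shell
# Sketch of the reproducer loop. OC and PAUSE are overridable so the loop
# can be dry-run (OC=echo, PAUSE=0); both names are made up for this sketch.
OC="${OC:-oc}"
PAUSE="${PAUSE:-60}"   # seconds between scale operations (assumed value)

scale_cycle() {
    cycles="${1:-4}"   # the report mentions about 3 or 4 repetitions
    i=1
    while [ "$i" -le "$cycles" ]; do
        "$OC" scale deployment example -n user-dev --replicas=3
        sleep "$PAUSE"
        "$OC" scale deployment example -n user-dev --replicas=0
        sleep "$PAUSE"
        i=$((i + 1))
    done
}

# Usage:
#   scale_cycle 4
```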
I see a connectivity problem when I see the following node statuses:

[kni@provisioner ~]$ oc get nodes
NAME                        STATUS     ROLES    AGE   VERSION
master-0.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
master-1.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
master-2.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
worker-0.ocp3.example.com   NotReady   worker   7d    v1.19.0+d59ce34
worker-1.ocp3.example.com   Ready      worker   7d    v1.19.0+d59ce34

I will need to reboot worker-0 to get it back to a Ready state.

The deployment YAML I used is below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: user-dev
spec:
  selector:
    matchLabels:
      app: ubuntu-example
  replicas: 1
  template:
    metadata:
      labels:
        app: ubuntu-example
      annotations:
        k8s.v1.cni.cncf.io/networks: >-
          user-dev/user-w0-ens1f0-mlx5-netdev-vxlan
    spec:
      containers:
        - name: ubuntu-example
          image: ubuntu
          command:
            - sleep
            - infinity
      nodeSelector:
        kubernetes.io/hostname: worker-0.ocp3.example.com

The SRIOV Network I used is below:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  annotations:
    operator.sriovnetwork.openshift.io/last-network-namespace: user-dev
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworks/user-w0-ens1f0-mlx5-netdev-vxlan
  resourceVersion: '3294180'
  name: user-w0-ens1f0-mlx5-netdev-vxlan
  uid: f4af9200-015e-4ffb-8dae-c219c542fca7
  creationTimestamp: '2020-10-21T08:10:28Z'
  generation: 7
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:operator.sriovnetwork.openshift.io/last-network-namespace': {}
          'f:finalizers':
            .: {}
            'v:"netattdef.finalizers.sriovnetwork.openshift.io"': {}
        'f:status': {}
      manager: sriov-network-operator
      operation: Update
      time: '2020-10-21T08:10:28Z'
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:capabilities': {}
          'f:ipam': {}
          'f:networkNamespace': {}
          'f:resourceName': {}
          'f:spoofChk': {}
          'f:trust': {}
      manager: Mozilla
      operation: Update
      time: '2020-10-27T07:47:37Z'
  namespace: openshift-sriov-network-operator
  finalizers:
    - netattdef.finalizers.sriovnetwork.openshift.io
spec:
  capabilities: '{"mac": true, "ips": true}'
  ipam: >-
    { "type": "host-local", "subnet": "192.168.123.0/24", "rangeStart": "192.168.123.159", "rangeEnd": "192.168.123.159" }
  networkNamespace: user-dev
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy
  spoofChk: 'off'
  trust: 'on'

The SRIOV Policy for this SRIOV Network is:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetworkNodePolicy","metadata":{"annotations":{},"name":"sriov-w0-ens1f0-mlx5-netdev-policy","namespace":"openshift-sriov-network-operator"},"spec":{"deviceType":"netdevice","isRdma":true,"nicSelector":{"pfNames":["ens1f0"]},"nodeSelector":{"kubernetes.io/hostname":"worker-0.ocp3.example.com"},"numVfs":30,"priority":99,"resourceName":"SriovW0Ens1f0Mlx5NetdevPolicy"}}
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodepolicies/sriov-w0-ens1f0-mlx5-netdev-policy
  resourceVersion: '481339'
  name: sriov-w0-ens1f0-mlx5-netdev-policy
  uid: fe1c3aaf-9abc-4aeb-a607-3a8cc22f471d
  creationTimestamp: '2020-10-21T06:08:50Z'
  generation: 1
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
        'f:spec':
          .: {}
          'f:deviceType': {}
          'f:isRdma': {}
          'f:nicSelector':
            .: {}
            'f:pfNames': {}
          'f:nodeSelector':
            .: {}
            'f:kubernetes.io/hostname': {}
          'f:numVfs': {}
          'f:priority': {}
          'f:resourceName': {}
      manager: oc
      operation: Update
      time: '2020-10-21T06:08:50Z'
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  linkType: eth
  nicSelector:
    pfNames:
      - ens1f0
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp3.example.com
  numVfs: 30
  priority: 99
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy

$ oc get net-attach-def -n user-dev -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:07:57Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:07:57Z"
    name: user-w0-ens1f0-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "526756"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-3805
    uid: 47a81653-4c1f-4636-b8ed-18b51b578e0a
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac": true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"} }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:10:28Z"
    generation: 7
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-27T07:47:37Z"
    name: user-w0-ens1f0-mlx5-netdev-vxlan
    namespace: user-dev
    resourceVersion: "3294181"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-vxlan
    uid: 35361b4a-3fea-4389-9307-f3af81ed42d2
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-vxlan", "type":"sriov","vlan":0,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac": true, "ips": true},"ipam":{"type":"host-local","subnet":"192.168.123.0/24","rangeStart":"192.168.123.159","rangeEnd":"192.168.123.159"} }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f1Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:08:51Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:08:51Z"
    name: user-w0-ens1f1-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "527023"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f1-mlx5-netdev-3805
    uid: e2fe5ad4-c610-45ef-a247-98e879e9bfa3
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f1-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac": true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"} }'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

### Force IPAM shortage

Create an IPAM shortage in the network, meaning that all IPs on the network are already in use. This is the easiest and most reliable way of recreating the issue: the pods go into a deletion/recreation loop, which causes the same VF to be moved repeatedly between the host network namespace and the pod namespace.
During reproducer testing, the pod will show something like the following when being scaled up: ~~~ ovisioner ~]$ oc describe pod example-f79f9f5bd-m5g9g | tail -n 30 Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: default-token-ks6dg: Type: Secret (a volume populated by a Secret) SecretName: default-token-ks6dg Optional: false podnetinfo: Type: DownwardAPI (a volume populated by information about the pod) Items: metadata.labels -> labels metadata.annotations -> annotations QoS Class: BestEffort Node-Selectors: kubernetes.io/hostname=worker-0.ocp3.example.com Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 25s default-scheduler Successfully assigned user-dev/example-f79f9f5bd-m5g9g to worker-0.ocp3.example.com Normal AddedInterface 23s multus Add eth0 [10.128.2.61/23] Warning FailedCreatePodSandBox 23s kubelet Failed to create pod sandbox: rpc error: code = Unknown de sc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0 (9e7c6b1ba48b8cd47227a47ad3025c6b9f14f5d481698cd27ce898d5e6a67451): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0 -mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plug in type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151 Normal AddedInterface 21s multus Add eth0 [10.128.2.61/23] Warning FailedCreatePodSandBox 20s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(303021dd4b332cee6008dd638d8333d43c4486a54a8c0c63e41fd2f5386a8c9b): 
[user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151 Normal AddedInterface 8s multus Add eth0 [10.128.2.61/23] Warning FailedCreatePodSandBox 7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(e5261a21b34e5d68ee8a17959bb1c4fb41ce9111d442fc48dc393f22ac36fd01): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151 [kni@provisioner ~]$ ~~~ ## Test iterations * Test 1: system defaults * Test 2: ~~~ cp /etc/udev/rules.d/10-nm-unmanaged.rules /etc/udev/rules.d/99-test.rules ~~~ * Test 3: ~~~ [root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1" ~~~ ## Test execution ### Clean starting state On each test iteration, make sure to: * reboot the node * wait for all VFs to be up post reboot * make sure the default NM connection exists and has IP method set to "auto" for IPv4/IPv6 * make sure the node is in READY state * make sure that the `example` deployment is scaled to `0` ($ # oc scale deployment example --replicas=0) * make sure that the following shows only valid connections and only one interface on the subnet: ~~~ Nmcli conn Ip a | grep 10.144 ~~~ ### Running the test Open one CLI there you run on worker-0.ocp3.example.com: ~~~ Nmcli conn Ip a | grep 10.144 udevadm monitor -p & nmcli conn mon & ~~~ Then scale the 
deployment from 0 to 1:
~~~
$ oc scale deployment example --replicas=1
~~~

=========================================================================================================================

Analysis of issue, from test2 results:

When the Mellanox interface is added to the kernel, we see 3 different events, one add and 2 moves:
~~~
UDEV [417.691883] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=34212
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417686148

UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=34213
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=417692113

UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=34217
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417692113

Wired Connection: connection profile changed
~~~

We see from the above that NetworkManager reacted at this point and changed the profile of the default connection (`Wired Connection`). The udev event sequence itself is not unusual, though: in the results of test2, the exact same sequence happens 5 times before [0] without any negative consequences:
~~~
UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=33477
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354538388

UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=33478
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=354543742

UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=33482
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354543742
~~~

My assumption is that there is a race condition involving udev, and that the "move" event for /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 does not set NM_UNMANAGED=1, unlike the "add" event for /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23.
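To check a saved `udevadm monitor -p` capture for events that lack the marker, a small filter along these lines may help. This is only a sketch: the function name `report_unmarked_events` and the capture file are made up for illustration, and it assumes the usual layout where each event starts with a `UDEV` header line followed by one `KEY=VALUE` property per line.

```shell
#!/usr/bin/env bash
# Sketch only: report_unmarked_events is a hypothetical helper, not part of udev.

# report_unmarked_events FILE
# Reads a saved `udevadm monitor -p` capture and prints "<action> <devpath>"
# for every UDEV event block that does not contain NM_UNMANAGED=1.
report_unmarked_events() {
    awk '
        # Print the event collected so far if it lacked the marker, then reset.
        function emit_prev() {
            if (in_event && !marked) print action, devpath
            in_event = 0; marked = 0; action = ""; devpath = ""
        }
        /^UDEV/   { emit_prev(); in_event = 1; next }   # header of a processed event
        /^KERNEL/ { emit_prev(); next }                 # raw kernel uevents are skipped
        in_event && /^ACTION=/         { action  = substr($0, 8) }
        in_event && /^DEVPATH=/        { devpath = substr($0, 9) }
        in_event && /^NM_UNMANAGED=1$/ { marked = 1 }
        END { emit_prev() }
    ' "$1"
}
```

Usage would be something like `udevadm monitor -p > capture.txt` during the test, followed by `report_unmarked_events capture.txt`. For the three events quoted above, it should list the two `move` events on `.../net/net1` and skip the `add` event.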
Under specific conditions (which I cannot explain), a race occurs, and the device path /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 is then added without the environment variable NM_UNMANAGED=1.

During our tests, once the issue reproduced, we could see the following info for the path. A retrigger adds the correct environment variable and value (the output below is from an earlier test with a different virtual function):
~~~
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208
[root@worker-0 ~]# ls /etc/udev/rules.d/
10-nm-unmanaged.rules  70-persistent-ipoib.rules
[root@worker-0 ~]# udevadm control --reload-rules && udevadm trigger
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208
~~~

As a matter of fact, the `net1` device path did not exist (output is from yet another test, after another failure):
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
       valid_lft 690895sec preferred_lft 690895sec
    inet6 ... scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v21
E: ID_NET_NAME_SLOT=ens1f0v21
E: ID_PATH=pci-0000:3b:02.7
E: ID_PATH_TAG=pci-0000_3b_02_7
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=64
E: INTERFACE=ens1f0v21
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v21
E: TAGS=:systemd:
E: USEC_INITIALIZED=1221371446
[root@worker-0 ~]# udevadm test -a add /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
calling: test
version 239
This program is for debugging only, it does not run any program specified by a RUN key.
It may show incorrect results, because some values may be different, or not available at a simulation run. Load module index Parsed configuration file /usr/lib/systemd/network/99-default.link Created link configuration context. Reading rules file: /usr/lib/udev/rules.d/01-md-raid-creating.rules Reading rules file: /usr/lib/udev/rules.d/10-dm.rules Reading rules file: /etc/udev/rules.d/10-nm-unmanaged.rules Reading rules file: /usr/lib/udev/rules.d/11-dm-lvm.rules Reading rules file: /usr/lib/udev/rules.d/11-dm-mpath.rules Reading rules file: /usr/lib/udev/rules.d/11-dm-parts.rules Reading rules file: /usr/lib/udev/rules.d/13-dm-disk.rules Reading rules file: /usr/lib/udev/rules.d/40-elevator.rules Reading rules file: /usr/lib/udev/rules.d/40-redhat.rules Reading rules file: /usr/lib/udev/rules.d/40-usb-blacklist.rules Reading rules file: /usr/lib/udev/rules.d/50-udev-default.rules Reading rules file: /usr/lib/udev/rules.d/60-alias-kmsg.rules Reading rules file: /usr/lib/udev/rules.d/60-block.rules Reading rules file: /usr/lib/udev/rules.d/60-cdrom_id.rules Reading rules file: /usr/lib/udev/rules.d/60-drm.rules Reading rules file: /usr/lib/udev/rules.d/60-evdev.rules Reading rules file: /usr/lib/udev/rules.d/60-fido-id.rules Reading rules file: /usr/lib/udev/rules.d/60-input-id.rules Reading rules file: /usr/lib/udev/rules.d/60-persistent-alsa.rules Reading rules file: /usr/lib/udev/rules.d/60-persistent-input.rules Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage-tape.rules Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage.rules Reading rules file: /usr/lib/udev/rules.d/60-persistent-v4l.rules Reading rules file: /usr/lib/udev/rules.d/60-raw.rules Reading rules file: /usr/lib/udev/rules.d/60-rdma-ndd.rules Reading rules file: /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules Reading rules file: /usr/lib/udev/rules.d/60-sensor.rules Reading rules file: /usr/lib/udev/rules.d/60-serial.rules Reading rules file: 
/usr/lib/udev/rules.d/60-srp_daemon.rules Reading rules file: /usr/lib/udev/rules.d/60-tpm-udev.rules Reading rules file: /usr/lib/udev/rules.d/61-scsi-sg3_id.rules Reading rules file: /usr/lib/udev/rules.d/62-multipath.rules Reading rules file: /usr/lib/udev/rules.d/63-fc-wwpn-id.rules Reading rules file: /usr/lib/udev/rules.d/63-md-raid-arrays.rules Reading rules file: /usr/lib/udev/rules.d/63-scsi-sg3_symlink.rules Reading rules file: /usr/lib/udev/rules.d/64-btrfs.rules Reading rules file: /usr/lib/udev/rules.d/64-md-raid-assembly.rules Reading rules file: /usr/lib/udev/rules.d/65-gce-disk-naming.rules Reading rules file: /usr/lib/udev/rules.d/65-md-incremental.rules Reading rules file: /usr/lib/udev/rules.d/66-azure-storage.rules Reading rules file: /usr/lib/udev/rules.d/66-kpartx.rules Reading rules file: /usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules Reading rules file: /usr/lib/udev/rules.d/68-del-part-nodes.rules Reading rules file: /usr/lib/udev/rules.d/69-dm-lvm-metad.rules Reading rules file: /usr/lib/udev/rules.d/69-md-clustered-confirm-device.rules Reading rules file: /usr/lib/udev/rules.d/70-joystick.rules Reading rules file: /usr/lib/udev/rules.d/70-mouse.rules Reading rules file: /etc/udev/rules.d/70-persistent-ipoib.rules Reading rules file: /usr/lib/udev/rules.d/70-power-switch.rules Reading rules file: /usr/lib/udev/rules.d/70-touchpad.rules Reading rules file: /usr/lib/udev/rules.d/70-uaccess.rules Reading rules file: /usr/lib/udev/rules.d/71-seat.rules Reading rules file: /usr/lib/udev/rules.d/73-idrac.rules Reading rules file: /usr/lib/udev/rules.d/73-seat-late.rules Reading rules file: /usr/lib/udev/rules.d/75-net-description.rules Reading rules file: /usr/lib/udev/rules.d/75-probe_mtd.rules Reading rules file: /usr/lib/udev/rules.d/75-rdma-description.rules Reading rules file: /usr/lib/udev/rules.d/78-sound-card.rules Reading rules file: /usr/lib/udev/rules.d/80-drivers.rules Reading rules file: 
/usr/lib/udev/rules.d/80-net-setup-link.rules Reading rules file: /usr/lib/udev/rules.d/84-nm-drivers.rules Reading rules file: /usr/lib/udev/rules.d/85-nm-unmanaged.rules Reading rules file: /usr/lib/udev/rules.d/90-coreos-device-mapper.rules Reading rules file: /usr/lib/udev/rules.d/90-iwpmd.rules Reading rules file: /usr/lib/udev/rules.d/90-nm-thunderbolt.rules Reading rules file: /usr/lib/udev/rules.d/90-rdma-hw-modules.rules Reading rules file: /usr/lib/udev/rules.d/90-rdma-ulp-modules.rules Reading rules file: /usr/lib/udev/rules.d/90-rdma-umad.rules Reading rules file: /usr/lib/udev/rules.d/90-vconsole.rules Reading rules file: /usr/lib/udev/rules.d/91-drm-modeset.rules Reading rules file: /usr/lib/udev/rules.d/91-vfio.rules Reading rules file: /usr/lib/udev/rules.d/95-dm-notify.rules Reading rules file: /usr/lib/udev/rules.d/98-rdma.rules Reading rules file: /usr/lib/udev/rules.d/99-azure-product-uuid.rules Reading rules file: /usr/lib/udev/rules.d/99-systemd.rules Reading rules file: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules rules contain 49152 bytes tokens (4096 * 12 bytes), 23140 bytes strings 3345 strings (44695 bytes), 2318 de-duplicated (22583 bytes), 1028 trie nodes used IMPORT builtin 'net_id' /usr/lib/udev/rules.d/75-net-description.rules:6 IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/75-net-description.rules:12 IMPORT builtin 'path_id' /usr/lib/udev/rules.d/80-net-setup-link.rules:5 IMPORT builtin 'net_setup_link' /usr/lib/udev/rules.d/80-net-setup-link.rules:9 Config file /usr/lib/systemd/network/99-default.link applies to device ens1f0v21 link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. 
NAME 'ens1f0v21' /usr/lib/udev/rules.d/80-net-setup-link.rules:11 RUN 'kmod load mlx5_ib' /usr/lib/udev/rules.d/90-rdma-hw-modules.rules:20 RUN '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/$name --prefix=/net/ipv4/neigh/$name --prefix=/net/ipv6/conf/$name --prefix=/net/ipv6/neigh/$name' /usr/lib/udev/rules.d/99-systemd.rules:60 ACTION=add DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_DRIVER=mlx5_core ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link ID_NET_NAME=ens1f0v21 ID_NET_NAME_MAC=enx<redacted mac> ID_NET_NAME_PATH=enp59s0f0v21 ID_NET_NAME_SLOT=ens1f0v21 ID_PATH=pci-0000:3b:02.7 ID_PATH_TAG=pci-0000_3b_02_7 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=64 INTERFACE=ens1f0v21 NM_UNMANAGED=1 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21 TAGS=:systemd: USEC_INITIALIZED=1221371446 run: 'kmod load mlx5_ib' run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/ens1f0v21 --prefix=/net/ipv4/neigh/ens1f0v21 --prefix=/net/ipv6/conf/ens1f0v21 --prefix=/net/ipv6/neigh/ens1f0v21' Unload module index Unloaded link configuration context. 
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21 P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21 E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21 E: ID_BUS=pci E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] E: ID_MODEL_ID=0x1018 E: ID_NET_DRIVER=mlx5_core E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link E: ID_NET_NAME=ens1f0v21 E: ID_NET_NAME_MAC=enx<redacted mac> E: ID_NET_NAME_PATH=enp59s0f0v21 E: ID_NET_NAME_SLOT=ens1f0v21 E: ID_PATH=pci-0000:3b:02.7 E: ID_PATH_TAG=pci-0000_3b_02_7 E: ID_PCI_CLASS_FROM_DATABASE=Network controller E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies E: ID_VENDOR_ID=0x15b3 E: IFINDEX=64 E: INTERFACE=ens1f0v21 E: NM_UNMANAGED=1 E: SUBSYSTEM=net E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21 E: TAGS=:systemd: E: USEC_INITIALIZED=1221371446 [root@worker-0 ~]# ip a ls dev ens1f0v21 64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21 valid_lft 690795sec preferred_lft 690795sec inet6 ... 
scope link noprefixroute valid_lft forever preferred_lft forever [root@worker-0 ~]# nmcli conn NAME UUID TYPE DEVICE Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f0v21 Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f1 Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f1 ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet eno2np1 Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f0 br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0 ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0 [root@worker-0 ~]# [root@worker-0 ~]# [root@worker-0 ~]# [root@worker-0 ~]# udevadm info --path /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1 syspath not found ~~~ ======================= [0] [akaris@linux 02786983]$ cat udevseq1.txt UDEV [417.691883] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net) ACTION=add DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_DRIVER=mlx5_core ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link ID_NET_NAME=ens1f0v23 ID_NET_NAME_MAC=enx<redacted mac> ID_NET_NAME_PATH=enp59s0f0v23 ID_NET_NAME_SLOT=ens1f0v23 ID_PATH=pci-0000:3b:03.1 ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=ens1f0v23 INTERFACE_OLD=net1 NM_UNMANAGED=1 SEQNUM=34212 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23 TAGS=:systemd: USEC_INITIALIZED=417686148 UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) 
ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_PATH=pci-0000:3b:03.1 ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=net1 SEQNUM=34213 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 TAGS=:systemd: USEC_INITIALIZED=417692113 UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_NAME_MAC=enx<redacted mac> ID_NET_NAME_PATH=enp59s0f0v23 ID_NET_NAME_SLOT=ens1f0v23 ID_PATH=pci-0000:3b:03.1 ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=ens1f0v23 SEQNUM=34217 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23 TAGS=:systemd: USEC_INITIALIZED=417692113 Wired Connection: connection profile changed [akaris@linux 02786983]$ cat udevseq2.txt UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net) ACTION=add DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_DRIVER=mlx5_core ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link ID_NET_NAME=ens1f0v23 ID_NET_NAME_MAC=enx<redacted mac> ID_NET_NAME_PATH=enp59s0f0v23 ID_NET_NAME_SLOT=ens1f0v23 ID_PATH=pci-0000:3b:03.1 
ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=ens1f0v23 INTERFACE_OLD=net1 NM_UNMANAGED=1 SEQNUM=33477 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23 TAGS=:systemd: USEC_INITIALIZED=354538388 UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_PATH=pci-0000:3b:03.1 ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=net1 SEQNUM=33478 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 TAGS=:systemd: USEC_INITIALIZED=354543742 UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_NAME_MAC=enx<redacted mac> ID_NET_NAME_PATH=enp59s0f0v23 ID_NET_NAME_SLOT=ens1f0v23 ID_PATH=pci-0000:3b:03.1 ID_PATH_TAG=pci-0000_3b_03_1 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=71 INTERFACE=ens1f0v23 SEQNUM=33482 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23 TAGS=:systemd: USEC_INITIALIZED=354543742 [akaris@linux 02786983]$ diff udevseq1.txt udevseq2.txt 1c1 < UDEV [417.691883] add 
/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net) --- > UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net) 23c23 < SEQNUM=34212 --- > SEQNUM=33477 27c27 < USEC_INITIALIZED=417686148 --- > USEC_INITIALIZED=354538388 29c29 < UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) --- > UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) 44c44 < SEQNUM=34213 --- > SEQNUM=33478 48c48 < USEC_INITIALIZED=417692113 --- > USEC_INITIALIZED=354543742 50c50 < UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) --- > UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net) 68c68 < SEQNUM=34217 --- > SEQNUM=33482 72,74c72 < USEC_INITIALIZED=417692113 < < Wired Connection: connection profile changed --- > USEC_INITIALIZED=354543742 ==================== How to work around the event with the SR-IOV network operator and a udev rule tweak: * match on `ACTION=="add|change|move"`: ~~~ [root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1" ~~~ This will add NM_UNMANAGED=1 to the udev "move" events, too: ~~~ UDEV [14134.602264] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13 (net) ACTION=add DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_DRIVER=mlx5_core ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link ID_NET_NAME=ens1f1v13 ID_NET_NAME_MAC=enx36bb05ba0438 ID_NET_NAME_PATH=enp59s0f1v13 ID_NET_NAME_SLOT=ens1f1v13 ID_PATH=pci-0000:3b:05.5 ID_PATH_TAG=pci-0000_3b_05_5 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=114 INTERFACE=ens1f1v13 
INTERFACE_OLD=net1 NM_UNMANAGED=1 SEQNUM=190826 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f1v13 TAGS=:systemd: USEC_INITIALIZED=14134596165 UDEV [14134.602702] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net) ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_PATH=pci-0000:3b:05.5 ID_PATH_TAG=pci-0000_3b_05_5 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=114 INTERFACE=net1 NM_UNMANAGED=1 SEQNUM=190827 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 TAGS=:systemd: USEC_INITIALIZED=14134602583 UDEV [14134.604382] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net) ACTION=move DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 ID_BUS=pci ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function] ID_MODEL_ID=0x1018 ID_NET_NAME_MAC=enx36bb05ba0438 ID_NET_NAME_PATH=enp59s0f1v13 ID_NET_NAME_SLOT=ens1f1v13 ID_PATH=pci-0000:3b:05.5 ID_PATH_TAG=pci-0000_3b_05_5 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_FROM_DATABASE=Mellanox Technologies ID_VENDOR_ID=0x15b3 IFINDEX=114 INTERFACE=ens1f1v13 NM_UNMANAGED=1 SEQNUM=190831 SUBSYSTEM=net SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f1v13 TAGS=:systemd: USEC_INITIALIZED=14134602583 ~~~ And in customer tests, the issue could not be reproduced.
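For anyone who wants to try the same workaround, the steps can be sketched roughly as follows. The function names `write_unmanaged_rule` and `reload_udev` are hypothetical; the rule text, the `99-test.rules` file name, and the `udevadm` commands are the ones used earlier in this report. Nothing is executed until the functions are called, and the reload step needs root on the node.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the rule text is copied from this report.

# PCI device IDs of the VFs that NetworkManager should leave alone.
VF_DEVICE_IDS='0x154c|0x1016|0x1018|0x101c|0x1014'

# write_unmanaged_rule DIR
# Writes a rule into DIR/99-test.rules that also matches "move" events.
write_unmanaged_rule() {
    local rules_dir=$1
    printf 'ACTION=="add|change|move", ATTRS{device}=="%s", ENV{NM_UNMANAGED}="1"\n' \
        "$VF_DEVICE_IDS" > "$rules_dir/99-test.rules"
}

# reload_udev
# Reloads the rules and re-triggers events so existing devices pick them up.
reload_udev() {
    udevadm control --reload-rules && udevadm trigger
}
```

On the node this would be run as `write_unmanaged_rule /etc/udev/rules.d && reload_udev`. Pointing the first function at a scratch directory is a harmless way to inspect the generated rule first.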
Do we have a target 4.6 Z stream for this?
Verified this on 4.7.0-202012120244.p0:

sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules
ACTION=="add|change|move", ATTRS{device}=="0x1014|0x1016|0x1018|0x101c|0x154c", ENV{NM_UNMANAGED}="1"
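The same check can be scripted for other nodes. A minimal sketch, assuming the rule keeps the single-line form shown in this report; `rule_covers_move` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: rule_covers_move is a hypothetical helper.

# rule_covers_move FILE
# Succeeds if a line in FILE that sets NM_UNMANAGED also lists "move"
# as one of the alternatives in its ACTION== match.
rule_covers_move() {
    grep 'NM_UNMANAGED' "$1" | grep -Eq 'ACTION=="([^"]*\|)?move(\|[^"]*)?"'
}
```

For example: `rule_covers_move /etc/udev/rules.d/10-nm-unmanaged.rules && echo "move events are covered"`.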
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633