Bug 1896866
| Summary: | File /etc/NetworkManager/system-connections/default_connection.nmconnection is incompatible with SR-IOV operator | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Karis <akaris> |
| Component: | Networking | Assignee: | Federico Paolinelli <fpaoline> |
| Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aos-bugs, bbennett, dansmall, dosmith, fpaoline, kboumedh, kholtz, pibanezr, zshi |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:32:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1906723 | ||
Description
Andreas Karis
2020-11-11 17:47:11 UTC
LATEST COMMENT FROM CUSTOMER - JOHN WONG:
Alright, so I've run a subset of your commands that I think apply to this comment.
This is from before rebooting or deleting the /etc/NetworkManager/system-connections/default_connection.nmconnection file. I think there weren't as many as in your analysis because ~22 hours have passed.
[core@worker-0 ~]$ sudo nmcli con show --active
NAME UUID TYPE DEVICE
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f0v12 <--- I think this is the problem VF.
ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet eno2np1
br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex
ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0
ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex
ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0
[core@worker-0 ~]$ ip -d address | less | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
inet 10.144.175.175/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v12
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
I deleted /etc/NetworkManager/system-connections/default_connection.nmconnection and then rebooted. After waiting for about 15 minutes, it eventually got back into the same state.
[core@worker-0 ~]$ sudo nmcli con show --active
NAME UUID TYPE DEVICE
Wired connection 3 e3c41ffa-b95d-33a9-8c01-581070bbbac1 ethernet ens1f1
Wired connection 4 64eb35e9-4974-3925-baee-d8984a8d6d3a ethernet ens7f0
Wired connection 5 548649c0-0828-3bed-9a78-a9d7a6591ade ethernet ens7f1
Wired connection 6 b4a780af-c1ff-3e6d-9a98-8a0f688473c8 ethernet ens1f0v9 <----- Same issue but new VF and different UUID
ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex
Wired connection 2 b04ab6fd-e38c-3d96-8dd2-62d56aa82e37 ethernet eno2np1
br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex
ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0
ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex
ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0
[core@worker-0 ~]$ ip -d address | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
inet 10.144.175.176/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v9
Restored /etc/NetworkManager/system-connections/default_connection.nmconnection and made the following changes from your previous comment.
(...)
[ipv4]
dns-search=
method=disabled
[ipv6]
addr-gen-mode=eui64
dns-search=
method=disabled
(...)
Then rebooted
[core@worker-0 ~]$ sudo nmcli con show --active
NAME UUID TYPE DEVICE
ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex
br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex
ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0
ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex
ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet eno2np1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f0
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f0v6 <--- Still seeing this but I don't see an IP on the VF (maybe I got lucky on this lottery?)
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f1
[core@worker-0 ~]$ ip -d address | grep -i 0v
54: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
81: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
83: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
Something I did notice is that I wasn't able to ssh into worker-0 on another interface (this interface was used during install, but I've kept it there for situations like this where I lose connectivity on the baremetal network).
[kni@provisioner ~]$ ssh core@172.22.0.29
ssh: connect to host 172.22.0.29 port 22: No route to host
One approach would be to create, through a MachineConfig, this NetworkManager configuration (say in /etc/NetworkManager/conf.d/99-nodefault.conf):
[main]
no-auto-default=*
As per the NetworkManager documentation, this does the following:
Specify devices for which NetworkManager shouldn't create default wired connection (Auto eth0). By default, NetworkManager creates a temporary wired connection for any Ethernet device that is managed and doesn't have a connection configured. List a device in this option to inhibit creating the default connection for the device. May have the special value * to apply to all devices. When the default wired connection is deleted or saved to a new persistent connection by a plugin, the device is added to a list in the file /var/lib/NetworkManager/no-auto-default.state to prevent creating the default connection for that device again.
We tried 3 options to work around this for the time being:
i) Deleting /etc/NetworkManager/system-connections/default_connection.nmconnection
This option does not work. The VFs will get an IP address lease from the DHCP server after they switch from the pods' to the host's namespace
ii) Set method=disabled in all sections in /etc/NetworkManager/system-connections/default_connection.nmconnection
This option **does** work as a workaround. When VFs switch from the pods' to the host's namespace, they do not obtain a DHCP lease.
iii) udev workaround with method=auto in /etc/NetworkManager/system-connections/default_connection.nmconnection
~~~
cat <<'EOF' > /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
EOF
udevadm control --reload-rules && udevadm trigger
~~~
This option does not work at the moment, even though we see the correct NM_UNMANAGED property on the device:
~~~
[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
(...)
E: NM_UNMANAGED=1
(...)
~~~
I could get this to work in a lab of mine with Intel cards, though. So maybe it's something specific to the Mellanox.
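Option ii can be sketched as a small shell helper. This is only an illustration under assumptions, not the procedure that was run on the customer system: the function name `disable_ip_methods` is hypothetical, and the sketch assumes the keyfile already contains a `method=` line in its `[ipv4]` and `[ipv6]` sections, as in the excerpt quoted earlier in this bug.

```shell
#!/bin/sh
# disable_ip_methods FILE
# Rewrite every "method=" line inside the [ipv4] and [ipv6] sections of a
# NetworkManager keyfile to "method=disabled". Other sections are untouched.
# (Hypothetical helper name; it is not part of NetworkManager.)
disable_ip_methods() {
    file=$1
    tmp=$(mktemp) || return 1
    awk '
        # Track whether the current section is [ipv4] or [ipv6].
        /^\[/ { in_ip = ($0 == "[ipv4]" || $0 == "[ipv6]") }
        in_ip && /^method=/ { print "method=disabled"; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode of FILE
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example (on the node, as root), followed by a reload of the profiles:
#   disable_ip_methods /etc/NetworkManager/system-connections/default_connection.nmconnection
#   nmcli connection reload
```

Writing back with `cat` rather than `mv` keeps the ownership and permissions of the original keyfile, which matters because NetworkManager ignores keyfiles that are readable by other users.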
=========================================
I'm not sure where we have to route this BZ: to RHCOS, to the SR-IOV operator devs, or to IPI installation. In theory, we "only" have to find a way to tell NetworkManager to keep its fingers away from managing any VFs.
Thanks Karim, we'll try that!
> I assume that /etc/NetworkManager/system-connections/default_connection.nmconnection is pushed by the IPI installer
It's not created by the installer, it's a default connection created by NetworkManager IIUC
As Karim mentioned, it may be possible to override this behavior by injecting some additional config: either a MachineConfig that adjusts the NM config can be applied post-install as part of the SR-IOV configuration, or it can be provided at install time via the installer `create manifests` step.
Reassigning to the CNF team to decide how to proceed.
CNF Platform Validation is the component for the cnf-tests suite, so it's probably not the right component to move this bz to.
In any case, the SR-IOV operator already has the logic in place for creating the udev rule, which was https://github.com/openshift/sriov-network-operator/blob/5cab948617a7fefaa58e10280b1c6a0b3872ab89/pkg%2Fdaemon%2Fdaemon.go#L729
It only adds rules for supported devices though, as per https://github.com/openshift/sriov-network-operator/blob/b08f8433bfc7fbdb9a9175ee6ec8a95c12b791f8/api%2Fv1%2Fhelper.go#L35
@Andreas, mind checking the content of /host/etc/udev/rules.d/10-nm-unmanaged.rules on the host to see if the sriov-operator created it (and also checking if the card is one of those above)?
Moving to SR-IOV operator in the meanwhile; if we find that the udev rules are not working, we'll move the bz again.
There's a udev rule pushed by the SR-IOV operator, as well as the one that I pushed, on the customer system. And the VF device is listed as NM_UNMANAGED:
[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules
ACTION=="add|change", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1 | grep NM
E: NM_UNMANAGED=1
[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=ens1f0v1
E: ID_NET_NAME_MAC=(redacted)
E: ID_NET_NAME_PATH=enp59s0f0v1
E: ID_NET_NAME_SLOT=ens1f0v1
E: ID_PATH=pci-0000:3b:00.3
E: ID_PATH_TAG=pci-0000_3b_00_3
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=57
E: INTERFACE=ens1f0v1
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v1 /sys/subsystem/net/devices/ens1f0v1
E: TAGS=:systemd:
E: USEC_INITIALIZED=175104174
Yet NetworkManager still manages the VFs.
The following also does not help:
~~~
[core@worker-0 ~]$ sudo cat /etc/NetworkManager/conf.d/99-nodefault.conf
[main]
no-auto-default=*
~~~
(see the aforementioned private comment)
-----------
#c8 actually clarified things for me a bit more. I'll go into a remote session with the customer to see what's up with NetworkManager and why it ignored the NM_UNMANAGED
## Issue
When scaling up and down a deployment with SR-IOV interfaces attached to its pods, the worker node at some point ends up running DHCP on the VFs when they move back to netns 1.
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
valid_lft 690795sec preferred_lft 690795sec
inet6 ... scope link noprefixroute
valid_lft forever preferred_lft forever
[root@worker-0 ~]# nmcli conn
NAME UUID TYPE DEVICE
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f0v21
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f1
ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet eno2np1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f0
br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex
ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0
ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex
ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0
~~~
This is particularly problematic if the VFs belong to the PF of the machine network interface, as they will get an additional lease on the machine network and thus will break node networking.
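A quick way to spot this condition on a node is to filter the one-line output of `ip` for VFs that hold an IPv4 address. The following is a minimal sketch, assuming that VF interface names end in `v<number>` as they do in the listings in this bug (for example `ens1f0v21`); the function name `vfs_with_ipv4` is hypothetical.

```shell
#!/bin/sh
# vfs_with_ipv4: read `ip -o -4 address show` output on stdin and print
# "<interface> <address/prefix>" for every interface whose name ends in
# v<number>. The naming pattern is an assumption taken from this bug's output.
vfs_with_ipv4() {
    awk '$3 == "inet" && $2 ~ /v[0-9]+$/ { print $2, $4 }'
}

# Example (on the node):
#   ip -o -4 address show | vfs_with_ipv4
```

Any line printed means a VF in the host network namespace currently has an IPv4 address, which is the state described above.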
## Test setup
### Base setup
Use the setup from the following issue description:
OpenShift 4.6. The following directory is used
https://mirror.openshift.com/pub/openshift-v4/clients/ocp/candidate-4.6/
A Mellanox MT27800 is installed on the worker nodes. One of the Mellanox ports is used as the baremetal network during install.
I am able to cause connectivity issues when scaling a deployment up to 3 and down to 0 repeatedly, about 3 or 4 times.
I see a connectivity problem when I see the following node statuses
[kni@provisioner ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0.ocp3.example.com Ready master 7d v1.19.0+d59ce34
master-1.ocp3.example.com Ready master 7d v1.19.0+d59ce34
master-2.ocp3.example.com Ready master 7d v1.19.0+d59ce34
worker-0.ocp3.example.com NotReady worker 7d v1.19.0+d59ce34
worker-1.ocp3.example.com Ready worker 7d v1.19.0+d59ce34
I will need to reboot worker-0 to get it back to a Ready state.
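The repeated scale-up and scale-down described above could be scripted roughly as follows. This is a sketch under assumptions: the function name `scale_cycle`, the `OC` override variable, and the 60 second default pause are all hypothetical and were not part of the original reproduction steps; only the `oc scale deployment ... --replicas=N` calls mirror what the report describes.

```shell
#!/bin/sh
# scale_cycle DEPLOYMENT NAMESPACE CYCLES [PAUSE_SECONDS]
# Scale a deployment up to 3 replicas and back down to 0, CYCLES times.
# Set OC=echo for a dry run that only prints the commands.
scale_cycle() {
    deployment=$1
    namespace=$2
    cycles=$3
    pause=${4:-60}   # assumed settle time between steps, not from the report
    i=1
    while [ "$i" -le "$cycles" ]; do
        ${OC:-oc} scale deployment "$deployment" -n "$namespace" --replicas=3 || return 1
        sleep "$pause"
        ${OC:-oc} scale deployment "$deployment" -n "$namespace" --replicas=0 || return 1
        sleep "$pause"
        i=$((i + 1))
    done
}

# Example, matching the deployment and namespace used in this report:
#   scale_cycle example user-dev 4
```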
The deployment yaml I used is below
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: user-dev
spec:
  selector:
    matchLabels:
      app: ubuntu-example
  replicas: 1
  template:
    metadata:
      labels:
        app: ubuntu-example
      annotations:
        k8s.v1.cni.cncf.io/networks: >-
          user-dev/user-w0-ens1f0-mlx5-netdev-vxlan
    spec:
      containers:
        - name: ubuntu-example
          image: ubuntu
          command:
            - sleep
            - infinity
      nodeSelector:
        kubernetes.io/hostname: worker-0.ocp3.example.com
The SRIOV Network I used is below
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  annotations:
    operator.sriovnetwork.openshift.io/last-network-namespace: user-dev
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworks/user-w0-ens1f0-mlx5-netdev-vxlan
  resourceVersion: '3294180'
  name: user-w0-ens1f0-mlx5-netdev-vxlan
  uid: f4af9200-015e-4ffb-8dae-c219c542fca7
  creationTimestamp: '2020-10-21T08:10:28Z'
  generation: 7
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:operator.sriovnetwork.openshift.io/last-network-namespace': {}
          'f:finalizers':
            .: {}
            'v:"netattdef.finalizers.sriovnetwork.openshift.io"': {}
        'f:status': {}
      manager: sriov-network-operator
      operation: Update
      time: '2020-10-21T08:10:28Z'
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:capabilities': {}
          'f:ipam': {}
          'f:networkNamespace': {}
          'f:resourceName': {}
          'f:spoofChk': {}
          'f:trust': {}
      manager: Mozilla
      operation: Update
      time: '2020-10-27T07:47:37Z'
  namespace: openshift-sriov-network-operator
  finalizers:
    - netattdef.finalizers.sriovnetwork.openshift.io
spec:
  capabilities: '{"mac": true, "ips": true}'
  ipam: >-
    { "type": "host-local", "subnet": "192.168.123.0/24", "rangeStart":
    "192.168.123.159", "rangeEnd": "192.168.123.159" }
  networkNamespace: user-dev
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy
  spoofChk: 'off'
  trust: 'on'
The SRIOV Policy for this SRIOV Network is
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetworkNodePolicy","metadata":{"annotations":{},"name":"sriov-w0-ens1f0-mlx5-netdev-policy","namespace":"openshift-sriov-network-operator"},"spec":{"deviceType":"netdevice","isRdma":true,"nicSelector":{"pfNames":["ens1f0"]},"nodeSelector":{"kubernetes.io/hostname":"worker-0.ocp3.example.com"},"numVfs":30,"priority":99,"resourceName":"SriovW0Ens1f0Mlx5NetdevPolicy"}}
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodepolicies/sriov-w0-ens1f0-mlx5-netdev-policy
  resourceVersion: '481339'
  name: sriov-w0-ens1f0-mlx5-netdev-policy
  uid: fe1c3aaf-9abc-4aeb-a607-3a8cc22f471d
  creationTimestamp: '2020-10-21T06:08:50Z'
  generation: 1
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
        'f:spec':
          .: {}
          'f:deviceType': {}
          'f:isRdma': {}
          'f:nicSelector':
            .: {}
            'f:pfNames': {}
          'f:nodeSelector':
            .: {}
            'f:kubernetes.io/hostname': {}
          'f:numVfs': {}
          'f:priority': {}
          'f:resourceName': {}
      manager: oc
      operation: Update
      time: '2020-10-21T06:08:50Z'
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  linkType: eth
  nicSelector:
    pfNames:
      - ens1f0
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp3.example.com
  numVfs: 30
  priority: 99
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy
$ oc get net-attach-def -n user-dev -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:07:57Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:07:57Z"
    name: user-w0-ens1f0-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "526756"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-3805
    uid: 47a81653-4c1f-4636-b8ed-18b51b578e0a
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"}
      }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:10:28Z"
    generation: 7
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-27T07:47:37Z"
    name: user-w0-ens1f0-mlx5-netdev-vxlan
    namespace: user-dev
    resourceVersion: "3294181"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-vxlan
    uid: 35361b4a-3fea-4389-9307-f3af81ed42d2
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-vxlan", "type":"sriov","vlan":0,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"192.168.123.0/24","rangeStart":"192.168.123.159","rangeEnd":"192.168.123.159"}
      }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f1Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:08:51Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:08:51Z"
    name: user-w0-ens1f1-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "527023"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f1-mlx5-netdev-3805
    uid: e2fe5ad4-c610-45ef-a247-98e879e9bfa3
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f1-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"}
      }'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
### Force IPAM shortage
Create an IPAM shortage in the network, meaning all IPs in the network's range should already be in use. This is the easiest and most reliable way of recreating the issue, as the pods will go into a deletion/recreation loop, which causes the same VF to be moved repeatedly between the host network namespace and the pod namespace.
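To judge how easily a range can be exhausted, the number of addresses between `rangeStart` and `rangeEnd` can be computed with plain shell arithmetic. This is a sketch with hypothetical helper names (`ip_to_int`, `range_size`); it gives an upper bound only, since the IPAM plugin may reserve addresses inside the range. For the SriovNetwork shown above, where `rangeStart` and `rangeEnd` are both 192.168.123.159, the range holds a single address.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# range_size START END: print how many addresses lie in START..END inclusive.
range_size() {
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# Example with the values from the SriovNetwork spec above:
#   range_size 192.168.123.159 192.168.123.159
```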
During reproducer testing, the pod will show something like the following when being scaled up:
~~~
[kni@provisioner ~]$ oc describe pod example-f79f9f5bd-m5g9g | tail -n 30
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-ks6dg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ks6dg
Optional: false
podnetinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=worker-0.ocp3.example.com
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25s default-scheduler Successfully assigned user-dev/example-f79f9f5bd-m5g9g to worker-0.ocp3.example.com
Normal AddedInterface 23s multus Add eth0 [10.128.2.61/23]
Warning FailedCreatePodSandBox 23s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(9e7c6b1ba48b8cd47227a47ad3025c6b9f14f5d481698cd27ce898d5e6a67451): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
Normal AddedInterface 21s multus Add eth0 [10.128.2.61/23]
Warning FailedCreatePodSandBox 20s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(303021dd4b332cee6008dd638d8333d43c4486a54a8c0c63e41fd2f5386a8c9b): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
Normal AddedInterface 8s multus Add eth0 [10.128.2.61/23]
Warning FailedCreatePodSandBox 7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(e5261a21b34e5d68ee8a17959bb1c4fb41ce9111d442fc48dc393f22ac36fd01): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
[kni@provisioner ~]$
~~~
## Test iterations
* Test 1:
system defaults
* Test 2:
~~~
cp /etc/udev/rules.d/10-nm-unmanaged.rules /etc/udev/rules.d/99-test.rules
~~~
* Test 3:
~~~
[root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules
ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
~~~
## Test execution
### Clean starting state
On each test iteration, make sure to:
* reboot the node
* wait for all VFs to be up post reboot
* make sure the default NM connection exists and has IP method set to "auto" for IPv4/IPv6
* make sure the node is in READY state
* make sure that the `example` deployment is scaled to `0` (`oc scale deployment example --replicas=0`)
* make sure that the following shows only valid connections and only one interface on the subnet:
~~~
nmcli conn
ip a | grep 10.144
~~~
### Running the test
Open a shell on worker-0.ocp3.example.com and run:
~~~
nmcli conn
ip a | grep 10.144
udevadm monitor -p & nmcli conn mon &
~~~
Then scale the deployment from 0 to 1:
~~~
$ oc scale deployment example --replicas=1
~~~
=========================================================================================================================
Analysis of issue, from test2 results:
When the Mellanox interface is added to the kernel, we see 3 different udev events, one "add" and two "move":
~~~
UDEV [417.691883] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=34212
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417686148
UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=34213
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=417692113
UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=34217
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417692113
Wired Connection: connection profile changed
~~~
The last line above shows that NetworkManager reacted to this sequence: the default connection profile ("Wired Connection") changed, which indicates that NetworkManager picked up the new interface.
The udev sequence itself is the usual one, though. In the results of test2, the exact same sequence happens 5 times before this one [0], without any negative consequences:
~~~
UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=33477
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354538388
UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=33478
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=354543742
UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=33482
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354543742
~~~
My assumption is that there is a race condition in udev. The "move" events (/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1) do not carry NM_UNMANAGED=1, unlike the "add" event for /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23. Under specific conditions (which I cannot explain), a race occurs, and the device path /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 is then added without the environment variable NM_UNMANAGED=1.
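To check a capture for events that were affected by this, one can scan the `udevadm monitor -p` output for events on the relevant device IDs that lack `NM_UNMANAGED=1`. The following is a minimal sketch, not a definitive tool. It assumes the capture was saved to a file, and that `ID_MODEL_ID` in each event mirrors the PCI device ID that the rule matches on (as it does in the captures in this report). `unmanaged_gaps` is a hypothetical helper name, and the default ID list is simply copied from the rule shown in this report.

```shell
# Print the header line of every UDEV event whose ID_MODEL_ID is in the
# given "|"-separated list and whose properties lack NM_UNMANAGED=1.
# Input: text captured from `udevadm monitor -p`, read from stdin.
# $1 (optional): device ID list; the default is an assumption taken from
# the udev rule quoted in this report.
unmanaged_gaps() {
    awk -v ids="${1:-0x154c|0x1016|0x1018|0x101c|0x1014}" '
        # Report the event collected so far, then reset the state.
        function emit() {
            if (header != "" && is_vf && !tagged) print header
            header = ""; is_vf = 0; tagged = 0
        }
        BEGIN { n = split(ids, want, "|"); for (i = 1; i <= n; i++) wanted[want[i]] = 1 }
        /^KERNEL *\[/ { emit(); next }                # skip kernel-side events
        /^UDEV *\[/   { emit(); header = $0; next }   # start of a new udev event
        /^ID_MODEL_ID=/ { if (substr($0, 13) in wanted) is_vf = 1 }
        $0 == "NM_UNMANAGED=1" { tagged = 1 }
        END { emit() }
    '
}
```

Usage, assuming the monitor output was redirected to a hypothetical file `monitor.log`: `unmanaged_gaps < monitor.log`. With the rule used in test2, this would be expected to list the two "move" events of each sequence.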
During our tests, once the issue was reproduced, we could see the following info for the path. A retrigger would then add the correct environment variable and value (the below is from an earlier test with a different virtual function):
~~~
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208
[root@worker-0 ~]# ls /etc/udev/rules.d/
10-nm-unmanaged.rules 70-persistent-ipoib.rules
[root@worker-0 ~]# udevadm control --reload-rules && udevadm trigger
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208
~~~
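The stale state shown above (a `P:` path that ends in the final interface name while `E: DEVPATH` still ends in `net1`, and no `NM_UNMANAGED=1`) can be checked mechanically. This is a minimal sketch that only assumes the text format printed by `udevadm info` above; `check_udev_info` is a hypothetical helper name.

```shell
# Summarise `udevadm info --path ...` output read from stdin:
#  - devpath=ok|stale : whether the "P:" path equals the "E: DEVPATH=" value
#  - unmanaged=yes|no : whether "E: NM_UNMANAGED=1" is present
check_udev_info() {
    awk '
        /^P: /         { p = substr($0, 4) }    # device path as queried
        /^E: DEVPATH=/ { d = substr($0, 12) }   # device path stored in the udev database
        $0 == "E: NM_UNMANAGED=1" { tagged = 1 }
        END {
            printf "devpath=%s unmanaged=%s\n", (p == d ? "ok" : "stale"), (tagged ? "yes" : "no")
        }
    '
}
```

For example, `udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22 | check_udev_info` would be expected to print `devpath=stale unmanaged=no` for the first output above and `devpath=ok unmanaged=yes` after the retrigger.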
As a matter of fact, the `net1` device path did not exist (output is from yet another test, after another failure):
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
valid_lft 690895sec preferred_lft 690895sec
inet6 ... scope link noprefixroute
valid_lft forever preferred_lft forever
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v21
E: ID_NET_NAME_SLOT=ens1f0v21
E: ID_PATH=pci-0000:3b:02.7
E: ID_PATH_TAG=pci-0000_3b_02_7
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=64
E: INTERFACE=ens1f0v21
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v21
E: TAGS=:systemd:
E: USEC_INITIALIZED=1221371446
[root@worker-0 ~]# udevadm test -a add /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
calling: test
version 239
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Reading rules file: /usr/lib/udev/rules.d/01-md-raid-creating.rules
Reading rules file: /usr/lib/udev/rules.d/10-dm.rules
Reading rules file: /etc/udev/rules.d/10-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-lvm.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-mpath.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-parts.rules
Reading rules file: /usr/lib/udev/rules.d/13-dm-disk.rules
Reading rules file: /usr/lib/udev/rules.d/40-elevator.rules
Reading rules file: /usr/lib/udev/rules.d/40-redhat.rules
Reading rules file: /usr/lib/udev/rules.d/40-usb-blacklist.rules
Reading rules file: /usr/lib/udev/rules.d/50-udev-default.rules
Reading rules file: /usr/lib/udev/rules.d/60-alias-kmsg.rules
Reading rules file: /usr/lib/udev/rules.d/60-block.rules
Reading rules file: /usr/lib/udev/rules.d/60-cdrom_id.rules
Reading rules file: /usr/lib/udev/rules.d/60-drm.rules
Reading rules file: /usr/lib/udev/rules.d/60-evdev.rules
Reading rules file: /usr/lib/udev/rules.d/60-fido-id.rules
Reading rules file: /usr/lib/udev/rules.d/60-input-id.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-alsa.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-input.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage-tape.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-v4l.rules
Reading rules file: /usr/lib/udev/rules.d/60-raw.rules
Reading rules file: /usr/lib/udev/rules.d/60-rdma-ndd.rules
Reading rules file: /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules
Reading rules file: /usr/lib/udev/rules.d/60-sensor.rules
Reading rules file: /usr/lib/udev/rules.d/60-serial.rules
Reading rules file: /usr/lib/udev/rules.d/60-srp_daemon.rules
Reading rules file: /usr/lib/udev/rules.d/60-tpm-udev.rules
Reading rules file: /usr/lib/udev/rules.d/61-scsi-sg3_id.rules
Reading rules file: /usr/lib/udev/rules.d/62-multipath.rules
Reading rules file: /usr/lib/udev/rules.d/63-fc-wwpn-id.rules
Reading rules file: /usr/lib/udev/rules.d/63-md-raid-arrays.rules
Reading rules file: /usr/lib/udev/rules.d/63-scsi-sg3_symlink.rules
Reading rules file: /usr/lib/udev/rules.d/64-btrfs.rules
Reading rules file: /usr/lib/udev/rules.d/64-md-raid-assembly.rules
Reading rules file: /usr/lib/udev/rules.d/65-gce-disk-naming.rules
Reading rules file: /usr/lib/udev/rules.d/65-md-incremental.rules
Reading rules file: /usr/lib/udev/rules.d/66-azure-storage.rules
Reading rules file: /usr/lib/udev/rules.d/66-kpartx.rules
Reading rules file: /usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/68-del-part-nodes.rules
Reading rules file: /usr/lib/udev/rules.d/69-dm-lvm-metad.rules
Reading rules file: /usr/lib/udev/rules.d/69-md-clustered-confirm-device.rules
Reading rules file: /usr/lib/udev/rules.d/70-joystick.rules
Reading rules file: /usr/lib/udev/rules.d/70-mouse.rules
Reading rules file: /etc/udev/rules.d/70-persistent-ipoib.rules
Reading rules file: /usr/lib/udev/rules.d/70-power-switch.rules
Reading rules file: /usr/lib/udev/rules.d/70-touchpad.rules
Reading rules file: /usr/lib/udev/rules.d/70-uaccess.rules
Reading rules file: /usr/lib/udev/rules.d/71-seat.rules
Reading rules file: /usr/lib/udev/rules.d/73-idrac.rules
Reading rules file: /usr/lib/udev/rules.d/73-seat-late.rules
Reading rules file: /usr/lib/udev/rules.d/75-net-description.rules
Reading rules file: /usr/lib/udev/rules.d/75-probe_mtd.rules
Reading rules file: /usr/lib/udev/rules.d/75-rdma-description.rules
Reading rules file: /usr/lib/udev/rules.d/78-sound-card.rules
Reading rules file: /usr/lib/udev/rules.d/80-drivers.rules
Reading rules file: /usr/lib/udev/rules.d/80-net-setup-link.rules
Reading rules file: /usr/lib/udev/rules.d/84-nm-drivers.rules
Reading rules file: /usr/lib/udev/rules.d/85-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/90-coreos-device-mapper.rules
Reading rules file: /usr/lib/udev/rules.d/90-iwpmd.rules
Reading rules file: /usr/lib/udev/rules.d/90-nm-thunderbolt.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-hw-modules.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-ulp-modules.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-umad.rules
Reading rules file: /usr/lib/udev/rules.d/90-vconsole.rules
Reading rules file: /usr/lib/udev/rules.d/91-drm-modeset.rules
Reading rules file: /usr/lib/udev/rules.d/91-vfio.rules
Reading rules file: /usr/lib/udev/rules.d/95-dm-notify.rules
Reading rules file: /usr/lib/udev/rules.d/98-rdma.rules
Reading rules file: /usr/lib/udev/rules.d/99-azure-product-uuid.rules
Reading rules file: /usr/lib/udev/rules.d/99-systemd.rules
Reading rules file: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules
rules contain 49152 bytes tokens (4096 * 12 bytes), 23140 bytes strings
3345 strings (44695 bytes), 2318 de-duplicated (22583 bytes), 1028 trie nodes used
IMPORT builtin 'net_id' /usr/lib/udev/rules.d/75-net-description.rules:6
IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/75-net-description.rules:12
IMPORT builtin 'path_id' /usr/lib/udev/rules.d/80-net-setup-link.rules:5
IMPORT builtin 'net_setup_link' /usr/lib/udev/rules.d/80-net-setup-link.rules:9
Config file /usr/lib/systemd/network/99-default.link applies to device ens1f0v21
link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
NAME 'ens1f0v21' /usr/lib/udev/rules.d/80-net-setup-link.rules:11
RUN 'kmod load mlx5_ib' /usr/lib/udev/rules.d/90-rdma-hw-modules.rules:20
RUN '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/$name --prefix=/net/ipv4/neigh/$name --prefix=/net/ipv6/conf/$name --prefix=/net/ipv6/neigh/$name' /usr/lib/udev/rules.d/99-systemd.rules:60
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v21
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v21
ID_NET_NAME_SLOT=ens1f0v21
ID_PATH=pci-0000:3b:02.7
ID_PATH_TAG=pci-0000_3b_02_7
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=64
INTERFACE=ens1f0v21
NM_UNMANAGED=1
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21
TAGS=:systemd:
USEC_INITIALIZED=1221371446
run: 'kmod load mlx5_ib'
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/ens1f0v21 --prefix=/net/ipv4/neigh/ens1f0v21 --prefix=/net/ipv6/conf/ens1f0v21 --prefix=/net/ipv6/neigh/ens1f0v21'
Unload module index
Unloaded link configuration context.
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=ens1f0v21
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v21
E: ID_NET_NAME_SLOT=ens1f0v21
E: ID_PATH=pci-0000:3b:02.7
E: ID_PATH_TAG=pci-0000_3b_02_7
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=64
E: INTERFACE=ens1f0v21
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21
E: TAGS=:systemd:
E: USEC_INITIALIZED=1221371446
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
valid_lft 690795sec preferred_lft 690795sec
inet6 ... scope link noprefixroute
valid_lft forever preferred_lft forever
[root@worker-0 ~]# nmcli conn
NAME UUID TYPE DEVICE
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f0v21
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens1f1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f1
ovs-if-br-ex 5090ed3a-6dca-44b6-a31c-6c6539132c23 ovs-interface br-ex
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet eno2np1
Wired Connection 5f123cb7-3abb-477a-8c29-55fa809f19b4 ethernet ens7f0
br-ex 45858fce-90a1-41d6-ae89-3437bac40a76 ovs-bridge br-ex
ovs-if-phys0 7fd65994-e486-4f47-a18f-0f5e6e02c235 ethernet ens1f0
ovs-port-br-ex 214c5ade-fed8-4b3c-84f7-86e11397d6bb ovs-port br-ex
ovs-port-phys0 9ec1d966-97a1-4334-a3ed-0400ec39ef1a ovs-port ens1f0
[root@worker-0 ~]#
[root@worker-0 ~]#
[root@worker-0 ~]#
[root@worker-0 ~]# udevadm info --path /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1
syspath not found
~~~
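In the `nmcli conn` output above, the symptom is a VF (`ens1f0v21`) that holds the default `Wired Connection` profile. A quick check for that condition could look like the sketch below. It assumes terse `NAME:DEVICE` input and uses a naming heuristic (a digit followed by `v<N>` at the end of the device name) to recognise VFs; both the heuristic and the helper name `vfs_with_profile` are assumptions for illustration.

```shell
# List devices that look like SR-IOV VFs and are bound to the given
# NetworkManager profile. Input: "NAME:DEVICE" lines on stdin.
# $1 (optional): profile name; defaults to the profile seen in this report.
vfs_with_profile() {
    awk -F: -v profile="${1:-Wired Connection}" '
        # Heuristic: VF names end in <digit>v<N>, e.g. ens1f0v21.
        $1 == profile && $NF ~ /[0-9]v[0-9]+$/ { print $NF }
    '
}
```

Usage on the node: `nmcli -t -f NAME,DEVICE connection show --active | vfs_with_profile`. An empty result would suggest that no VF is currently managed through the default profile.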
=======================
[0]
[akaris@linux 02786983]$ cat udevseq1.txt
UDEV [417.691883] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=34212
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417686148
UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=34213
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=417692113
UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=34217
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417692113
Wired Connection: connection profile changed
[akaris@linux 02786983]$ cat udevseq2.txt
UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=33477
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354538388
UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=33478
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=354543742
UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=33482
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354543742
[akaris@linux 02786983]$ diff udevseq1.txt udevseq2.txt
1c1
< UDEV [417.691883] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
---
> UDEV [354.543365] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
23c23
< SEQNUM=34212
---
> SEQNUM=33477
27c27
< USEC_INITIALIZED=417686148
---
> USEC_INITIALIZED=354538388
29c29
< UDEV [417.692216] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
---
> UDEV [354.543844] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
44c44
< SEQNUM=34213
---
> SEQNUM=33478
48c48
< USEC_INITIALIZED=417692113
---
> USEC_INITIALIZED=354543742
50c50
< UDEV [417.693715] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
---
> UDEV [354.545042] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
68c68
< SEQNUM=34217
---
> SEQNUM=33482
72,74c72
< USEC_INITIALIZED=417692113
<
< Wired Connection: connection profile changed
---
> USEC_INITIALIZED=354543742
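Most of the diff above is noise from timestamps and sequence numbers. A minimal sketch for filtering that noise before comparing, assuming the two captures are saved as in the `cat` commands above; `normalize_udev_capture` is a hypothetical helper, not part of udev:

```shell
# Normalise a `udevadm monitor -p` capture (stdin) so that two captures of
# the same event sequence compare equal:
#  - replace the event timestamp with a fixed placeholder
#  - drop SEQNUM and USEC_INITIALIZED, which differ on every run
#  - drop blank lines
normalize_udev_capture() {
    sed -E \
        -e 's/^(UDEV|KERNEL) *\[[0-9.]+\]/\1 [TS]/' \
        -e '/^(SEQNUM|USEC_INITIALIZED)=/d' \
        -e '/^[[:space:]]*$/d'
}
```

In bash, `diff <(normalize_udev_capture < udevseq1.txt) <(normalize_udev_capture < udevseq2.txt)` should then report only the `Wired Connection: connection profile changed` line, which makes it easier to see that the udev events themselves are identical.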
====================
How to work around the issue with the SR-IOV network operator and a udev rule tweak:
* match on `ACTION=="add|change|move"`:
~~~
[root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules
ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
~~~
This will add NM_UNMANAGED=1 to the udev "move" events, too:
~~~
UDEV [14134.602264] add /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f1v13
ID_NET_NAME_MAC=enx36bb05ba0438
ID_NET_NAME_PATH=enp59s0f1v13
ID_NET_NAME_SLOT=ens1f1v13
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=ens1f1v13
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=190826
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f1v13
TAGS=:systemd:
USEC_INITIALIZED=14134596165
UDEV [14134.602702] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=net1
NM_UNMANAGED=1
SEQNUM=190827
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=14134602583
UDEV [14134.604382] move /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx36bb05ba0438
ID_NET_NAME_PATH=enp59s0f1v13
ID_NET_NAME_SLOT=ens1f1v13
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=ens1f1v13
NM_UNMANAGED=1
SEQNUM=190831
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f1v13
TAGS=:systemd:
USEC_INITIALIZED=14134602583
~~~
With this rule in place, the issue could no longer be reproduced in customer tests.
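The reason the extra alternative matters is that udev match values treat `|` as a separator between alternative patterns, so a rule only applies to "move" events if `move` is one of the listed alternatives. The sketch below imitates that matching in plain shell for illustration only; `action_matches` is a hypothetical helper and does not call udev.

```shell
# Return 0 if ACTION value $1 matches the "|"-separated pattern list $2,
# treating each alternative as a glob pattern that must match the whole value.
action_matches() {
    _action=$1
    _rest=$2
    while :; do
        _alt=${_rest%%|*}                 # first remaining alternative
        case $_action in
            $_alt) return 0 ;;            # unquoted: matched as a glob pattern
        esac
        [ "$_alt" = "$_rest" ] && return 1   # no alternatives left
        _rest=${_rest#*|}                 # drop the alternative just tested
    done
}
```

With the pattern from the rule above, `action_matches move 'add|change|move'` succeeds, whereas a narrower pattern such as `add|change` (a hypothetical example) would not match `move`.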
Do we have a target 4.6 Z stream for this?
Verified this on 4.7.0-202012120244.p0
sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules
ACTION=="add|change|move", ATTRS{device}=="0x1014|0x1016|0x1018|0x101c|0x154c", ENV{NM_UNMANAGED}="1"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633