Bug 1896866 - File /etc/NetworkManager/system-connections/default_connection.nmconnection is incompatible with SR-IOV operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Federico Paolinelli
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1906723
Reported: 2020-11-11 17:47 UTC by Andreas Karis
Modified: 2021-11-30 09:26 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:32:36 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sriov-network-operator issues 414 | 0 | None | closed | Race condition in udev makes NM ignore rules in /host/etc/udev/rules.d/10-nm-unmanaged.rules | 2021-02-17 13:48:20 UTC
Github | openshift sriov-network-operator pull 415 | 0 | None | closed | Bug 1896866: Work around udev race condition related to NM_UNMANAGED=1 | 2021-02-17 13:48:21 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:33:22 UTC

Description Andreas Karis 2020-11-11 17:47:11 UTC
Description of problem:

File /etc/NetworkManager/system-connections/default_connection.nmconnection is incompatible with the SR-IOV operator.

I assume that /etc/NetworkManager/system-connections/default_connection.nmconnection is pushed by the IPI installer. Contents:
~~~
[connection]
id=Wired Connection
uuid=5f123cb7-3abb-477a-8c29-55fa809f19b4
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
~~~

The problem with this configuration is that it blindly assumes that every interface on the host should run DHCP. That is not the case, and it creates issues with the SR-IOV operator and, I'd guess, with many other things.

In the example of the SR-IOV operator, if a VF is bound to the kernel driver, then assigned to a pod, and then released from the pod, NetworkManager will manage the VF and run DHCP on it. If the VF belongs to the machine network, we will now have the same subnet twice on the worker node: once on the PF, and once more on the VF.

I do not believe that this is an issue with the SR-IOV operator or device plugin. Instead, I think that the approach of telling NetworkManager to get DHCP leases on all interfaces is wrong for any day 2 operation. Granted, this approach makes sense for the installation stage. But once the node is provisioned, we should be able to figure out which interface is the machine network interface, and default_connection.nmconnection should be removed and replaced with more specific configuration.
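
To make the concern concrete, here is a minimal sketch (not part of any OpenShift tooling) that reads a keyfile like the one above and reports whether it is a "catch-all" DHCP profile, meaning it is not pinned to a device and uses automatic addressing. The function name `is_catch_all_dhcp` is hypothetical, and treating `interface-name` or `mac-address` as the only ways to pin a profile is a simplifying assumption.

```python
import configparser

# Trimmed copy of the keyfile quoted in the description.
KEYFILE = """\
[connection]
id=Wired Connection
type=ethernet
multi-connect=3
permissions=

[ethernet]
mac-address-blacklist=

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=eui64
dns-search=
method=auto

[proxy]
"""


def is_catch_all_dhcp(keyfile_text):
    """Return True if the profile is not pinned to a device and uses method=auto."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(keyfile_text)
    # Assumption: a profile counts as "pinned" if it names an interface or a MAC.
    pinned = bool(cp.get("connection", "interface-name", fallback="")) or bool(
        cp.get("ethernet", "mac-address", fallback="")
    )
    auto = any(cp.get(s, "method", fallback="") == "auto" for s in ("ipv4", "ipv6"))
    return auto and not pinned


if __name__ == "__main__":
    print(is_catch_all_dhcp(KEYFILE))
```

A profile that sets `interface-name` in `[connection]` would not be flagged by this check, which is the kind of "more specific configuration" argued for above.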



Comment 3 Dan Small 2020-11-12 16:30:39 UTC
LATEST COMMENT FROM CUSTOMER - JOHN WONG:

Alright, so I've run a subset of your commands that I think apply to this comment.


This output is from before rebooting or deleting the /etc/NetworkManager/system-connections/default_connection.nmconnection file. I think there are fewer entries than in your analysis because ~22 hours have passed.


[core@worker-0 ~]$ sudo nmcli con show --active
NAME              UUID                                  TYPE           DEVICE    
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v12  <--- I think this is the problem VF.
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex     
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1   
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex     
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0    
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex     
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0    

[core@worker-0 ~]$ ip -d address | less | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.144.175.175/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v12
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

I deleted /etc/NetworkManager/system-connections/default_connection.nmconnection and then rebooted. After waiting for about 15 minutes, it eventually got back into the same state.


[core@worker-0 ~]$ sudo nmcli con show --active
NAME                UUID                                  TYPE           DEVICE   
Wired connection 3  e3c41ffa-b95d-33a9-8c01-581070bbbac1  ethernet       ens1f1   
Wired connection 4  64eb35e9-4974-3925-baee-d8984a8d6d3a  ethernet       ens7f0   
Wired connection 5  548649c0-0828-3bed-9a78-a9d7a6591ade  ethernet       ens7f1   
Wired connection 6  b4a780af-c1ff-3e6d-9a98-8a0f688473c8  ethernet       ens1f0v9   <----- Same issue but new VF and different UUID
ovs-if-br-ex        5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex    
Wired connection 2  b04ab6fd-e38c-3d96-8dd2-62d56aa82e37  ethernet       eno2np1  
br-ex               45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex    
ovs-if-phys0        7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0   
ovs-port-br-ex      214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex    
ovs-port-phys0      9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0   
[core@worker-0 ~]$ ip -d address | grep -i 0v
53: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
54: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
81: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.144.175.176/24 brd 10.144.175.255 scope global dynamic noprefixroute ens1f0v9

I restored /etc/NetworkManager/system-connections/default_connection.nmconnection and made the following changes from your previous comment:


(...)
[ipv4]
dns-search=
method=disabled

[ipv6]
addr-gen-mode=eui64
dns-search=
method=disabled
(...)

Then I rebooted:


[core@worker-0 ~]$ sudo nmcli con show --active
NAME              UUID                                  TYPE           DEVICE   
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex    
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex    
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0   
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex    
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0   
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1  
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1   
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f0   
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v6  <--- Still seeing this but I don't see an IP on the VF (maybe I got lucky on this lottery?)
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f1   
[core@worker-0 ~]$ ip -d address | grep -i 0v
54: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
55: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
56: ens1f0v10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
57: ens1f0v11: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
58: ens1f0v12: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
59: ens1f0v13: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
60: ens1f0v14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
61: ens1f0v15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
62: ens1f0v16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
63: ens1f0v17: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
64: ens1f0v18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
65: ens1f0v19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
66: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
67: ens1f0v20: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
68: ens1f0v21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
69: ens1f0v22: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
70: ens1f0v23: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
71: ens1f0v24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
72: ens1f0v25: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
73: ens1f0v26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
74: ens1f0v27: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
75: ens1f0v28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
76: ens1f0v29: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
77: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
78: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
79: ens1f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
80: ens1f0v6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
81: ens1f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
82: ens1f0v8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
83: ens1f0v9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

Something I did notice is that I wasn't able to ssh into worker-0 on another interface (this interface was used during install, but I've kept it there for situations like this where I lose connectivity on the baremetal network):


[kni@provisioner ~]$ ssh core@172.22.0.29
ssh: connect to host 172.22.0.29 port 22: No route to host

Comment 4 Karim Boumedhel 2020-11-18 09:22:37 UTC
One approach would be to create, through a MachineConfig, the following NetworkManager configuration (say, in /etc/NetworkManager/conf.d/99-nodefault.conf):

[main]
no-auto-default=*

As per the NetworkManager documentation, this does the following:

Specify devices for which NetworkManager shouldn't create default wired connection (Auto eth0). By default, NetworkManager creates a temporary wired connection for any Ethernet device that is managed and doesn't have a connection configured. List a device in this option to inhibit creating the default connection for the device. May have the special value * to apply to all devices.

When the default wired connection is deleted or saved to a new persistent connection by a plugin, the device is added to a list in the file /var/lib/NetworkManager/no-auto-default.state to prevent creating the default connection for that device again.
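
As an illustration of how such a file could be delivered, the sketch below builds a MachineConfig manifest that writes 99-nodefault.conf to worker nodes. It only shows the packaging; it makes no claim that the setting resolves this bug. The object name `99-worker-nm-nodefault`, the file mode, and the Ignition spec version `3.1.0` are assumptions, and the output is JSON (which `oc apply -f` also accepts) to avoid a YAML dependency.

```python
import base64
import json

NM_CONF = "[main]\nno-auto-default=*\n"
NM_CONF_PATH = "/etc/NetworkManager/conf.d/99-nodefault.conf"


def nm_conf_machineconfig(role="worker"):
    """Build a MachineConfig dict that drops NM_CONF at NM_CONF_PATH."""
    # Ignition expects file contents as a data URL.
    source = "data:text/plain;charset=utf-8;base64," + base64.b64encode(
        NM_CONF.encode()
    ).decode()
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"99-{role}-nm-nodefault",  # hypothetical name
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.1.0"},  # assumed spec version
                "storage": {
                    "files": [
                        {
                            "path": NM_CONF_PATH,
                            "mode": 0o644,  # serialized as decimal 420
                            "overwrite": True,
                            "contents": {"source": source},
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(nm_conf_machineconfig(), indent=2))
```

Applying a MachineConfig normally causes the Machine Config Operator to roll the change out node by node, so this is better suited to a planned change than to a quick test on a single worker.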

Comment 5 Andreas Karis 2020-11-18 09:26:51 UTC
We tried 3 options to work around this for the time being:

i) Deleting /etc/NetworkManager/system-connections/default_connection.nmconnection

This option does not work. The VFs will get an IP address lease from the DHCP server after they switch from the pods' to the host's namespace

ii) Set method=disabled in all sections in /etc/NetworkManager/system-connections/default_connection.nmconnection

This option **does** work as a workaround. When VFs switch from the pods' to the host's namespace, they do not obtain a DHCP lease.

iii) udev workaround with method=auto in /etc/NetworkManager/system-connections/default_connection.nmconnection
~~~
cat <<'EOF' > /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
EOF
udevadm control --reload-rules && udevadm trigger
~~~

This does not work at the moment, even though we see the correct NM_UNMANAGED annotation:
~~~
[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
(...)
E: NM_UNMANAGED=1
(...)
~~~

I could get this to work in a lab of mine with Intel cards, though. So maybe it's something specific to the Mellanox.
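
For reference, option ii above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `disable_ip_methods` is hypothetical, it only touches `[ipv4]` and `[ipv6]` sections that already exist, and it rewrites the keyfile with Python's `configparser`, which preserves keys and values but not comments or exact blank-line layout.

```python
import configparser
import io


def disable_ip_methods(keyfile_text):
    """Return keyfile text with method=disabled in the ipv4 and ipv6 sections."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep key names exactly as written
    cp.read_string(keyfile_text)
    for section in ("ipv4", "ipv6"):
        if cp.has_section(section):
            cp.set(section, "method", "disabled")
    out = io.StringIO()
    # NetworkManager keyfiles use key=value without surrounding spaces.
    cp.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[ipv4]\ndns-search=\nmethod=auto\n\n[ipv6]\naddr-gen-mode=eui64\nmethod=auto\n"
    print(disable_ip_methods(sample))
```

The result would still need to be written back with root-only permissions and picked up by NetworkManager (the thread used a reboot for that).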

=========================================

I'm not sure where we have to route this BZ: to RHCOS, to the SR-IOV operator devs, or to IPI installation. In theory, we "only" have to find a way to tell NetworkManager to keep its fingers away from managing any VFs.

Comment 6 Andreas Karis 2020-11-18 09:27:35 UTC
Thanks Karim, we'll try that!

Comment 7 Steven Hardy 2020-11-18 10:17:20 UTC
> I assume that /etc/NetworkManager/system-connections/default_connection.nmconnection is pushed by the IPI installer

It's not created by the installer, it's a default connection created by NetworkManager IIUC

As Karim mentioned, it may be possible to override this behavior by injecting some additional config: either a MachineConfig that adjusts the NM config can be applied post-install as part of the SR-IOV configuration, or it can be provided at install time via the installer `create manifests` step.

Reassigning to the CNF team to decide how to proceed.

Comment 8 Federico Paolinelli 2020-11-18 10:41:43 UTC
CNF Platform Validation is the component for the cnf-tests suite, so it's probably not the right component to move this bz to.

In any case, the SR-IOV operator already has the logic in place for creating the udev rule, which was 
https://github.com/openshift/sriov-network-operator/blob/5cab948617a7fefaa58e10280b1c6a0b3872ab89/pkg%2Fdaemon%2Fdaemon.go#L729

It only adds rules for supported devices though, as per https://github.com/openshift/sriov-network-operator/blob/b08f8433bfc7fbdb9a9175ee6ec8a95c12b791f8/api%2Fv1%2Fhelper.go#L35


@Andreas, mind checking the content of /host/etc/udev/rules.d/10-nm-unmanaged.rules on the host to see if the sriov-operator created it (and also, checking if the card is one among those above) ?


Moving to SR-IOV operator in the meantime; if we find that the udev rules are not working, we'll move the bz again.

Comment 9 Andreas Karis 2020-11-19 09:58:06 UTC
There's a udev rule pushed by the SR-IOV operator, as well as the one that I pushed, on the customer system. And the VF device is listed as NM_UNMANAGED:

[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules
ACTION=="add|change", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
[core@worker-0 ~]$ sudo cat /etc/udev/rules.d/99-vfs-unmanaged.rules
ENV{PCI_ID}=="8086:10ED", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by PCI ID
ENV{ID_NET_DRIVER}=="ixgbevf", ENV{NM_UNMANAGED}="1" # Disable ixgbevfs by driver name
ENV{PCI_ID}=="15B3:1018", ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID
ENV{ID_VENDOR_ID}==0x15b3, ENV{ID_MODEL_ID}==0x1018, ENV{NM_UNMANAGED}="1" # disable Mellanox CX-5 VFs by PCI ID


[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1 | grep NM
E: NM_UNMANAGED=1

[core@worker-0 ~]$ udevadm info --path=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=ens1f0v1
E: ID_NET_NAME_MAC=(redacted)
E: ID_NET_NAME_PATH=enp59s0f0v1
E: ID_NET_NAME_SLOT=ens1f0v1
E: ID_PATH=pci-0000:3b:00.3
E: ID_PATH_TAG=pci-0000_3b_00_3
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=57
E: INTERFACE=ens1f0v1
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v1 /sys/subsystem/net/devices/ens1f0v1
E: TAGS=:systemd:
E: USEC_INITIALIZED=175104174

Yet NetworkManager still manages the VFs.
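
The manual check above can be expressed as a small sketch: parse the `E:` lines of `udevadm info` output into properties, then test the device ID against the alternation used in the operator's rule. The helper names are hypothetical, and using the `ID_MODEL_ID` property as a stand-in for the sysfs `device` attribute matched by `ATTRS{device}` is an assumption (both show 0x1018 on this system).

```python
import fnmatch

# Device-ID alternation copied from 10-nm-unmanaged.rules above.
OPERATOR_PATTERN = "0x154c|0x1016|0x1018|0x101c|0x1014"

SAMPLE = """\
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.3/net/ens1f0v1
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_VENDOR_ID=0x15b3
E: INTERFACE=ens1f0v1
E: NM_UNMANAGED=1
"""


def parse_udevadm_info(text):
    """Collect the 'E: KEY=VALUE' property lines into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def matches_udev_pattern(value, pattern):
    """udev match values are shell-style globs; '|' separates alternatives."""
    return any(fnmatch.fnmatchcase(value, alt) for alt in pattern.split("|"))


if __name__ == "__main__":
    props = parse_udevadm_info(SAMPLE)
    print("rule covers device:", matches_udev_pattern(props["ID_MODEL_ID"], OPERATOR_PATTERN))
    print("udev marks unmanaged:", props.get("NM_UNMANAGED") == "1")
```

Both checks come out true for the sample, which matches the observation in this comment: the udev database has the right property, so the question is why NetworkManager does not honor it.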

Comment 11 Andreas Karis 2020-11-19 10:01:49 UTC
The following also does not help:
~~~
[core@worker-0 ~]$ sudo cat /etc/NetworkManager/conf.d/99-nodefault.conf 
[main]
no-auto-default=*
~~~

(see the aforementioned private comment)

-----------

#c8 actually clarified things for me a bit more. I'll go into a remote session with the customer to see what's up with NetworkManager and why it ignores NM_UNMANAGED.

Comment 19 Andreas Karis 2020-11-21 11:56:00 UTC
## Issue

When scaling a deployment with SR-IOV interfaces attached to its pods up and down, the worker node at some point ends up running DHCP on the VFs when they move back to netns 1.
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
       valid_lft 690795sec preferred_lft 690795sec
    inet6 ... scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@worker-0 ~]# nmcli conn
NAME              UUID                                  TYPE           DEVICE    
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v21 
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1    
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f1    
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex     
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1   
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f0    
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex     
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0    
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex     
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0
~~~

This is particularly problematic if the VFs belong to the PF of the machine network interface, as they will get an additional lease on the machine network and thus will break node networking.
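
A minimal sketch of how this condition could be detected: group interface addresses by network and report any network that appears on more than one interface. The function name `shared_subnets` is hypothetical, and the `br-ex` address in the example is a made-up placeholder (the thread does not show the PF-side address); only the `ens1f0v21` address comes from the output above.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(addresses):
    """Map each network that is configured on several interfaces to those interfaces.

    `addresses` is an iterable of (interface_name, "ip/prefix") pairs.
    """
    by_net = defaultdict(list)
    for ifname, cidr in addresses:
        net = ipaddress.ip_interface(cidr).network
        by_net[net].append(ifname)
    return {str(net): names for net, names in by_net.items() if len(names) > 1}


if __name__ == "__main__":
    observed = [
        ("br-ex", "192.168.123.10/24"),       # hypothetical machine-network address
        ("ens1f0v21", "192.168.123.174/24"),  # DHCP lease on the VF, from the output above
        ("lo", "127.0.0.1/8"),
    ]
    print(shared_subnets(observed))
```

Any non-empty result on a worker node would indicate the duplicated-subnet situation described here.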

## Test setup

### Base setup

Use the setup from the following issue description:

OpenShift 4.6. The following directory is used
https://mirror.openshift.com/pub/openshift-v4/clients/ocp/candidate-4.6/

A Mellanox MT27800 is installed on the worker nodes. One of the Mellanox ports is used as the baremetal network during install.

I am able to cause connectivity issues by repeatedly scaling a deployment up to 3 and down to 0, about 3 or 4 times.

The connectivity problem shows up as the following node statuses:
[kni@provisioner ~]$ oc get nodes
NAME                         STATUS     ROLES    AGE   VERSION
master-0.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
master-1.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
master-2.ocp3.example.com   Ready      master   7d    v1.19.0+d59ce34
worker-0.ocp3.example.com   NotReady   worker   7d    v1.19.0+d59ce34
worker-1.ocp3.example.com   Ready      worker   7d    v1.19.0+d59ce34

I need to reboot worker-0 to get it back to a Ready state.

The deployment YAML I used is below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: user-dev
spec:
  selector:
    matchLabels:
      app: ubuntu-example
  replicas: 1
  template:
    metadata:
      labels:
        app: ubuntu-example
      annotations:
        k8s.v1.cni.cncf.io/networks: >-
          user-dev/user-w0-ens1f0-mlx5-netdev-vxlan
    spec:
      containers:
        - name: ubuntu-example
          image: ubuntu
          command:
            - sleep
            - infinity
      nodeSelector:
        kubernetes.io/hostname: worker-0.ocp3.example.com
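
The scale-up/scale-down cycle described above could be automated with a sketch like this one. It shells out to `oc scale` for the `example` deployment in the `user-dev` namespace from the YAML above; the helper names, the pause length, and the default of 4 iterations are assumptions chosen for illustration.

```python
import subprocess
import time


def scale_cmd(replicas, name="example", namespace="user-dev"):
    """Build the `oc scale` command line for the deployment."""
    return ["oc", "scale", f"deployment/{name}", f"--replicas={replicas}", "-n", namespace]


def cycle(iterations=4, up=3, pause=60, run=subprocess.run):
    """Scale up to `up` replicas and back to 0, `iterations` times.

    `run` is injectable so the loop can be exercised without a cluster.
    """
    for _ in range(iterations):
        for replicas in (up, 0):
            run(scale_cmd(replicas), check=True)
            time.sleep(pause)  # give pods time to attach or release their VFs
```

Calling `cycle()` from a host with a logged-in `oc` session would run the loop; checking `oc get nodes` and `nmcli con show --active` on the worker between iterations shows whether a VF has been picked up by NetworkManager.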

The SR-IOV network I used is below:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  annotations:
    operator.sriovnetwork.openshift.io/last-network-namespace: user-dev
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworks/user-w0-ens1f0-mlx5-netdev-vxlan
  resourceVersion: '3294180'
  name: user-w0-ens1f0-mlx5-netdev-vxlan
  uid: f4af9200-015e-4ffb-8dae-c219c542fca7
  creationTimestamp: '2020-10-21T08:10:28Z'
  generation: 7
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:operator.sriovnetwork.openshift.io/last-network-namespace': {}
          'f:finalizers':
            .: {}
            'v:"netattdef.finalizers.sriovnetwork.openshift.io"': {}
        'f:status': {}
      manager: sriov-network-operator
      operation: Update
      time: '2020-10-21T08:10:28Z'
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:capabilities': {}
          'f:ipam': {}
          'f:networkNamespace': {}
          'f:resourceName': {}
          'f:spoofChk': {}
          'f:trust': {}
      manager: Mozilla
      operation: Update
      time: '2020-10-27T07:47:37Z'
  namespace: openshift-sriov-network-operator
  finalizers:
    - netattdef.finalizers.sriovnetwork.openshift.io
spec:
  capabilities: '{"mac": true, "ips": true}'
  ipam: >-
    { "type": "host-local", "subnet": "192.168.123.0/24", "rangeStart":
    "192.168.123.159", "rangeEnd": "192.168.123.159" }
  networkNamespace: user-dev
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy
  spoofChk: 'off'
  trust: 'on'

The SR-IOV policy for this SR-IOV network is:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetworkNodePolicy","metadata":{"annotations":{},"name":"sriov-w0-ens1f0-mlx5-netdev-policy","namespace":"openshift-sriov-network-operator"},"spec":{"deviceType":"netdevice","isRdma":true,"nicSelector":{"pfNames":["ens1f0"]},"nodeSelector":{"kubernetes.io/hostname":"worker-0.ocp3.example.com"},"numVfs":30,"priority":99,"resourceName":"SriovW0Ens1f0Mlx5NetdevPolicy"}}
  selfLink: >-
    /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodepolicies/sriov-w0-ens1f0-mlx5-netdev-policy
  resourceVersion: '481339'
  name: sriov-w0-ens1f0-mlx5-netdev-policy
  uid: fe1c3aaf-9abc-4aeb-a607-3a8cc22f471d
  creationTimestamp: '2020-10-21T06:08:50Z'
  generation: 1
  managedFields:
    - apiVersion: sriovnetwork.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
        'f:spec':
          .: {}
          'f:deviceType': {}
          'f:isRdma': {}
          'f:nicSelector':
            .: {}
            'f:pfNames': {}
          'f:nodeSelector':
            .: {}
            'f:kubernetes.io/hostname': {}
          'f:numVfs': {}
          'f:priority': {}
          'f:resourceName': {}
      manager: oc
      operation: Update
      time: '2020-10-21T06:08:50Z'
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  linkType: eth
  nicSelector:
    pfNames:
      - ens1f0
  nodeSelector:
    kubernetes.io/hostname: worker-0.ocp3.example.com
  numVfs: 30
  priority: 99
  resourceName: SriovW0Ens1f0Mlx5NetdevPolicy

$ oc get net-attach-def -n user-dev -o yaml    
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:07:57Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:07:57Z"
    name: user-w0-ens1f0-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "526756"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-3805
    uid: 47a81653-4c1f-4636-b8ed-18b51b578e0a
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"}
      }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f0Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:10:28Z"
    generation: 7
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-27T07:47:37Z"
    name: user-w0-ens1f0-mlx5-netdev-vxlan
    namespace: user-dev
    resourceVersion: "3294181"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f0-mlx5-netdev-vxlan
    uid: 35361b4a-3fea-4389-9307-f3af81ed42d2
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f0-mlx5-netdev-vxlan", "type":"sriov","vlan":0,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"192.168.123.0/24","rangeStart":"192.168.123.159","rangeEnd":"192.168.123.159"}
      }'
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/SriovW0Ens1f1Mlx5NetdevPolicy
    creationTimestamp: "2020-10-21T08:08:51Z"
    generation: 1
    managedFields:
    - apiVersion: k8s.cni.cncf.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:k8s.v1.cni.cncf.io/resourceName: {}
        f:spec:
          .: {}
          f:config: {}
      manager: sriov-network-operator
      operation: Update
      time: "2020-10-21T08:08:51Z"
    name: user-w0-ens1f1-mlx5-netdev-3805
    namespace: user-dev
    resourceVersion: "527023"
    selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/user-dev/network-attachment-definitions/user-w0-ens1f1-mlx5-netdev-3805
    uid: e2fe5ad4-c610-45ef-a247-98e879e9bfa3
  spec:
    config: '{ "cniVersion":"0.3.1", "name":"user-w0-ens1f1-mlx5-netdev-3805", "type":"sriov","vlan":3805,"spoofchk":"off","trust":"on","vlanQoS":0,"capabilities":{"mac":
      true, "ips": true},"ipam":{"type":"host-local","subnet":"6.6.6.0/24","rangeStart":"6.6.6.185","rangeEnd":"6.6.6.200"}
      }'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

### Force IPAM shortage

Create an IPAM shortage on the network, i.e. make sure that all IPs in the range are already in use. This is the easiest and most reliable way of recreating the issue: the pods go into a deletion/recreation loop, which repeatedly moves the same VF back and forth between the host network namespace and the pod namespace.

During reproducer testing, the pod will show something like the following when being scaled up:
~~~
[kni@provisioner ~]$ oc describe pod example-f79f9f5bd-m5g9g | tail -n 30
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-ks6dg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ks6dg
    Optional:    false
  podnetinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=worker-0.ocp3.example.com
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               25s   default-scheduler  Successfully assigned user-dev/example-f79f9f5bd-m5g9g to worker-0.ocp3.example.com
  Normal   AddedInterface          23s   multus             Add eth0 [10.128.2.61/23]
  Warning  FailedCreatePodSandBox  23s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(9e7c6b1ba48b8cd47227a47ad3025c6b9f14f5d481698cd27ce898d5e6a67451): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
  Normal   AddedInterface          21s   multus             Add eth0 [10.128.2.61/23]
  Warning  FailedCreatePodSandBox  20s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(303021dd4b332cee6008dd638d8333d43c4486a54a8c0c63e41fd2f5386a8c9b): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
  Normal   AddedInterface          8s    multus             Add eth0 [10.128.2.61/23]
  Warning  FailedCreatePodSandBox  7s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_example-f79f9f5bd-m5g9g_user-dev_061f6d34-ae31-46fd-9f5d-ef20981e6850_0(e5261a21b34e5d68ee8a17959bb1c4fb41ce9111d442fc48dc393f22ac36fd01): [user-dev/example-f79f9f5bd-m5g9g:user-w0-ens1f0-mlx5-netdev-vxlan]: error adding container to network "user-w0-ens1f0-mlx5-netdev-vxlan": failed to set up IPAM plugin type "host-local" from the device "ens1f0": failed to allocate for range 0: no IP addresses available in range set: 192.168.123.150-192.168.123.151
[kni@provisioner ~]$ 
~~~
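The error above names the exhausted range explicitly. To judge how many pods it takes to run a `host-local` range dry, the number of addresses between the two bounds can be counted. This is only a minimal sketch: the helper name `range_size` is hypothetical, and it counts every address from start to end inclusive, ignoring anything the plugin itself might reserve.

```python
import ipaddress


def range_size(range_start, range_end):
    """Count the addresses from range_start to range_end, inclusive."""
    start = int(ipaddress.ip_address(range_start))
    end = int(ipaddress.ip_address(range_end))
    if end < start:
        raise ValueError("range end lies before range start")
    return end - start + 1


# Range reported in the FailedCreatePodSandBox events above.
available = range_size("192.168.123.150", "192.168.123.151")
print(available)
```

Once that many addresses are handed out, every additional pod on the network should end up in the deletion/recreation loop described above.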

## Test iterations

* Test 1: 
system defaults

* Test 2: 
~~~
cp /etc/udev/rules.d/10-nm-unmanaged.rules /etc/udev/rules.d/99-test.rules
~~~

* Test 3: 
~~~
[root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules 
ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
~~~
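The `ACTION` and `ATTRS{device}` keys in the Test 3 rule use `|` to list alternatives. The following is a rough sketch of how such a match behaves, assuming that udev's glob-style matching is approximated well enough by Python's `fnmatch`. The helper name `udev_value_matches` is hypothetical, and the value `0x1018` is taken from `ID_MODEL_ID` in the captured events, assumed here to correspond to the PCI `device` attribute.

```python
from fnmatch import fnmatchcase


def udev_value_matches(pattern, value):
    """Approximate a udev `==` match: `|` separates alternative glob patterns."""
    return any(fnmatchcase(value, alt) for alt in pattern.split("|"))


# Patterns copied from /etc/udev/rules.d/99-test.rules (Test 3).
ACTION_PATTERN = "add|change|move"
DEVICE_PATTERN = "0x154c|0x1016|0x1018|0x101c|0x1014"

for action in ("add", "change", "move", "remove"):
    print(action, udev_value_matches(ACTION_PATTERN, action))

# 0x1018 is the model ID reported for the ConnectX-5 VF in the captures.
print("0x1018", udev_value_matches(DEVICE_PATTERN, "0x1018"))
```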

## Test execution

### Clean starting state

On each test iteration, make sure to:
* reboot the node
* wait for all VFs to be up post reboot
* make sure the default NM connection exists and has IP method set to "auto" for IPv4/IPv6
* make sure the node is in READY state
* make sure that the `example` deployment is scaled to `0` (`oc scale deployment example --replicas=0`)
* make sure that the following shows only valid connections and only one interface on the subnet:
~~~
nmcli conn
ip a | grep 10.144
~~~

### Running the test

Open a CLI session on worker-0.ocp3.example.com and run:
~~~
nmcli conn
ip a | grep 10.144
udevadm monitor -p & nmcli conn mon &
~~~

Then scale the deployment from 0 to 1:
~~~
$ oc scale deployment example --replicas=1
~~~


=========================================================================================================================

Analysis of issue, from test2 results:

When the Mellanox interface is added to the kernel, we see 3 different events, one add and 2 moves:
~~~
UDEV  [417.691883] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=34212
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417686148

UDEV  [417.692216] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=34213
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=417692113

UDEV  [417.693715] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=34217
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417692113

Wired Connection: connection profile changed
~~~

The last line of the above output ("Wired Connection: connection profile changed") shows that NetworkManager reacted to the new interface and changed the default connection profile.

The udev event sequence itself is what usually happens, though. In the results of test2, exactly the same sequence occurs 5 times earlier without any negative consequences (see [0] for a diff of the two captures):
~~~
UDEV  [354.543365] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=33477
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354538388

UDEV  [354.543844] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=33478
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=354543742

UDEV  [354.545042] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=33482
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354543742
~~~

My assumption is that there is a race condition in udev. The "move" events (/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1) do not carry NM_UNMANAGED=1, unlike the "add" event for /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23. Under specific conditions (which I cannot explain), a race occurs, and the device path /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 is then added without the environment variable NM_UNMANAGED=1.
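To spot the events that lack the variable without reading the captures by eye, the `udevadm monitor -p` output can be split into one property set per event. The following is a minimal sketch under that assumption; the function names are hypothetical, and the sample is a shortened copy of the first two events from the failing capture.

```python
def parse_udev_events(text):
    """Split `udevadm monitor -p` output into one dict of properties per event."""
    events = []
    current = None
    for line in text.splitlines():
        if line.startswith(("UDEV ", "KERNEL")):
            current = {}            # header line starts a new event
            events.append(current)
        elif not line.strip():
            current = None          # blank line ends the event
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return events


def events_missing_unmanaged(events):
    """Return (ACTION, DEVPATH) for every event without NM_UNMANAGED=1."""
    return [(e.get("ACTION"), e.get("DEVPATH"))
            for e in events if e.get("NM_UNMANAGED") != "1"]


SAMPLE = """\
UDEV  [417.691883] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
NM_UNMANAGED=1

UDEV  [417.692216] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1

Wired Connection: connection profile changed
"""

for action, devpath in events_missing_unmanaged(parse_udev_events(SAMPLE)):
    print(action, devpath)
```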

During our tests, once the issue had reproduced, the udev database showed the following info for the path. A retrigger then added the correct environment variable and value (the output below is from an earlier test with a different virtual function):
~~~
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208

[root@worker-0 ~]# ls /etc/udev/rules.d/
10-nm-unmanaged.rules  70-persistent-ipoib.rules
[root@worker-0 ~]# udevadm control --reload-rules && udevadm trigger
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v22
E: ID_NET_NAME_SLOT=ens1f0v22
E: ID_PATH=pci-0000:3b:03.0
E: ID_PATH_TAG=pci-0000_3b_03_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=73
E: INTERFACE=ens1f0v22
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v22
E: TAGS=:systemd:
E: USEC_INITIALIZED=419436208
~~~ 
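The two `udevadm info` outputs above differ in two ways: before the retrigger, `DEVPATH` still points at `net1` and `NM_UNMANAGED` is missing. A check for that state could look like the sketch below. The function names and the notion of a "stale" record are assumptions made for illustration, and the check only makes sense for VFs that the rule is supposed to match.

```python
def parse_udevadm_info(text):
    """Return (device_path, properties) from `udevadm info` output."""
    path = None
    props = {}
    for line in text.splitlines():
        if line.startswith("P: "):
            path = line[3:].strip()
        elif line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return path, props


def looks_stale(text):
    """True if DEVPATH differs from the P: path or NM_UNMANAGED=1 is missing."""
    path, props = parse_udevadm_info(text)
    return props.get("DEVPATH") != path or props.get("NM_UNMANAGED") != "1"


# Shortened copies of the outputs before and after `udevadm trigger`.
BEFORE = """\
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/net1
E: INTERFACE=ens1f0v22
"""
AFTER = """\
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.0/net/ens1f0v22
E: INTERFACE=ens1f0v22
E: NM_UNMANAGED=1
"""

print(looks_stale(BEFORE), looks_stale(AFTER))
```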

As a matter of fact, the `net1` device path did not exist at that point (the output is from yet another test, after another failure):
~~~
[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
       valid_lft 690895sec preferred_lft 690895sec
    inet6 ... scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v21
E: ID_NET_NAME_SLOT=ens1f0v21
E: ID_PATH=pci-0000:3b:02.7
E: ID_PATH_TAG=pci-0000_3b_02_7
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=64
E: INTERFACE=ens1f0v21
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v21
E: TAGS=:systemd:
E: USEC_INITIALIZED=1221371446

[root@worker-0 ~]# udevadm test -a add /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
calling: test
version 239
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Reading rules file: /usr/lib/udev/rules.d/01-md-raid-creating.rules
Reading rules file: /usr/lib/udev/rules.d/10-dm.rules
Reading rules file: /etc/udev/rules.d/10-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-lvm.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-mpath.rules
Reading rules file: /usr/lib/udev/rules.d/11-dm-parts.rules
Reading rules file: /usr/lib/udev/rules.d/13-dm-disk.rules
Reading rules file: /usr/lib/udev/rules.d/40-elevator.rules
Reading rules file: /usr/lib/udev/rules.d/40-redhat.rules
Reading rules file: /usr/lib/udev/rules.d/40-usb-blacklist.rules
Reading rules file: /usr/lib/udev/rules.d/50-udev-default.rules
Reading rules file: /usr/lib/udev/rules.d/60-alias-kmsg.rules
Reading rules file: /usr/lib/udev/rules.d/60-block.rules
Reading rules file: /usr/lib/udev/rules.d/60-cdrom_id.rules
Reading rules file: /usr/lib/udev/rules.d/60-drm.rules
Reading rules file: /usr/lib/udev/rules.d/60-evdev.rules
Reading rules file: /usr/lib/udev/rules.d/60-fido-id.rules
Reading rules file: /usr/lib/udev/rules.d/60-input-id.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-alsa.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-input.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage-tape.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-v4l.rules
Reading rules file: /usr/lib/udev/rules.d/60-raw.rules
Reading rules file: /usr/lib/udev/rules.d/60-rdma-ndd.rules
Reading rules file: /usr/lib/udev/rules.d/60-rdma-persistent-naming.rules
Reading rules file: /usr/lib/udev/rules.d/60-sensor.rules
Reading rules file: /usr/lib/udev/rules.d/60-serial.rules
Reading rules file: /usr/lib/udev/rules.d/60-srp_daemon.rules
Reading rules file: /usr/lib/udev/rules.d/60-tpm-udev.rules
Reading rules file: /usr/lib/udev/rules.d/61-scsi-sg3_id.rules
Reading rules file: /usr/lib/udev/rules.d/62-multipath.rules
Reading rules file: /usr/lib/udev/rules.d/63-fc-wwpn-id.rules
Reading rules file: /usr/lib/udev/rules.d/63-md-raid-arrays.rules
Reading rules file: /usr/lib/udev/rules.d/63-scsi-sg3_symlink.rules
Reading rules file: /usr/lib/udev/rules.d/64-btrfs.rules
Reading rules file: /usr/lib/udev/rules.d/64-md-raid-assembly.rules
Reading rules file: /usr/lib/udev/rules.d/65-gce-disk-naming.rules
Reading rules file: /usr/lib/udev/rules.d/65-md-incremental.rules
Reading rules file: /usr/lib/udev/rules.d/66-azure-storage.rules
Reading rules file: /usr/lib/udev/rules.d/66-kpartx.rules
Reading rules file: /usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/68-del-part-nodes.rules
Reading rules file: /usr/lib/udev/rules.d/69-dm-lvm-metad.rules
Reading rules file: /usr/lib/udev/rules.d/69-md-clustered-confirm-device.rules
Reading rules file: /usr/lib/udev/rules.d/70-joystick.rules
Reading rules file: /usr/lib/udev/rules.d/70-mouse.rules
Reading rules file: /etc/udev/rules.d/70-persistent-ipoib.rules
Reading rules file: /usr/lib/udev/rules.d/70-power-switch.rules
Reading rules file: /usr/lib/udev/rules.d/70-touchpad.rules
Reading rules file: /usr/lib/udev/rules.d/70-uaccess.rules
Reading rules file: /usr/lib/udev/rules.d/71-seat.rules
Reading rules file: /usr/lib/udev/rules.d/73-idrac.rules
Reading rules file: /usr/lib/udev/rules.d/73-seat-late.rules
Reading rules file: /usr/lib/udev/rules.d/75-net-description.rules
Reading rules file: /usr/lib/udev/rules.d/75-probe_mtd.rules
Reading rules file: /usr/lib/udev/rules.d/75-rdma-description.rules
Reading rules file: /usr/lib/udev/rules.d/78-sound-card.rules
Reading rules file: /usr/lib/udev/rules.d/80-drivers.rules
Reading rules file: /usr/lib/udev/rules.d/80-net-setup-link.rules
Reading rules file: /usr/lib/udev/rules.d/84-nm-drivers.rules
Reading rules file: /usr/lib/udev/rules.d/85-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/90-coreos-device-mapper.rules
Reading rules file: /usr/lib/udev/rules.d/90-iwpmd.rules
Reading rules file: /usr/lib/udev/rules.d/90-nm-thunderbolt.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-hw-modules.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-ulp-modules.rules
Reading rules file: /usr/lib/udev/rules.d/90-rdma-umad.rules
Reading rules file: /usr/lib/udev/rules.d/90-vconsole.rules
Reading rules file: /usr/lib/udev/rules.d/91-drm-modeset.rules
Reading rules file: /usr/lib/udev/rules.d/91-vfio.rules
Reading rules file: /usr/lib/udev/rules.d/95-dm-notify.rules
Reading rules file: /usr/lib/udev/rules.d/98-rdma.rules
Reading rules file: /usr/lib/udev/rules.d/99-azure-product-uuid.rules
Reading rules file: /usr/lib/udev/rules.d/99-systemd.rules
Reading rules file: /usr/lib/udev/rules.d/99-vmware-scsi-udev.rules
rules contain 49152 bytes tokens (4096 * 12 bytes), 23140 bytes strings
3345 strings (44695 bytes), 2318 de-duplicated (22583 bytes), 1028 trie nodes used
IMPORT builtin 'net_id' /usr/lib/udev/rules.d/75-net-description.rules:6
IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/75-net-description.rules:12
IMPORT builtin 'path_id' /usr/lib/udev/rules.d/80-net-setup-link.rules:5
IMPORT builtin 'net_setup_link' /usr/lib/udev/rules.d/80-net-setup-link.rules:9
Config file /usr/lib/systemd/network/99-default.link applies to device ens1f0v21
link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
NAME 'ens1f0v21' /usr/lib/udev/rules.d/80-net-setup-link.rules:11
RUN 'kmod load mlx5_ib' /usr/lib/udev/rules.d/90-rdma-hw-modules.rules:20
RUN '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/$name --prefix=/net/ipv4/neigh/$name --prefix=/net/ipv6/conf/$name --prefix=/net/ipv6/neigh/$name' /usr/lib/udev/rules.d/99-systemd.rules:60
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v21
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v21
ID_NET_NAME_SLOT=ens1f0v21
ID_PATH=pci-0000:3b:02.7
ID_PATH_TAG=pci-0000_3b_02_7
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=64
INTERFACE=ens1f0v21
NM_UNMANAGED=1
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21
TAGS=:systemd:
USEC_INITIALIZED=1221371446
run: 'kmod load mlx5_ib'
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/ens1f0v21 --prefix=/net/ipv4/neigh/ens1f0v21 --prefix=/net/ipv6/conf/ens1f0v21 --prefix=/net/ipv6/neigh/ens1f0v21'
Unload module index
Unloaded link configuration context.
[root@worker-0 ~]# udevadm info --path /sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
P: /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/ens1f0v21
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
E: ID_MODEL_ID=0x1018
E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=ens1f0v21
E: ID_NET_NAME_MAC=enx<redacted mac>
E: ID_NET_NAME_PATH=enp59s0f0v21
E: ID_NET_NAME_SLOT=ens1f0v21
E: ID_PATH=pci-0000:3b:02.7
E: ID_PATH_TAG=pci-0000_3b_02_7
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=64
E: INTERFACE=ens1f0v21
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v21
E: TAGS=:systemd:
E: USEC_INITIALIZED=1221371446

[root@worker-0 ~]# ip a ls dev ens1f0v21
64: ens1f0v21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether <redacted mac> brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.174/24 brd 192.168.123.255 scope global dynamic noprefixroute ens1f0v21
       valid_lft 690795sec preferred_lft 690795sec
    inet6 ... scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@worker-0 ~]# nmcli conn
NAME              UUID                                  TYPE           DEVICE    
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v21 
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1    
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f1    
ovs-if-br-ex      5090ed3a-6dca-44b6-a31c-6c6539132c23  ovs-interface  br-ex     
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       eno2np1   
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens7f0    
br-ex             45858fce-90a1-41d6-ae89-3437bac40a76  ovs-bridge     br-ex     
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0    
ovs-port-br-ex    214c5ade-fed8-4b3c-84f7-86e11397d6bb  ovs-port       br-ex     
ovs-port-phys0    9ec1d966-97a1-4334-a3ed-0400ec39ef1a  ovs-port       ens1f0    
[root@worker-0 ~]# 
[root@worker-0 ~]# 
[root@worker-0 ~]# 
[root@worker-0 ~]# udevadm info --path /devices/pci0000:3a/0000:3a:00.0/0000:3b:02.7/net/net1
syspath not found
~~~
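In the `nmcli conn` output above, the symptom is a VF (`ens1f0v21`) showing up under the default "Wired Connection" profile. A small sketch for picking such devices out of the tabular output follows. The function name is hypothetical, and the regular expression assumes that VF netdevs are named like the ones in this report (ending in `v` plus a number).

```python
import re

VF_NAME = re.compile(r"v\d+$")  # assumption: VF names look like ens1f0v21


def vfs_on_profile(nmcli_output, profile):
    """Return VF-looking devices that `nmcli conn` lists under `profile`."""
    devices = []
    for line in nmcli_output.splitlines():
        # Columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) != 4 or fields[0] == "NAME":
            continue  # skip the header and anything that is not a table row
        name, _uuid, _type, device = fields
        if name == profile and VF_NAME.search(device):
            devices.append(device)
    return devices


# Shortened copy of the table shown above.
SAMPLE = """\
NAME              UUID                                  TYPE           DEVICE
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f0v21
Wired Connection  5f123cb7-3abb-477a-8c29-55fa809f19b4  ethernet       ens1f1
ovs-if-phys0      7fd65994-e486-4f47-a18f-0f5e6e02c235  ethernet       ens1f0
"""

print(vfs_on_profile(SAMPLE, "Wired Connection"))
```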



=======================

[0]

[akaris@linux 02786983]$ cat udevseq1.txt 
UDEV  [417.691883] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=34212
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417686148

UDEV  [417.692216] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=34213
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=417692113

UDEV  [417.693715] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=34217
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=417692113

Wired Connection: connection profile changed
[akaris@linux 02786983]$ cat udevseq2.txt 
UDEV  [354.543365] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f0v23
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=33477
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354538388

UDEV  [354.543844] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=net1
SEQNUM=33478
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=354543742

UDEV  [354.545042] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx<redacted mac>
ID_NET_NAME_PATH=enp59s0f0v23
ID_NET_NAME_SLOT=ens1f0v23
ID_PATH=pci-0000:3b:03.1
ID_PATH_TAG=pci-0000_3b_03_1
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=71
INTERFACE=ens1f0v23
SEQNUM=33482
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f0v23
TAGS=:systemd:
USEC_INITIALIZED=354543742
[akaris@linux 02786983]$ diff udevseq1.txt udevseq2.txt 
1c1
< UDEV  [417.691883] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
---
> UDEV  [354.543365] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/ens1f0v23 (net)
23c23
< SEQNUM=34212
---
> SEQNUM=33477
27c27
< USEC_INITIALIZED=417686148
---
> USEC_INITIALIZED=354538388
29c29
< UDEV  [417.692216] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
---
> UDEV  [354.543844] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
44c44
< SEQNUM=34213
---
> SEQNUM=33478
48c48
< USEC_INITIALIZED=417692113
---
> USEC_INITIALIZED=354543742
50c50
< UDEV  [417.693715] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
---
> UDEV  [354.545042] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:03.1/net/net1 (net)
68c68
< SEQNUM=34217
---
> SEQNUM=33482
72,74c72
< USEC_INITIALIZED=417692113
< 
< Wired Connection: connection profile changed
---
> USEC_INITIALIZED=354543742

====================

How to work around the issue when using the SR-IOV network operator, with a udev rule tweak:

* match on `ACTION=="add|change|move"`:

~~~
[root@worker-0 ~]# cat /etc/udev/rules.d/99-test.rules 
ACTION=="add|change|move", ATTRS{device}=="0x154c|0x1016|0x1018|0x101c|0x1014", ENV{NM_UNMANAGED}="1"
~~~

This will add NM_UNMANAGED=1 to the udev "move" events, too:
~~~
UDEV  [14134.602264] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13 (net)
ACTION=add
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_DRIVER=mlx5_core
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME=ens1f1v13
ID_NET_NAME_MAC=enx36bb05ba0438
ID_NET_NAME_PATH=enp59s0f1v13
ID_NET_NAME_SLOT=ens1f1v13
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=ens1f1v13
INTERFACE_OLD=net1
NM_UNMANAGED=1
SEQNUM=190826
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens1f1v13
TAGS=:systemd:
USEC_INITIALIZED=14134596165

UDEV  [14134.602702] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=net1
NM_UNMANAGED=1
SEQNUM=190827
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1
TAGS=:systemd:
USEC_INITIALIZED=14134602583

UDEV  [14134.604382] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net)
ACTION=move
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
DEVPATH_OLD=/devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1
ID_BUS=pci
ID_MODEL_FROM_DATABASE=MT27800 Family [ConnectX-5 Virtual Function]
ID_MODEL_ID=0x1018
ID_NET_NAME_MAC=enx36bb05ba0438
ID_NET_NAME_PATH=enp59s0f1v13
ID_NET_NAME_SLOT=ens1f1v13
ID_PATH=pci-0000:3b:05.5
ID_PATH_TAG=pci-0000_3b_05_5
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=Mellanox Technologies
ID_VENDOR_ID=0x15b3
IFINDEX=114
INTERFACE=ens1f1v13
NM_UNMANAGED=1
SEQNUM=190831
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/net1 /sys/subsystem/net/devices/ens1f1v13
TAGS=:systemd:
USEC_INITIALIZED=14134602583
~~~

With this rule in place, the issue could no longer be reproduced in customer tests.
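For anyone who wants to check a captured trace the same way, the following is a rough Python sketch that parses text in the format shown above (`UDEV` header line followed by `KEY=value` properties) and lists `net` events that lack `NM_UNMANAGED=1`. The function names are hypothetical, and the second event in `SAMPLE` is constructed for illustration (a "move" event with the property removed); it is not taken from the real trace.

```python
def parse_udev_monitor(text: str) -> list[dict[str, str]]:
    """Split monitor output into one property dict per UDEV event."""
    events: list[dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("UDEV "):
            # Header line of a processed udev event starts a new record.
            current = {}
            events.append(current)
        elif line.startswith("KERNEL["):
            # Raw kernel uevents are ignored in this sketch.
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return events


def events_missing_unmanaged(events: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return net-subsystem events that do not carry NM_UNMANAGED=1."""
    return [
        e for e in events
        if e.get("SUBSYSTEM") == "net" and e.get("NM_UNMANAGED") != "1"
    ]


# First event abridged from the trace above; second one is illustrative only.
SAMPLE = """\
UDEV  [14134.602264] add      /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/ens1f1v13 (net)
ACTION=add
INTERFACE=ens1f1v13
NM_UNMANAGED=1
SUBSYSTEM=net

UDEV  [14134.602702] move     /devices/pci0000:3a/0000:3a:00.0/0000:3b:05.5/net/net1 (net)
ACTION=move
INTERFACE=net1
SUBSYSTEM=net
"""

if __name__ == "__main__":
    for event in events_missing_unmanaged(parse_udev_monitor(SAMPLE)):
        print("missing NM_UNMANAGED:", event["ACTION"], event["INTERFACE"])
```

Run against the real trace above, a check like this should report nothing, since all three events carry `NM_UNMANAGED=1`.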

Comment 20 Ken Holtz 2020-12-08 18:20:10 UTC
Do we have a target 4.6 Z stream for this?

Comment 24 zhaozhanqi 2020-12-14 06:18:00 UTC
Verified this on 4.7.0-202012120244.p0:

sudo cat /etc/udev/rules.d/10-nm-unmanaged.rules 
ACTION=="add|change|move", ATTRS{device}=="0x1014|0x1016|0x1018|0x101c|0x154c", ENV{NM_UNMANAGED}="1"

Comment 27 errata-xmlrpc 2021-02-24 15:32:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0
security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

