
Bug 2036515

Summary: When SR-IOV is enabled on 2 ports, the mapping between VF netdevice and RDMA ULP device is not consistent
Product: Red Hat Enterprise Linux 8
Component: NetworkManager
Version: 8.5
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Moshe Levi <moshele>
Assignee: Lubomir Rintel <lrintel>
QA Contact: Desktop QE <desktop-qa-list>
CC: bgalvani, fge, lrintel, mleitner, rkhan, sfaye, sukulkar, till, vbenes
Target Milestone: rc
Target Release: 8.7
Keywords: Triaged
Flags: pm-rhel: mirror+
Doc Type: No Doc Update
Type: Bug
Last Closed: 2023-01-11 17:19:30 UTC

Description Moshe Levi 2022-01-02 18:33:53 UTC
Description of problem:
My setup: Mellanox ConnectX-6 Dx dual port with SR-IOV enabled
NetworkManager-1.32.10-4.el8.x86_64
kernel-4.18.0-348.el8.x86_64
When I create 8 VFs on each of the two interfaces, I get the following mapping:
[root@cloud-dev-15 ~]# cat /etc/redhat-release 
CentOS Linux release 8.5.2111

[root@cloud-dev-15 NetworkManager]# ls -l /sys/class/net/ens3f*v*/device/infiniband_verbs/

/sys/class/net/ens3f0v0/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs5

/sys/class/net/ens3f0v1/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs7

/sys/class/net/ens3f0v2/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs9

/sys/class/net/ens3f0v3/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs11

/sys/class/net/ens3f0v4/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs13

/sys/class/net/ens3f0v5/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs15

/sys/class/net/ens3f0v6/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs17

/sys/class/net/ens3f0v7/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs19

/sys/class/net/ens3f1v0/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs4

/sys/class/net/ens3f1v1/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs6

/sys/class/net/ens3f1v2/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs8

/sys/class/net/ens3f1v3/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs10

/sys/class/net/ens3f1v4/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs12

/sys/class/net/ens3f1v5/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs14

/sys/class/net/ens3f1v6/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs16

/sys/class/net/ens3f1v7/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 13:31 uverbs18


As you can see, VF ens3f0v0 is mapped to uverbs5.

After a reboot, I get the following mapping:
[root@cloud-dev-15 ~]# ls -l /sys/class/net/ens3f*v*/device/infiniband_verbs/
/sys/class/net/ens3f0v0/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs4

/sys/class/net/ens3f0v1/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs6

/sys/class/net/ens3f0v2/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs8

/sys/class/net/ens3f0v3/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs10

/sys/class/net/ens3f0v4/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs12

/sys/class/net/ens3f0v5/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs14

/sys/class/net/ens3f0v6/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs16

/sys/class/net/ens3f0v7/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs18

/sys/class/net/ens3f1v0/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs5

/sys/class/net/ens3f1v1/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs7

/sys/class/net/ens3f1v2/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs9

/sys/class/net/ens3f1v3/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs11

/sys/class/net/ens3f1v4/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs13

/sys/class/net/ens3f1v5/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs15

/sys/class/net/ens3f1v6/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs17

/sys/class/net/ens3f1v7/device/infiniband_verbs/:
total 0
drwxr-xr-x 3 root root 0 Jan  2 17:40 uverbs19

As you can see, VF ens3f0v0 is now mapped to uverbs4.

The mapping changes between reboots. This causes issues in a k8s cluster, where kubelet expects the mapping to be consistent across reboots.

The reason for this inconsistency is that NetworkManager creates the SR-IOV VFs asynchronously: https://github.com/NetworkManager/NetworkManager/blob/6a68008e44bf2e9b6bc464825ec61bd062120f4a/src/libnm-platform/nm-platform.c#L1807. This triggers SR-IOV creation on both PFs at the same time. A ULP (RDMA upper layer protocol) device is created per VF netdevice, and each ULP device is given the first available name. When SR-IOV is enabled on both PFs in parallel, the ULP device names are therefore inconsistent. If NetworkManager created the SR-IOV VFs in a fixed order (first PF0, then PF1) and each command blocked until it completed, we would always get a consistent mapping between VF netdevice and ULP device (under /dev/infiniband).
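The effect described above can be reproduced with a small simulation. This is only a sketch of the "first available name" behaviour as described in this report, not kernel code: the function names are hypothetical, the starting index 4 is taken from the lowest uverbs number seen in the listings above, and strict alternation between the two PFs is an idealized model of parallel creation.

```python
def assign_uverbs(creation_order, first_free=4):
    """Give each VF the lowest unused uverbs index, in creation order."""
    # Assumption: indices below first_free are already taken by other devices.
    used = set(range(first_free))
    mapping = {}
    for vf in creation_order:
        index = 0
        while index in used:
            index += 1
        used.add(index)
        mapping[vf] = f"uverbs{index}"
    return mapping


def vf_names(pf, count=8):
    """VF netdevice names for one PF, e.g. ens3f0v0 .. ens3f0v7."""
    return [f"{pf}v{i}" for i in range(count)]


def interleave(first, second):
    """Alternate between two PFs, modelling parallel VF creation."""
    order = []
    for a, b in zip(first, second):
        order.extend([a, b])
    return order


pf0, pf1 = vf_names("ens3f0"), vf_names("ens3f1")

boot_a = assign_uverbs(interleave(pf1, pf0))  # PF1 happens to go first
boot_b = assign_uverbs(interleave(pf0, pf1))  # PF0 happens to go first
serial = assign_uverbs(pf0 + pf1)             # PF0 fully created before PF1

print(boot_a["ens3f0v0"], boot_b["ens3f0v0"], serial["ens3f0v0"])
# prints: uverbs5 uverbs4 uverbs4
```

In this model, boot_a matches the first listing (ens3f0v0 on uverbs5) and boot_b matches the listing after the reboot (ens3f0v0 on uverbs4). The serial case gives each PF a contiguous range, but only as long as PF0 is always handled first.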


Version-Release number of selected component (if applicable):
NetworkManager-1.32.10-4.el8.x86_64


How reproducible:
On a CentOS 8.5 server


Steps to Reproduce:
1. run the following commands:
nmcli con add type ethernet con-name ens3f0 ifname ens3f0
nmcli con modify ens3f0 sriov.total-vfs 8 ipv4.method disabled ipv6.method disabled
nmcli con add type ethernet con-name ens3f1 ifname ens3f1
nmcli con modify ens3f1 sriov.total-vfs 8 ipv4.method disabled ipv6.method disabled

2. Check the VF netdevice to verbs mapping with:
ls -l /sys/class/net/ens3f*v*/device/infiniband_verbs/
3. Reboot the server.

4. Check the VF netdevice to verbs mapping again with:
ls -l /sys/class/net/ens3f*v*/device/infiniband_verbs/
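
To compare the mapping before and after the reboot without reading the directory listing by hand, the same sysfs path can be collected into a dictionary. This is a minimal sketch: the function name and its parameters are hypothetical, and the default pattern ens3f*v* is simply the VF naming used on this machine.

```python
import glob
import os


def vf_to_uverbs(sysfs_root="/sys", pattern="ens3f*v*"):
    """Return {netdev: [uverbs names]} read from
    <sysfs_root>/class/net/<netdev>/device/infiniband_verbs/."""
    mapping = {}
    search = os.path.join(
        sysfs_root, "class", "net", pattern, "device", "infiniband_verbs"
    )
    for verbs_dir in sorted(glob.glob(search)):
        # .../class/net/<netdev>/device/infiniband_verbs -> <netdev>
        netdev = os.path.basename(os.path.dirname(os.path.dirname(verbs_dir)))
        mapping[netdev] = sorted(os.listdir(verbs_dir))
    return mapping


if __name__ == "__main__":
    for netdev, verbs in vf_to_uverbs().items():
        print(netdev, " ".join(verbs))
```

Saving the printed output in step 2 and diffing it against the output from step 4 shows directly whether the mapping changed.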


Actual results:
Sometimes the VF netdevice to uverbsX mapping is different after a reboot.


Expected results:
The mapping should be consistent across reboots.

Additional info:

Comment 3 Lubomir Rintel 2022-10-13 12:05:02 UTC
Thanks for the report.

This is no different from any other device naming scheme that uses sequential numbering. The way in which the devices are probed, drivers are bound and kernel device names are assigned is asynchronous by its very nature. A hack in NetworkManager that works around this particular issue, if possible at all, would just sweep the issue under the rug.

Some sort of stable device naming scheme for InfiniBand verbs devices is needed instead.

Would something like this work for you? https://github.com/systemd/systemd/pull/24988

What you'd get is (this is just the physical functions on ConnectX-5 Ex, but it illustrates the point):

  [root@wsfd-netdev93 ~]# find /dev/infiniband/ |xargs ls -ld
  drwxr-xr-x. 4 root root      120 Oct 13 07:42 /dev/infiniband/
  drwxr-xr-x. 2 root root       80 Oct 13 07:42 /dev/infiniband/by-ibdev
  lrwxrwxrwx. 1 root root       10 Oct 13 07:42 /dev/infiniband/by-ibdev/uverbs-mlx5_0 -> ../uverbs0
  lrwxrwxrwx. 1 root root       10 Oct 13 07:42 /dev/infiniband/by-ibdev/uverbs-mlx5_1 -> ../uverbs1
  drwxr-xr-x. 2 root root       80 Oct 13 07:42 /dev/infiniband/by-path
  lrwxrwxrwx. 1 root root       10 Oct 13 07:42 /dev/infiniband/by-path/pci-0000:65:00.0 -> ../uverbs0
  lrwxrwxrwx. 1 root root       10 Oct 13 07:42 /dev/infiniband/by-path/pci-0000:65:00.1 -> ../uverbs1
  crw-rw-rw-. 1 root root 231, 192 Oct 13 07:42 /dev/infiniband/uverbs0
  crw-rw-rw-. 1 root root 231, 193 Oct 13 07:42 /dev/infiniband/uverbs1
  [root@wsfd-netdev93 ~]#

Comment 4 Moshe Levi 2022-10-13 20:21:28 UTC
So you just create a symbolic link to the char device.
The issue is in a k8s environment, where we need to mount the uverbsX char device into it. How will a symbolic link help in such a case?

NetworkManager is the first tool I am aware of that parallelizes SR-IOV VF creation. This was introduced by this commit [1]. I would expect a flag that also allows serial creation of SR-IOV VFs.
By the way, the issue comes from one of our customers, and the solution was to downgrade NetworkManager.



[1] - https://github.com/NetworkManager/NetworkManager/commit/121c58f0c48de9fb64a87ef02e3e090d90d2e96e

Comment 9 Lubomir Rintel 2023-01-11 17:19:30 UTC
> So you just create a symbolic link to the char device.
> The issue is in a k8s environment, where we need to mount the uverbsX char device into it. How will a symbolic link help in such a case?

It will enable you to look up the correct uverbs device using stable means (i.e. the bus location of the hardware device or the IB device name) and not use the numbering which is entirely non-deterministic.
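
As a rough illustration of such a lookup, the sketch below resolves a VF netdevice to its PCI address through sysfs and then follows the by-path symlink to the real uverbs node. It assumes the /dev/infiniband/by-path/pci-<address> layout shown in the listing in Comment 3, which only exists on systems carrying the proposed systemd change; the function names are hypothetical.

```python
import os


def pci_address(netdev, sysfs_root="/sys"):
    """PCI address of a netdev: the basename of where its 'device' symlink points."""
    device_link = os.path.join(sysfs_root, "class", "net", netdev, "device")
    return os.path.basename(os.path.realpath(device_link))


def uverbs_by_path(pci_addr, dev_root="/dev/infiniband"):
    """Resolve the by-path symlink for a PCI address to the real uverbs node."""
    # Assumption: symlinks are named pci-<address>, as in the Comment 3 listing.
    link = os.path.join(dev_root, "by-path", f"pci-{pci_addr}")
    if not os.path.islink(link):
        raise FileNotFoundError(f"no stable symlink at {link}")
    return os.path.realpath(link)


if __name__ == "__main__":
    import sys

    # Usage: pass a netdevice name, e.g. ens3f0v0
    if len(sys.argv) == 2:
        print(uverbs_by_path(pci_address(sys.argv[1])))
```

The resolved path is the current uverbsX node for that hardware location, so whatever needs the char device can be handed the resolved path at startup instead of a hard-coded number.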

> NetworkManager is the first tool I am aware of that parallelizes SR-IOV VF creation.

We don't do the VF creation; we merely ask the kernel for the number of devices we need. The decision about the order or the naming of the devices is entirely up to the kernel.

> This was introduced by this commit [1]. I would expect a flag that also allows serial creation of SR-IOV VFs.

Adding explicit serialization of VF creation would come at the expense of added complexity and a performance penalty. And even if it were done, it's still not guaranteed that the creation would happen in any particular order: the uverbs name ranges would no longer overlap, but they still wouldn't be guaranteed to be stable. They never have been. The connection activations are entirely independent, always have been, and can start and proceed in any order.

I'm closing this, because as far as I'm able to tell things are behaving as designed. This is not a NetworkManager bug.

You need to stop relying on what is effectively an implementation detail. Please let me know if you need help with that. Filing a systemd bug asking to backport the above patch might be a good start.