Bug 2036515
| Summary: | when enable SR-IOV with 2 ports the mapping between VF netdevice and RDMA ulp device is not consistent | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Moshe Levi <moshele> |
| Component: | NetworkManager | Assignee: | Lubomir Rintel <lrintel> |
| Status: | CLOSED NOTABUG | QA Contact: | Desktop QE <desktop-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.5 | CC: | bgalvani, fge, lrintel, mleitner, rkhan, sfaye, sukulkar, till, vbenes |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.7 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-01-11 17:19:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Moshe Levi
2022-01-02 18:33:53 UTC
Thanks for the report.

This is no different from any other device naming scheme that uses sequential numbering. The way in which the devices are probed, drivers bound and kernel devices assigned is asynchronous by its very nature. A hack in NetworkManager that may work around this particular issue, if possible at all, would just sweep the issue under the rug.

Some sort of stable device naming scheme for InfiniBand verbs devices is needed instead. Would something like this work for you?

https://github.com/systemd/systemd/pull/24988

What you'd get is (this is just the physical functions on ConnectX-5 Ex, but it illustrates the point):

[root@wsfd-netdev93 ~]# find /dev/infiniband/ |xargs ls -ld
drwxr-xr-x. 4 root root 120 Oct 13 07:42 /dev/infiniband/
drwxr-xr-x. 2 root root 80 Oct 13 07:42 /dev/infiniband/by-ibdev
lrwxrwxrwx. 1 root root 10 Oct 13 07:42 /dev/infiniband/by-ibdev/uverbs-mlx5_0 -> ../uverbs0
lrwxrwxrwx. 1 root root 10 Oct 13 07:42 /dev/infiniband/by-ibdev/uverbs-mlx5_1 -> ../uverbs1
drwxr-xr-x. 2 root root 80 Oct 13 07:42 /dev/infiniband/by-path
lrwxrwxrwx. 1 root root 10 Oct 13 07:42 /dev/infiniband/by-path/pci-0000:65:00.0 -> ../uverbs0
lrwxrwxrwx. 1 root root 10 Oct 13 07:42 /dev/infiniband/by-path/pci-0000:65:00.1 -> ../uverbs1
crw-rw-rw-. 1 root root 231, 192 Oct 13 07:42 /dev/infiniband/uverbs0
crw-rw-rw-. 1 root root 231, 193 Oct 13 07:42 /dev/infiniband/uverbs1
[root@wsfd-netdev93 ~]#

So you just create a symbolic link to the char device. The issue is in a k8s environment, where we need to mount the uverbsX char device to it. How will a symbolic link help in such a case?

NetworkManager is the first tool I am aware of that parallelizes SR-IOV VF creation. This was introduced by this commit [1]. I would expect a flag that also allows serial creation of SR-IOV VFs.

By the way, the issue comes from one of our customers, and the solution was to downgrade NetworkManager.
[1] - https://github.com/NetworkManager/NetworkManager/commit/121c58f0c48de9fb64a87ef02e3e090d90d2e96e

> So you just create a symbolic link to the char device.
> The issue is in a k8s environment, where we need to mount the uverbsX char device to it. How will a symbolic link help in such a case?

It will enable you to look up the correct uverbs device using stable means (i.e. the bus location of the hardware device or the IB device name) instead of the numbering, which is entirely non-deterministic.

> NetworkManager is the first tool I am aware of that parallelizes SR-IOV VF creation.

We don't do the VF creation; we merely ask the kernel for the number of devices we need. The decision about the order or the naming of the devices is entirely up to the kernel.

> This was introduced by this commit [1]. I would expect a flag that also allows serial creation of SR-IOV VFs.

Adding explicit serialization to VF creation would come at the expense of complexity and a performance penalty. And even if it were done, it's still not guaranteed that the creation would happen in any particular order: the uverbs name ranges would no longer overlap, but they still wouldn't be guaranteed to be stable. They never have been. The connection activations are entirely independent, always have been, and can start and proceed in any order.

I'm closing this because, as far as I'm able to tell, things are behaving as designed. This is not a NetworkManager bug. You need to stop relying on what is effectively an implementation detail.

Please let me know if you need help with that. Filing a systemd bug asking to backport the above patch might be a good start.
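
For illustration, the lookup by bus location described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it presumes the `by-path` symlinks from the proposed systemd change exist under `/dev/infiniband`, and `resolve_uverbs` is a hypothetical helper name, not part of NetworkManager, systemd or Kubernetes.

```python
import os


def resolve_uverbs(pci_address, base="/dev/infiniband"):
    """Return the real uverbs char device path for a PCI address, or None.

    Assumes symlinks named by-path/pci-<address> exist under `base`, as in
    the listing from the proposed systemd change (an assumption, not a
    guarantee on any given system).
    """
    link = os.path.join(base, "by-path", "pci-" + pci_address)
    if not os.path.islink(link):
        return None
    # Follow the symlink (e.g. "../uverbs0") to the actual device node.
    target = os.path.realpath(link)
    return target if os.path.exists(target) else None


if __name__ == "__main__":
    # PCI address taken from the example listing in this bug.
    print(resolve_uverbs("0000:65:00.0"))
```

The resolved path, rather than a guessed `uverbsX` number, is what would then be passed on to whatever mounts the device into the container.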