Bug 2136937 - Port names changed for RHEL 9.1 Infiniband set up?
Summary: Port names changed for RHEL 9.1 Infiniband set up?
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: Documentation
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mayur Patil
QA Contact: Gabi Fialová
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-10-21 21:39 UTC by Jon Trossbach
Modified: 2023-09-12 12:02 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
.Updated `systemd-udevd` assigns consistent network device names to InfiniBand interfaces
Introduced in RHEL 9, the new version of the `systemd` package contains the updated `systemd-udevd` device manager, which changes the default names of InfiniBand interfaces to consistent names selected by `systemd-udevd`. You can define custom naming rules for InfiniBand interfaces by following the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_infiniband_and_rdma_networks/configuring-the-core-rdma-subsystem_configuring-infiniband-and-rdma-networks#renaming-ipoib-devices-using-systemd-link-file_configuring-the-core-rdma-subsystem[Renaming IPoIB devices using systemd link file] procedure. For more details about the naming scheme, see the `systemd.net-naming-scheme(7)` man page.
Clone Of:
Environment:
Last Closed: 2023-08-30 14:02:11 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-137327 0 None None None 2022-10-21 21:43:19 UTC

Description Jon Trossbach 2022-10-21 21:39:30 UTC
Description of problem:
The RHEL Ansible upstream test suite is failing for InfiniBand configuration.

Version-Release number of selected component (if applicable):
Ansible 2.13

How reproducible:
Always

Steps to Reproduce:
# yum install iproute libibverbs libibverbs-utils infiniband-diags

# yum groupinstall "infiniband support"
# rmmod ib_ipoib
# modprobe ib_ipoib
# rhts-reboot
# ip a

Actual results:
8: ibs2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:02:d5:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:28:88 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp23s0f0
9: ibs2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:04:22:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:28:89 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp23s0f1
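The names above follow the scheme described in systemd.net-naming-scheme(7): an `ib` prefix, then either `s<slot>` (the PCI hotplug slot index) or `p<bus>s<slot>` (the PCI bus and device numbers, printed in decimal), each optionally followed by `f<function>`. The minimal sketch below decodes only those two forms. The `decode_ib_name` function is hypothetical, and the other variants the man page allows (onboard `o<index>` names, `n<port_name>`/`d<dev_port>` suffixes, a non-zero PCI domain) are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper: decode an InfiniBand predictable name into its parts.
# Handles only the two forms seen in this bug:
#   ibs<slot>[f<function>]        slot-based name (PCI hotplug slot index)
#   ibp<bus>s<slot>[f<function>]  path-based name (PCI bus/device, decimal)
decode_ib_name() {
    ib_name=$1

    # Path-based form; \4 is empty when the f<function> suffix is absent.
    ib_fields=$(printf '%s\n' "$ib_name" |
        sed -n -E 's/^ibp([0-9]+)s([0-9]+)(f([0-9]+))?$/\1 \2 \4/p')
    if [ -n "$ib_fields" ]; then
        set -- $ib_fields
        # Also show the bus in hex, the way lspci and mstconfig -d print it.
        printf 'path: bus=%s (0x%02x) slot=%s function=%s\n' \
            "$1" "$1" "$2" "${3:-0}"
        return 0
    fi

    # Slot-based form; \3 is empty when the f<function> suffix is absent.
    ib_fields=$(printf '%s\n' "$ib_name" |
        sed -n -E 's/^ibs([0-9]+)(f([0-9]+))?$/\1 \3/p')
    if [ -n "$ib_fields" ]; then
        set -- $ib_fields
        printf 'slot: hotplug-slot=%s function=%s\n' "$1" "${2:-0}"
        return 0
    fi

    printf 'unrecognized name: %s\n' "$ib_name" >&2
    return 1
}
```

For example, `decode_ib_name ibp23s0f0` prints `path: bus=23 (0x17) slot=0 function=0`. Assuming PCI domain 0, which the scheme leaves out of the name, that is PCI address 0000:17:00.0. The name ibs2f0 and the altname ibp23s0f0 describe the same port in two different ways. The slot-based form depends on the hotplug slot number that the firmware reports, while the path-based form depends on where the card sits on the PCI bus. As a result, two machines with the same card model can end up with different names.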


Expected results:
The InfiniBand port is named 'ib0'.

Additional info:
We need to determine whether this is just the systemd team crossing CentOS Stream with the RHEL 9 stream again, or whether we really need to change the test to accommodate RHEL 9.1's apparent new port-naming scheme.

Comment 1 Jon Trossbach 2022-10-25 00:15:19 UTC
As expected, this is happening on 9.2 as well.

Comment 2 Jon Trossbach 2022-10-25 18:27:51 UTC
Here is the bug from the last time the systemd team crossed CentOS Stream with RHEL 9: https://bugzilla.redhat.com/show_bug.cgi?id=2094284

Comment 3 Till Maas 2022-10-27 15:08:50 UTC
Device naming is handled by systemd; reassigning.

Comment 4 Jon Trossbach 2022-11-02 17:37:43 UTC
Okay, after running more tests and reviewing multiple documentation sources, here is the summary of what I know:

Neither the images from June nor the images being released right now set up ib0 on RHEL 9 the way RHEL 8 does. All RHEL 9 images I've checked behave consistently.

The RHEL 9 documentation seems incomplete here right now [1], especially when compared to RHEL 8 [2].

So the question remains: given that RHEL 9 port naming is consistent, is it what was intended?

If it is intended, we (Wen and I) need to consider updating our test procedure.

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/9/html/optimizing_rhel_9_for_real_time_for_low_latency_operation/con_infiniband-in-rhel-for-rt
[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_infiniband_and_rdma_networks/index
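If the test procedure does get updated, one option is to stop hard-coding ib0 and select interfaces by link type instead. Here is a minimal sketch. It assumes the one-line output format of `ip -o link show`, where the interface name is the second `: `-separated field and InfiniBand ports carry `link/infiniband`. The `list_ib_ifaces` name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of InfiniBand interfaces, one per line,
# from `ip -o link show` output read on standard input.
list_ib_ifaces() {
    # Field 2 is the interface name. Interfaces stacked on a parent (for
    # example IPoIB partition children) appear as "child@parent", so keep
    # only the part before "@".
    awk -F': ' '/link\/infiniband/ { split($2, parts, "@"); print parts[1] }'
}
```

On a test host, this would be run as `ip -o link show | list_ib_ifaces`. On the machine from the description, you would expect it to print ibs2f0 and ibs2f1, so the test no longer depends on which naming scheme produced those names.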

Comment 5 Jon Trossbach 2022-11-02 17:46:45 UTC
Worth pointing out: adding to my confusion, the RHEL 9 documentation states, "Support for Infiniband under RHEL for Real Time is the same as that offered under RHEL 8."

Comment 6 Marc Muehlfeld 2022-11-17 08:52:58 UTC
(In reply to Jon Trossbach from comment #4)
> The RHEL 9 documentation seems incomplete here right now [1]. Especially
> when compared to RHEL 8 [2].
> 
> [1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/9/html/optimizing_rhel_9_for_real_time_for_low_latency_operation/con_infiniband-in-rhel-for-rt
> [2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_infiniband_and_rdma_networks/index

You are comparing two different guides. The "Configuring InfiniBand and RDMA networks" title exists for both:
* RHEL 9: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_infiniband_and_rdma_networks/index
* RHEL 8: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_infiniband_and_rdma_networks/index
In case it's relevant for this ticket: Section 6.4 covers configuring IPoIB using System Roles/Ansible (which we recently documented in https://issues.redhat.com/browse/RHELPLAN-130980).

Theoretically, the RHEL 9 guide should be correct because all guides were reviewed and tested by the SSTs before we published them for RHEL 9.0.

If the device names in the docs no longer match the naming in RHEL, please ping Mayur Patil (CC'ed to this ticket) with the details. He is the maintainer of the InfiniBand docs.


The guide you've linked in [1] is for "RHEL for Real Time", not RHEL. This section should instead link to the official RHEL "Configuring InfiniBand and RDMA networks" title rather than to an HTML page written by a developer for RHEL 5/6. I created https://issues.redhat.com/browse/RHELPLAN-139773 and requested that this be fixed.


One more note: In the "Configuring and managing networking" guide we have a chapter about consistent device naming:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_networking/consistent-network-interface-device-naming_configuring-and-managing-networking
(same content as in RHEL 8). It contains no details on InfiniBand devices (only RoCE is mentioned in the System z section). We plan to review and rework the entire chapter (see https://issues.redhat.com/browse/RHELPLAN-137138).

Comment 7 Jon Trossbach 2023-02-17 19:14:37 UTC
Okay, I'm just getting the chance to revisit this now.

So, given that these are the docs for RHEL 9, we do have a problem. It could be the docs or it could be systemd, and someone on the systemd team should help decide which. If the wrong default port names are coming downstream by accident, I wouldn't be surprised, as this tends to happen a lot with my more specialized XDP hardware setups. It's not a big deal if so; it just seems the most likely explanation to me. In that case, we should not change the docs but change the default port name back to ib0, as we documented it should be.

On the other hand, if folks want InfiniBand port names to default to ibs3f0 instead of ib0, then we should change the docs for RHEL 9.

That is to say, [0] does not document exactly what is happening: ib0 is no longer created as the default InfiniBand port, as the docs say it should be. Instead, the port is created with the name ibs3f0 when the configuration below is run.

This is a relatively minor problem. That is likely why no one has raised this issue with us yet: users are expected to rename the port themselves during setup. But it is still sloppy and needs fixing.

# mstconfig -d 4b:00.1 set LINK_TYPE_P1=1 LINK_TYPE_P2=1

The card I am running this specific test on is a Mellanox Technologies MT28800 Family [ConnectX-5 Ex], more commonly known as mlx5-cx5. To my knowledge, all InfiniBand setups use this driver.

[0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_infiniband_and_rdma_networks/configuring-the-core-rdma-subsystem_configuring-infiniband-and-rdma-networks
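Users are expected to rename the port themselves during setup. Here is a minimal sketch of a systemd link file for that, in the spirit of the "Renaming IPoIB devices using systemd link file" procedure referenced in the Doc Text. This sketch matches the port on its udev `ID_PATH` property through the `Path=` key of systemd.link(5), which may differ from how that procedure matches devices. The following are illustrative assumptions, not values taken from the documentation:

- the function name
- the `70-` file prefix
- the `mlx5_ib0` name
- the example path `pci-0000:17:00.0`, derived from the altname ibp23s0f0 in the description

```shell
#!/bin/sh
# Hypothetical helper: write a systemd .link file that gives the port with the
# given udev ID_PATH a fixed name. Look up ID_PATH with, for example:
#   udevadm info --query=property --path=/sys/class/net/ibs2f0 | grep '^ID_PATH='
write_ib_link_file() {
    link_id_path=$1                      # e.g. pci-0000:17:00.0 (assumed)
    link_name=$2                         # e.g. mlx5_ib0, at most 15 characters
    link_dir=${3:-/etc/systemd/network}  # overridable, e.g. for testing
    link_file="$link_dir/70-$link_name.link"

    # Path= matches the udev ID_PATH property. Name= takes effect because
    # this file does not set NamePolicy=.
    cat > "$link_file" <<EOF
[Match]
Path=$link_id_path

[Link]
Name=$link_name
EOF
    printf '%s\n' "$link_file"
}
```

The new name is applied the next time udev processes the device, for example after a reboot. The systemd.link(5) man page warns that choosing a name the kernel might also use (such as ib0) makes udev race with the kernel's own naming. That is why this sketch uses a different name.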

Comment 8 Jon Trossbach 2023-02-20 11:31:37 UTC
Also worth noting: RHEL 9.1 in comment #0 had the InfiniBand port name:
ibs2f0

while RHEL 9.2 in comment #7 had the InfiniBand port name:
ibs3f0

So even if we were to change the documentation, which currently says the port name should be:
ib0

the shifting port name between versions means we don't know which port name to document. This further points to a systemd problem, not a docs problem. Still, if RHEL 9 is going to stick with a new InfiniBand port-naming scheme, it needs to be documented.

Comment 9 Michal Sekletar 2023-03-18 09:33:29 UTC
(In reply to Jon Trossbach from comment #8)
> Also worth noting is RHEL 9.1 from comment #0 had the infiniband portname of:
> ibs2f0
> 
> While in comment #7 9.2 had the infiniband portname of:
> ibs3f0
> 
> So even if we were to change the documentation from saying the portname
> should be:
> ib0
> 
> The shifting portname between versions means we don't know which portname to
> document. Further pointing to a systemd problem not a docs problem. Though
> if RHEL 9 is going to stick with a new infiniband portname scheme, it needs
> to be documented.

Predictable network naming for InfiniBand was introduced upstream a couple of years ago (https://github.com/systemd/systemd/commit/938d30aa98df887797c9e05074a562ddacdcdf5e), and RHEL-9.0 was released with that support already included. I think we should change the documentation accordingly to make sure customers are aware of this change between major releases.

Of course, udev-provided interface names shouldn't change between minor releases. The first thing to do in such a case is to downgrade systemd and observe how it behaves with the new kernel and drivers. If the naming algorithm still produces the "wrong" name, then the kernel has changed the values it exports about the device.

Comment 10 Jon Trossbach 2023-03-19 07:02:37 UTC
Not only are InfiniBand port names documented incorrectly, they are also inconsistent among servers with the same RHEL 9 version installed. I've only seen this problem with RHEL 9, but I'm told something similar also happened a lot around the fresh major release of RHEL 8.0.

As you can see in this bug, it sends QEs like me and docs writers on months-long wild-goose chases as we try to figure out whether the problem lies in systemd, the kernel, the driver, or the documentation. All the while, the thing that actually needs fixing (the default port names or the documentation) does not get fixed.

I think we need to be extra sure this is not inconsistent port naming, because this is the kind of thing that can knock a production server offline after an upgrade: the network configuration may no longer apply.

Something to know is that Michal Sekletar (systemd developer) said the
following to me and others in a network SST email chain last summer.

> Difference between RHEL-8 and RHEL-9 is actually substantial. Previously net_id
> didn’t name Infiniband interfaces at all. In RHEL-9 it actually names IB interfaces
> using predictable names. This has been introduced upstream in 2018 and we inherited
> it in RHEL-9.

So let me explain the behavior I am seeing and let me know if this is
the behavior you mean RHEL 9 inherited.

On wsfd-advnetlab153.anl.lab.eng.bos.redhat.com (Dell), the port names on RHEL 9 appear to have been consistent at least as far back as October. That is to say, ibs3f0 remains assigned to the custom IBM mlx5-cx5 NIC when it runs InfiniBand on RHEL 9. Still, that is inconsistent with the documentation, which says the port defaults to ib0.

However, the other machine I often test on is netqe16.lab3.eng.bos.redhat.com (a similar Dell model). That one has an altogether different InfiniBand port name on RHEL 9: ibp130s0f1.

Now, as you can see in the results below, BOTH wsfd-advnetlab153 and netqe16 have CONSISTENT port names of ib0 on RHEL 8, while the RHEL 9 default port names on wsfd-advnetlab153 and netqe16 are INCONSISTENT and do not match! What gives? Is this the change that you, Michal, said we inherited last summer?

I can only theorize that RHEL 9 port naming might be more tightly integrated with port-naming schemes on the mlx5-cx5 NICs. Namely, if you change the mlx5-cx5 port name on one RHEL 9 distribution, does that name change persist to other RHEL 9 distributions on the same exact mlx5-cx5 NIC? Asked another way, do mlx5-cx5 NICs have the capacity to remember their assigned port names across RHEL 9 distributions? Or are port names possibly tied to specific PCIe interfaces in RHEL 9?

Other than a bug, those seem to be the only things that could explain the behavior I am seeing.

And if that is the expected behavior in RHEL 9, it does not appear to be documented in the InfiniBand documentation. So, as Michal said in the previous comment, if it is not a bug we should probably update this part of the documentation to either explain or reference this change:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_infiniband_and_rdma_networks/index

Comment 11 Michal Sekletar 2023-03-22 09:42:17 UTC
(In reply to Jon Trossbach from comment #10)
> 
> Now the thing you can see in the below results is that BOTH 
> wsfd-advnetlab153 and netqe16 have CONSISTENT portnames of ib0 on 
> RHEL 8. But all the while the RHEL 9 default port names between 
> wsfd-advnetlab153 and netqe16 are INCONSISTENT and do not match!
> What gives? Is it the change you Michal said we inherited last summer?

Probably, yes. Port names should be the same if the machines are exactly the same.

> 
> I can only theorize that RHEL 9 portnaming might have a tighter
> integration with portnaming schemes on the mlx5-cx5 NICs. Namely, if
> you change the mlx5-cx5 portname on one RHEL 9 distribution does that
> name change have persistence with other RHEL 9 distributions for the
> same exact mlx5-cx5 NIC?

Names *must* be stable on a single system across reboots; that is the only guarantee we give.

> Asked another way, do mlx5-cx5 have the
> capacity to remember their assigned portnames across RHEL 9
> distributions? Or are portnames possibly tied to specific PCIe 
> interfaces in RHEL 9?

Naming is stateless, and the naming algorithm *must* derive the same name across reboots and across minor versions on a single system.

> And if that is the expected behavior in RHEL 9 it does not appear
> documented in the infiniband documentation so like Michal said in 
> the previous comment we should probably update this part of the
> docuemtation to either explain of reference this change if it not 

How naming works for both Ethernet and InfiniBand is described in the systemd.net-naming-scheme(7) man page.
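Because naming is stateless, the path-based name is a pure function of values the kernel exports about the device. The hypothetical `pci_to_ib_path_name` function below is a rough illustration. It converts a hexadecimal PCI address, as printed by lspci or passed to `mstconfig -d`, into the decimal `ibp<bus>s<slot>f<function>` form. The sketch assumes:

- PCI domain 0
- a multi-function device, such as a dual-port ConnectX card, where systemd keeps the `f<function>` suffix even for function 0

systemd's net_id builtin handles more cases, such as non-zero domains, port-name suffixes, and naming-scheme versions; this sketch ignores them.

```shell
#!/bin/sh
# Hypothetical helper: convert a PCI address ("DDDD:BB:SS.F" or "BB:SS.F",
# hex) into the path-based InfiniBand name ibp<bus>s<slot>f<function> (decimal).
# Assumes PCI domain 0 and a multi-function device, so f<function> is always
# emitted; this simplifies what systemd's net_id builtin does.
pci_to_ib_path_name() {
    pci_addr=$1
    case $pci_addr in
        *:*:*) pci_domain=${pci_addr%%:*}; pci_addr=${pci_addr#*:} ;;
        *)     pci_domain=0 ;;
    esac
    pci_bus=${pci_addr%%:*}
    pci_rest=${pci_addr#*:}
    pci_dev=${pci_rest%%.*}
    pci_fn=${pci_rest#*.}

    if [ "$((0x$pci_domain))" -ne 0 ]; then
        printf 'non-zero PCI domain %s is not handled by this sketch\n' "$pci_domain" >&2
        return 1
    fi
    # Shell arithmetic turns the hex fields into the decimal numbers used in names.
    printf 'ibp%ds%df%d\n' "$((0x$pci_bus))" "$((0x$pci_dev))" "$((0x$pci_fn))"
}
```

Some examples under the same assumptions:

- `pci_to_ib_path_name 0000:17:00.0` prints `ibp23s0f0`, which is the address encoded by the altname in the description.
- The `4b:00.1` address passed to mstconfig in comment 7 would map to `ibp75s0f1`.
- Conversely, netqe16's ibp130s0f1 would correspond to PCI bus 0x82.

The slot-based names (ibs2f0, ibs3f0) come from the hotplug slot number that the firmware reports, so they cannot be derived from the PCI address alone.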

Comment 12 Michal Sekletar 2023-03-22 09:56:29 UTC
Moving to documentation because we need to document that udev's device naming is now also applied to InfiniBand interfaces. Details on precisely how udev's device naming works are in the "systemd.net-naming-scheme" man page.

Comment 13 Bob Fubel 2023-03-27 14:59:53 UTC
This issue is also present in 9.2 Beta.

