Bug 1937699 - rdma-ndd doesn't reliably initialize the node description of multiple Infiniband devices
Summary: rdma-ndd doesn't reliably initialize the node description of multiple Infinib...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rdma-core
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Honggang LI
QA Contact: zguo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-11 11:18 UTC by Georg Sauthoff
Modified: 2024-06-14 00:46 UTC
CC List: 10 users

Fixed In Version: rdma-core-22.4-6.el7_9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-27 11:36:27 UTC
Target Upstream Version:
Embargoed:
zguo: needinfo-
pm-rhel: mirror+




Links
Red Hat Product Errata RHBA-2021:1396 (last updated 2021-04-27 11:36:30 UTC)

Description Georg Sauthoff 2021-03-11 11:18:15 UTC
Description of problem:
On hosts with multiple Infiniband devices, only the Infiniband node description of the first device is reliably initialized. The node descriptions of the other devices often are not initialized.

Version-Release number of selected component (if applicable):
rdma-core-17.2-3.el7.x86_64

How reproducible:
On some hosts, 100%.

Steps to Reproduce:
1. install RHEL 7.6 on a host with multiple Infiniband devices (e.g. a host with a Mellanox ConnectX5 HCA, which presents itself as 2 devices)
2. hostname -s
3. ls /sys/class/infiniband 
4. cat /sys/class/infiniband/*/node_desc

Actual results:
$ hostname -s
myshorthostname
$ ls /sys/class/infiniband
mlx5_0  mlx5_1
$ cat /sys/class/infiniband/*/node_desc
myshorthostname mlx5_0
MT4119 ConnectX5

Expected results:
$ hostname -s
myshorthostname
$ ls /sys/class/infiniband
mlx5_0  mlx5_1
$ cat /sys/class/infiniband/*/node_desc
myshorthostname mlx5_0
myshorthostname mlx5_1

Additional info:
We see this on several hosts with ConnectX5 cards in production.

This seems to be caused by a race condition between the start of rdma-ndd (via the udev rule /usr/lib/udev/rules.d/60-rdma-ndd.rules, triggered when the first device shows up) and rdma-ndd becoming initialized and ready to listen for new udev events. Thus there is a time window in which the 2nd device shows up while rdma-ndd.service is already starting but has not yet established its udev monitoring; or its udev monitoring is simply broken.

One workaround is to restart rdma-ndd.service after rdma-hw.target is reached. Another is to write the simple node descriptions to /sys/.../node_desc by other means after rdma-hw.target is reached; a sketch of the latter follows.
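
A minimal sketch of that second workaround (run as root once rdma-hw.target is up; it assumes the "%h %d" node-description format that rdma-ndd logs, and could be wrapped in a oneshot unit ordered after rdma-hw.target):

    # set "shorthostname devname" as the node description of every RDMA device
    for d in /sys/class/infiniband/*; do
        echo "$(hostname -s) $(basename "$d")" > "$d/node_desc"
    done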

A proper fix I can think of is to let rdma-ndd re-iterate over all devices **after** it has established its udev monitoring.

Comment 2 Honggang LI 2021-03-11 13:58:38 UTC
Please provide the rdma-ndd log file from a machine that always reproduces this issue.

You can collect the rdma-ndd log like this:

[root@rdma02 ~]# diff -Nurp /usr/lib/systemd/system/rdma-ndd.service.bak /usr/lib/systemd/system/rdma-ndd.service
--- /usr/lib/systemd/system/rdma-ndd.service.bak	2021-03-11 08:13:38.310702848 -0500
+++ /usr/lib/systemd/system/rdma-ndd.service	2021-03-11 08:38:07.018718538 -0500
@@ -19,6 +19,6 @@ Before=rdma-hw.target
 [Service]
 Type=notify
 Restart=always
-ExecStart=/usr/sbin/rdma-ndd --systemd
+ExecStart=/usr/sbin/rdma-ndd --debug --systemd
 
 # rdma-ndd is automatically wanted by udev when an RDMA device with a node description is present



[root@rdma02 ~]# systemctl daemon-reload
[root@rdma02 ~]# systemctl enable rdma-ndd.service
[root@rdma02 ~]# reboot

After the system is up again, collect the log with:

[root@rdma02 ~]# journalctl -u rdma-ndd
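
An equivalent way to enable the debug log, sketched here as an alternative to editing the shipped unit file (the drop-in file name is illustrative):

[root@rdma02 ~]# mkdir -p /etc/systemd/system/rdma-ndd.service.d
[root@rdma02 ~]# cat > /etc/systemd/system/rdma-ndd.service.d/debug.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/sbin/rdma-ndd --debug --systemd
EOF
[root@rdma02 ~]# systemctl daemon-reload
# then reboot (or restart rdma-ndd.service) as above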

Comment 3 Georg Sauthoff 2021-03-11 14:12:33 UTC
Ok, I enabled rdma-ndd debug and rebooted the machine. This is the output now:

[root@myshorthostname ~]# cat /sys/class/infiniband/mlx5_*/node_desc
myshorthostname mlx5_0
MT4119 ConnectX5   Mellanox Technologies
[root@myshorthostname ~]# journalctl -u rdma-ndd
-- Logs begin at Thu 2021-03-11 15:06:57 CET, end at Thu 2021-03-11 15:08:30 CET. --
Mar 11 15:07:09 myshorthostname.example.de systemd[1]: Starting RDMA Node Description Daemon...
Mar 11 15:07:09 myshorthostname.example.de rdma-ndd[17411]: Node Descriptor format (%h %d)
Mar 11 15:07:09 myshorthostname.example.de rdma-ndd[17411]: mlx5_0: change (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_0)
Mar 11 15:07:10 myshorthostname.example.de systemd[1]: Started RDMA Node Description Daemon.

Comment 4 Honggang LI 2021-03-12 08:32:06 UTC
(In reply to Georg Sauthoff from comment #3)
> Ok, I enabled rdma-ndd debug and rebooted the machine. This is the output
> now:
> 
> [root@myshorthostname ~]# cat /sys/class/infiniband/mlx5_*/node_desc
> myshorthostname mlx5_0
> MT4119 ConnectX5   Mellanox Technologies
> [root@myshorthostname ~]# journalctl -u rdma-ndd
> -- Logs begin at Thu 2021-03-11 15:06:57 CET, end at Thu 2021-03-11 15:08:30
> CET. --
> Mar 11 15:07:09 myshorthostname.example.de systemd[1]: Starting RDMA Node
> Description Daemon...
> Mar 11 15:07:09 myshorthostname.example.de rdma-ndd[17411]: Node Descriptor
> format (%h %d)
> Mar 11 15:07:09 myshorthostname.example.de rdma-ndd[17411]: mlx5_0: change
> (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_0)
> Mar 11 15:07:10 myshorthostname.example.de systemd[1]: Started RDMA Node
> Description Daemon.

Please provide an sos report of host myshorthostname.example.de. Thanks.

Comment 5 Honggang LI 2021-03-12 08:35:43 UTC
http://people.redhat.com/honli/.1937699/

Please test this scratch rpm. If the issue persists, we will need an sos report too.

You need to force-erase the old rdma-core package and install the scratch build like this:

    $  rpm -e --nodeps rdma-core
    $  rpm -ivh rdma-core-17.2-4.bz1937699.el7_6.x86_64.rpm
    $  cp /etc/rdma/rdma.conf.rpmsave /etc/rdma/rdma.conf
    $  cp /etc/udev/rules.d/70-persistent-ipoib.rules.rpmsave /etc/udev/rules.d/70-persistent-ipoib.rules
 
    update /usr/lib/systemd/system/rdma-ndd.service to enable the debug log.
    
    $ systemctl daemon-reload 
    $ reboot

Comment 6 Georg Sauthoff 2021-03-16 15:00:05 UTC
I've tested your scratch RPM and with it the node description is now reliably set during boot.

That means after the reboot:

cat /sys/class/infiniband/mlx5_*/node_desc
myshorthostname mlx5_0
myshorthostname mlx5_1

journalctl -u rdma-ndd
-- Logs begin at Tue 2021-03-16 15:47:32 CET, end at Tue 2021-03-16 15:56:56 CET. --
Mar 16 15:47:43 myshorthostname systemd[1]: Starting RDMA Node Description Daemon...
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: Node Descriptor format (%h %d)
Mar 16 15:47:43 myshorthostname systemd[1]: Started RDMA Node Description Daemon.
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: mlx5_0: change (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_0)
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: mlx5_1: change (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_1)

systemctl status rdma-ndd
● rdma-ndd.service - RDMA Node Description Daemon
   Loaded: loaded (/usr/lib/systemd/system/rdma-ndd.service; static; vendor preset: disabled)
   Active: active (running) since Tue 2021-03-16 15:47:43 CET; 8min ago
     Docs: man:rdma-ndd
 Main PID: 16997 (rdma-ndd)
   CGroup: /system.slice/rdma-ndd.service
           └─16997 /usr/sbin/rdma-ndd --systemd --debug

Mar 16 15:47:43 myshorthostname systemd[1]: Starting RDMA Node Description Daemon...
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: Node Descriptor format (%h %d)
Mar 16 15:47:43 myshorthostname systemd[1]: Started RDMA Node Description Daemon.
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: mlx5_0: change (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_0)
Mar 16 15:47:43 myshorthostname rdma-ndd[16997]: mlx5_1: change (MT4119 ConnectX5   Mellanox Technologies) -> (myshorthostname mlx5_1)


So this fixes the issue.

Comment 7 Honggang LI 2021-03-16 15:11:39 UTC
(In reply to Georg Sauthoff from comment #6)
> I've tested your scratch RPM and with it the node description is now
> reliably set during boot.

Thanks for testing. I will submit a patch upstream and backport it to RHEL once it is merged upstream.

Do you mind if I add 'Reported-by' and 'Tested-by' tags to the patch?

Reported-by: Georg Sauthoff <georg.sauthoff>
Tested-by: Georg Sauthoff <georg.sauthoff>

Comment 8 Georg Sauthoff 2021-03-16 15:13:12 UTC
No, I don't mind.

Comment 9 Honggang LI 2021-03-17 01:53:46 UTC
I opened this PR to address this issue in the upstream repo.

https://github.com/linux-rdma/rdma-core/pull/962

Comment 10 Honggang LI 2021-03-18 11:01:06 UTC
(In reply to Honggang LI from comment #9)
> I opened this PR to address this issue in upstream repo.
> 
> https://github.com/linux-rdma/rdma-core/pull/962

The PR has been merged into the upstream repo. Setting the devel+ flag.

Comment 11 Honggang LI 2021-03-18 11:06:15 UTC
Hi, Georg

This bug was opened against RHEL-7.6. Are you looking for a RHEL-7.6.z fix for it?

If yes, please provide a business justification for the RHEL-7.6.z request.

BTW, the bug MUST be fixed in RHEL-7.9.z first, before we can backport the fix to RHEL-7.7.z and RHEL-7.6.z.

thanks

Comment 12 Georg Sauthoff 2021-03-18 16:11:19 UTC
We don't need the fix for RHEL 7.6, since we will upgrade to 7.9 in the not-too-distant future and can work around this issue until then.

So please let me know when the fix is expected to arrive in RHEL 7.9.

Comment 14 John W. Linville 2021-04-06 14:27:54 UTC
Georg Sauthoff, have you pursued this request through any of the more formal support channels for RHEL? Can you point us at those support requests, for reference?

Comment 15 Georg Sauthoff 2021-04-06 16:03:01 UTC
No, I haven't, so far.

I wasn't aware that I needed to contact another Red Hat support channel as well.

Is there something I need to trigger?

Comment 16 Chris Williams 2021-04-07 12:50:34 UTC
Hi Georg,

Bugzilla is not a support tool. It's a development tool and has no SLAs.
Instead you need to open a Support case by logging into the Red Hat Customer Portal at https://access.redhat.com/
If you don't have a login, you'll need to make arrangements with the organization that oversees your company's access to the RH Customer Portal.

Comment 17 Georg Sauthoff 2021-04-09 08:29:36 UTC
I see.

I've opened a case that references this bug: Case #02913732

Comment 30 errata-xmlrpc 2021-04-27 11:36:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (rdma-core bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1396

