Bug 2050987 - [RHEL8.6] - fabtests on BXNT ROCE device produce lots of core files
Summary: [RHEL8.6] - fabtests on BXNT ROCE device produce lots of core files
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: rdma-core
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kamal Heib
QA Contact: Infiniband QE
URL:
Whiteboard:
Depends On:
Blocks: 2093719
Reported: 2022-02-05 14:40 UTC by Brian Chae
Modified: 2023-08-05 07:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2093719
Environment:
Last Closed: 2023-08-05 07:28:19 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-111231 0 None None None 2022-02-05 14:41:00 UTC

Description Brian Chae 2022-02-05 14:40:22 UTC
Description of problem:

This may be related to Bug #1971174 and Bug #2014054, but in this case the core files produced by fabtests are observed on all BNXT RoCE devices, including BCM57414.


Version-Release number of selected component (if applicable):


Clients: rdma-qe-25
Servers: rdma-qe-24

DISTRO=RHEL-8.6.0-20220131.1

+ [22-02-01 09:29:32] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 Beta (Ootpa)

+ [22-02-01 09:29:32] uname -a
Linux rdma-qe-24.rdma.lab.eng.rdu2.redhat.com 4.18.0-361.el8.x86_64 #1 SMP Mon Jan 24 10:45:51 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

+ [22-02-01 09:29:32] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-361.el8.x86_64 root=UUID=cb9a3f0d-b595-409b-a8f3-47752a9f1e96 ro crashkernel=auto resume=UUID=f61be3cc-7371-41ce-832f-274b8cbae8b3 console=ttyS0,115200n81

+ [22-02-01 09:29:32] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el8.x86_64
linux-firmware-20211119-105.gitf5d51956.el8.noarch
+ [22-02-01 09:29:32] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/bnxt_re1/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.0.51.0

==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.0.51.0

+ [22-02-01 09:29:32] lspci
+ [22-02-01 09:29:32] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)

How reproducible:

100%

Steps to Reproduce:
1. With the above build installed, run the fabtests command below on both hosts.
2. On the server, run fabtests first:

/usr/bin/runfabtests.sh -T 60 -vvv -t quick psm3 172.31.45.125 172.31.45.126 | tee -a fabtests_psm3_quick.log


3. Then run fabtests on the client:

/usr/bin/runfabtests.sh -T 60 -vvv -t quick psm3 172.31.45.125 172.31.45.126 | tee -a fabtests_psm3_quick.log


Actual results:

Running "journalctl -a" afterwards shows the following messages on both hosts:


[ 1021.052855] qperf[116003]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d18 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1021.065024] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1023.378131] qperf[116011]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d28 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1023.390300] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1025.695144] qperf[116024]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1025.707335] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1028.016585] qperf[116033]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffd48 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1028.028774] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1030.340159] qperf[116041]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffd48 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1030.352355] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1032.662261] qperf[116049]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d28 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1032.674449] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1034.981671] qperf[116057]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1034.993840] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1037.305999] qperf[116065]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1037.318172] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
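
For triage, the segfault lines above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the helper name decode_segfault and the regular expression are illustrative assumptions, and the decoding relies on the usual x86 page-fault error bits (bit 0 = protection fault, bit 1 = write, bit 2 = user mode). It extracts the faulting process, the fault address, and the instruction pointer's offset from the start of the mapping the kernel printed.

```python
import re

# Matches the x86 kernel message for a user-space segfault, e.g.
# "qperf[116003]: segfault at 0 ip ... sp ... error 4 in lib.so[base+size]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: return the decoded fields, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "protection_fault": bool(err & 1),  # bit 0 clear: page not present
        "write": bool(err & 2),             # bit 1 clear: read access
        "user_mode": bool(err & 4),         # bit 2 set: fault in user space
    }


line = ("[ 1021.052855] qperf[116003]: segfault at 0 ip 00007f0926fcc0b4 "
        "sp 00007ffe757f3d18 error 4 in "
        "libibverbs.so.1.14.37.2[7f0926fb4000+1e000]")
info = decode_segfault(line)
print(info["comm"], hex(info["fault_addr"]), hex(info["offset"]))
```

Applied to the first log line, this reports a user-mode read of address 0 by qperf at offset 0x180b4 into the libibverbs mapping, which is what a NULL pointer dereference looks like. The marked bytes in the Code: lines (<48> 8b 07) decode to "mov (%rdi),%rax", which fits a NULL first argument, although that reading is an interpretation rather than something the report states. The offset is relative to the mapping the kernel printed, which is not necessarily the library's load base, so treat it as a starting point when resolving it against debuginfo (for example with addr2line).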


Expected results:

fabtests runs without producing any core files.

Additional info:

Comment 3 RHEL Program Management 2023-08-05 07:28:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

