Bug 2050987

Summary: [RHEL8.6] - fabtests on BNXT RoCE device produce lots of core files
Product: Red Hat Enterprise Linux 8
Reporter: Brian Chae <bchae>
Component: rdma-core
Assignee: Kamal Heib <kheib>
Status: CLOSED WONTFIX
QA Contact: Infiniband QE <infiniband-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.6
CC: hwkernel-mgr, kheib, rdma-dev-team, selvin.xavier
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2093719
Environment:
Last Closed: 2023-08-05 07:28:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2093719

Description Brian Chae 2022-02-05 14:40:22 UTC
Description of problem:

This may be related to Bug 1971174 and Bug 2014054, but in this case the core files from fabtests are observed on all BNXT RoCE devices, including BCM57414.


Version-Release number of selected component (if applicable):


Clients: rdma-qe-25
Servers: rdma-qe-24

DISTRO=RHEL-8.6.0-20220131.1

+ [22-02-01 09:29:32] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 Beta (Ootpa)

+ [22-02-01 09:29:32] uname -a
Linux rdma-qe-24.rdma.lab.eng.rdu2.redhat.com 4.18.0-361.el8.x86_64 #1 SMP Mon Jan 24 10:45:51 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

+ [22-02-01 09:29:32] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-361.el8.x86_64 root=UUID=cb9a3f0d-b595-409b-a8f3-47752a9f1e96 ro crashkernel=auto resume=UUID=f61be3cc-7371-41ce-832f-274b8cbae8b3 console=ttyS0,115200n81

+ [22-02-01 09:29:32] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el8.x86_64
linux-firmware-20211119-105.gitf5d51956.el8.noarch
+ [22-02-01 09:29:32] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/bnxt_re1/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.0.51.0

==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.0.51.0

+ [22-02-01 09:29:32] lspci
+ [22-02-01 09:29:32] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)

How reproducible:

100%

Steps to Reproduce:
1. With the above build installed on both hosts, run the fabtests commands below.
2. On the server, run fabtests first:

/usr/bin/runfabtests.sh -T 60 -vvv -t quick psm3 172.31.45.125 172.31.45.126 | tee -a fabtests_psm3_quick.log


3. Then, on the client, run fabtests:

/usr/bin/runfabtests.sh -T 60 -vvv -t quick psm3 172.31.45.125 172.31.45.126 | tee -a fabtests_psm3_quick.log


Actual results:

After the run, "journalctl -a" shows the following messages on both hosts:


[ 1021.052855] qperf[116003]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d18 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1021.065024] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1023.378131] qperf[116011]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d28 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1023.390300] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1025.695144] qperf[116024]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1025.707335] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1028.016585] qperf[116033]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffd48 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1028.028774] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1030.340159] qperf[116041]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffd48 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1030.352355] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1032.662261] qperf[116049]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757f3d28 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1032.674449] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1034.981671] qperf[116057]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1034.993840] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
[ 1037.305999] qperf[116065]: segfault at 0 ip 00007f0926fcc0b4 sp 00007ffe757ffcb8 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]
[ 1037.318172] Code: 01 00 00 85 c0 75 0c 83 e5 01 74 07 41 8b 14 24 89 53 38 5b 5d 41 5c c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa <48> 8b 07 48 8b 40 90 ff a0 28 01 00 00 66 66 2e 0f 1f 84 00 00 00
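
To make the repeated segfault lines above easier to compare, the following is a minimal Python sketch that parses one such kernel message and computes the offset of the faulting instruction inside the mapped library (instruction pointer minus the mapping base shown in brackets). It also splits the x86 page-fault error code into its three low bits. The function name parse_segfault and the dictionary keys are hypothetical names chosen for illustration; the regular expression only assumes the message layout seen in this report.

```python
import re

# Layout of the kernel's x86 user-space segfault message, as seen in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # Low bits of the x86 page-fault error code.
        "page_present": bool(err & 1),  # bit 0: fault on a present page
        "write": bool(err & 2),         # bit 1: write access (else read)
        "user_mode": bool(err & 4),     # bit 2: fault raised in user mode
    }


line = ("[ 1021.052855] qperf[116003]: segfault at 0 ip 00007f0926fcc0b4 "
        "sp 00007ffe757f3d18 error 4 in libibverbs.so.1.14.37.2[7f0926fb4000+1e000]")
info = parse_segfault(line)
print(info["comm"], info["object"], hex(info["offset"]))
# prints: qperf libibverbs.so.1.14.37.2 0x180b4
```

All eight messages share the same instruction pointer and mapping base, so they resolve to the same offset, 0x180b4, in libibverbs.so.1.14.37.2. Error code 4 corresponds to a user-mode read of a non-present page, and the fault address is 0. The bytes marked in the Code: lines (48 8b 07) appear to decode to a load through %rdi, which would be consistent with a NULL pointer passed as the first argument. With matching debuginfo installed, the offset could be given to a tool such as addr2line or gdb to identify the function; that step was not performed in this report.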


Expected results:

Fabtests run without producing any core files.

Additional info:

Comment 3 RHEL Program Management 2023-08-05 07:28:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.