Bug 1793585

Summary: [RHEL-7][Regression] ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000]
Product: Red Hat Enterprise Linux 7 Reporter: zguo <zguo>
Component: rdma-core Assignee: Honggang LI <honli>
Status: CLOSED ERRATA QA Contact: zguo <zguo>
Severity: high Docs Contact:
Priority: high    
Version: 7.8 CC: honli, linville, mschmidt, rdma-dev-team, wchadwic
Target Milestone: rc Keywords: Regression, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rdma-core-22.4-2.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1793736 1812444 Environment:
Last Closed: 2020-09-29 19:25:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1793736, 1812444    

Description zguo 2020-01-21 15:46:38 UTC
Description of problem:

ibacm works with ibacm-22.3-1.el7.x86_64 but fails with ibacm-22.4-1.el7.x86_64 on the same test bed (rdma-perf-00/01).
This might be caused by a recent change in rdma-core. I hit this segfault only on non-IB hardware. rdma-perf-01 has both an mlx4 IB device and an mlx4 RoCE device.

=============
[root@rdma-perf-01 ~]$ rpm -q ibacm rdma-core
ibacm-22.3-1.el7.x86_64
rdma-core-22.3-1.el7.x86_64
[root@rdma-perf-01 ~]$ systemctl restart ibacm
[root@rdma-perf-01 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-01-21 10:34:51 EST; 8s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
 Main PID: 850 (ibacm)
    Tasks: 4
   CGroup: /system.slice/ibacm.service
           └─850 /usr/sbin/ibacm --systemd

Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.

============

[root@rdma-perf-01 ~]$ rpm -q ibacm rdma-core
ibacm-22.4-1.el7.x86_64
rdma-core-22.4-1.el7.x86_64
[root@rdma-perf-01 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg

Jan 21 10:29:55 rdma-perf-01.lab.bos.redhat.com systemd[1]: Unit ibacm.service entered failed state.
Jan 21 10:29:55 rdma-perf-01.lab.bos.redhat.com systemd[1]: ibacm.service failed.
Jan 21 10:33:48 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:33:48 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopping InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopped InfiniBand Address Cache Manager Daemon.
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
Jan 21 10:35:47 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopping InfiniBand Address Cache Manager Daemon...
Jan 21 10:35:47 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopped InfiniBand Address Cache Manager Daemon.
[root@rdma-perf-01 ~]$ systemctl start ibacm
Job for ibacm.service failed because a fatal signal was delivered to the control process. See "systemctl status ibacm.service" and "journalctl -xe" for details.
[root@rdma-perf-01 ~]$ dmesg
[20013.737212] ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000]
==========
Actual results:

ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000] on a machine with non-IB hardware.

Expected results:

No segfault on non-IB hardware.

Additional info:
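
For triage, the dmesg line above can be decoded mechanically. The following is a minimal sketch, not part of ibacm or rdma-core: the function name decode_segfault and the returned keys are hypothetical, and the bit meanings are the standard x86 page-fault error code bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches the x86 kernel "segfault at ..." message as it appears in dmesg.
# All numeric fields in this message are printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Decode one kernel segfault line; return None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # x86 page-fault error code bits
        "page_present": bool(err & 0x1),
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
        "object": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = ip - int(m["base"], 16)
    return info


if __name__ == "__main__":
    line = ("[20013.737212] ibacm[1221]: segfault at 0 ip 0000000000404f3e "
            "sp 00007ffe54e819a0 error 4 in ibacm[400000+e000]")
    for key, value in decode_segfault(line).items():
        print(key, hex(value) if key in ("ip", "offset") else value)
```

For the line in this report, the sketch reports fault address 0, a user-mode read of a non-present page, and offset 0x4f3e into the ibacm mapping, which is consistent with a NULL pointer dereference in ibacm itself rather than in a shared library.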

Comment 2 zguo 2020-01-21 16:07:36 UTC
It works well on cxgb4 with ibacm-22.4-1.el7.x86_64. So far, the scenario in the bug description is the only case where I have hit this issue.

[root@rdma-dev-13 ~]$ rpm -q ibacm
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-13 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2020-01-21 11:02:16 EST; 1min 32s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
  Process: 10487 ExecStart=/usr/sbin/ibacm --systemd (code=exited, status=255)
 Main PID: 10487 (code=exited, status=255)

Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service: main process exited, code=exited, status=255/n/a
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Failed to start InfiniBand Address Cache Manager Daemon.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Unit ibacm.service entered failed state.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service failed.

Comment 3 zguo 2020-01-22 02:34:45 UTC
This issue does not occur on rdma-perf-03 with mlx5 IB and mlx5 RoCE.

[root@rdma-perf-03 ~]$ rpm -q  ibacm rdma-core
ibacm-22.4-1.el7.x86_64
rdma-core-22.4-1.el7.x86_64

[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
[root@rdma-perf-03 ~]$ systemctl start ibacm
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-01-21 21:26:55 EST; 8s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
 Main PID: 11676 (ibacm)
   CGroup: /system.slice/ibacm.service
           └─11676 /usr/sbin/ibacm --systemd

Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.

ip a | egrep  'roce|ib' 
5: mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mqprio state UP group default qlen 1000
    inet 172.31.40.183/24 brd 172.31.40.255 scope global noprefixroute dynamic mlx5_roce
7: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:10:8a:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.0.183/24 brd 172.31.0.255 scope global noprefixroute dynamic mlx5_ib0
8: mlx5_roce.45@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 172.31.45.183/24 brd 172.31.45.255 scope global noprefixroute dynamic mlx5_roce.45
9: mlx5_roce.43@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 172.31.43.183/24 brd 172.31.43.255 scope global noprefixroute dynamic mlx5_roce.43
10: mlx5_ib0.8008@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:11:37:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:08:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.8.183/24 brd 172.31.8.255 scope global noprefixroute dynamic mlx5_ib0.8008
11: mlx5_ib0.8002@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:11:b9:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:02:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.2.183/24 brd 172.31.2.255 scope global noprefixroute dynamic mlx5_ib0.8002
12: mlx5_ib0.8006@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:12:3b:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:06:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.6.183/24 brd 172.31.6.255 scope global noprefixroute dynamic mlx5_ib0.8006
13: mlx5_ib0.8004@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:12:bd:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.4.183/24 brd 172.31.4.255 scope global noprefixroute dynamic mlx5_ib0.8004
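
Since the failure appears to depend on whether a host has non-IB ports, it can help to record the link layer of every RDMA port rather than infer it from interface names. The sketch below reads the link_layer attribute that the Linux RDMA stack exposes under /sys/class/infiniband/<device>/ports/<port>/; the function names port_link_layers and non_ib_ports are hypothetical helpers, not part of rdma-core.

```python
from pathlib import Path


def port_link_layers(sysfs_root="/sys/class/infiniband"):
    """Return {(device, port): link_layer} for each RDMA port under sysfs_root."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No RDMA devices registered (or the RDMA modules are not loaded).
        return result
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            attr = port / "link_layer"
            if attr.is_file():
                # The attribute holds a single word such as "InfiniBand" or "Ethernet".
                result[(dev.name, port.name)] = attr.read_text().strip()
    return result


def non_ib_ports(layers):
    """Return the (device, port) pairs whose link layer is not InfiniBand."""
    return sorted(key for key, layer in layers.items() if layer != "InfiniBand")


if __name__ == "__main__":
    layers = port_link_layers()
    for (device, port), layer in sorted(layers.items()):
        print(f"{device} port {port}: {layer}")
    print("non-IB ports:", non_ib_ports(layers))
```

Running this on each test host alongside the ibacm result would show which device and port combinations were present when the daemon started or crashed.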

Comment 4 zguo 2020-01-22 02:46:35 UTC
This issue does not occur in the test environment below:

[root@rdma-dev-15 ~]$ rpm -q rdma-core ibacm
rdma-core-22.4-1.el7.x86_64
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-15 ~]$ ip a | egrep 'opa|roce'
3: bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
14: hfi1_opa0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
17: hfi1_opa0.8024@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
18: hfi1_opa0.8022@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
23: bnxt_roce.43@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
24: bnxt_roce.45@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
[root@rdma-dev-15 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-01-21 21:44:42 EST; 1min 12s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
 Main PID: 10550 (ibacm)
   CGroup: /system.slice/ibacm.service
           └─10550 /usr/sbin/ibacm --systemd

Jan 21 21:44:41 rdma-dev-15.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:44:42 rdma-dev-15.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.

Comment 17 Whitney Chadwick 2020-03-05 13:59:23 UTC
Need a clone of this BZ, approved for 7.8 0day.

Comment 18 John W. Linville 2020-03-05 18:26:49 UTC
OK -- the intent is to approve this for 7.8.z and get the Z-stream update approved for 0day. My interpretation of the required process steps is to remove the 0day indication from the developer whiteboard on this bug and set the 7.8.z flag. When that Z-stream bug is approved, we will then add 0day to the developer whiteboard on the Z-stream bug. If the above is incorrect, please describe the intended process to follow.

Comment 22 errata-xmlrpc 2020-09-29 19:25:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rdma-core bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3870