Bug 1793585
| Summary: | [RHEL-7][Regression] ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000] | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | zguo <zguo> | |
| Component: | rdma-core | Assignee: | Honggang LI <honli> | |
| Status: | CLOSED ERRATA | QA Contact: | zguo <zguo> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.8 | CC: | honli, linville, mschmidt, rdma-dev-team, wchadwic | |
| Target Milestone: | rc | Keywords: | Regression, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | rdma-core-22.4-2.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 1793736 1812444 | Environment: | | |
| Last Closed: | 2020-09-29 19:25:25 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1793736, 1812444 | |||
It works well on cxgb4 with ibacm-22.4-1.el7.x86_64, so the scenario in the bug description is the only case where I have hit this issue so far.
[root@rdma-dev-13 ~]$ rpm -q ibacm
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-13 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-01-21 11:02:16 EST; 1min 32s ago
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
Process: 10487 ExecStart=/usr/sbin/ibacm --systemd (code=exited, status=255)
Main PID: 10487 (code=exited, status=255)
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service: main process exited, code=exited, status=255/n/a
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Failed to start InfiniBand Address Cache Manager Daemon.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Unit ibacm.service entered failed state.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service failed.
This issue does not occur on rdma-perf-03 with mlx5 IB and mlx5 RoCE.
[root@rdma-perf-03 ~]$ rpm -q ibacm rdma-core
ibacm-22.4-1.el7.x86_64
rdma-core-22.4-1.el7.x86_64
[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
[root@rdma-perf-03 ~]$ systemctl start ibacm
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-01-21 21:26:55 EST; 8s ago
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
Main PID: 11676 (ibacm)
CGroup: /system.slice/ibacm.service
└─11676 /usr/sbin/ibacm --systemd
Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
ip a | egrep 'roce|ib'
5: mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mqprio state UP group default qlen 1000
inet 172.31.40.183/24 brd 172.31.40.255 scope global noprefixroute dynamic mlx5_roce
7: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
link/infiniband 00:00:10:8a:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.31.0.183/24 brd 172.31.0.255 scope global noprefixroute dynamic mlx5_ib0
8: mlx5_roce.45@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
inet 172.31.45.183/24 brd 172.31.45.255 scope global noprefixroute dynamic mlx5_roce.45
9: mlx5_roce.43@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
inet 172.31.43.183/24 brd 172.31.43.255 scope global noprefixroute dynamic mlx5_roce.43
10: mlx5_ib0.8008@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband 00:00:11:37:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:08:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.31.8.183/24 brd 172.31.8.255 scope global noprefixroute dynamic mlx5_ib0.8008
11: mlx5_ib0.8002@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
link/infiniband 00:00:11:b9:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:02:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.31.2.183/24 brd 172.31.2.255 scope global noprefixroute dynamic mlx5_ib0.8002
12: mlx5_ib0.8006@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband 00:00:12:3b:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:06:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.31.6.183/24 brd 172.31.6.255 scope global noprefixroute dynamic mlx5_ib0.8006
13: mlx5_ib0.8004@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
link/infiniband 00:00:12:bd:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:ff:ff:ff:ff
inet 172.31.4.183/24 brd 172.31.4.255 scope global noprefixroute dynamic mlx5_ib0.8004
This issue does not occur in the test environment below:
[root@rdma-dev-15 ~]$ rpm -q rdma-core ibacm
rdma-core-22.4-1.el7.x86_64
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-15 ~]$ ip a | egrep 'opa|roce'
3: bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
14: hfi1_opa0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
17: hfi1_opa0.8024@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
18: hfi1_opa0.8022@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
23: bnxt_roce.43@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
24: bnxt_roce.45@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
[root@rdma-dev-15 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-01-21 21:44:42 EST; 1min 12s ago
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
Main PID: 10550 (ibacm)
CGroup: /system.slice/ibacm.service
└─10550 /usr/sbin/ibacm --systemd
Jan 21 21:44:41 rdma-dev-15.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:44:42 rdma-dev-15.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
Need a clone of this BZ, approved for 7.8 0day.

OK. The intent is to approve this for 7.8.z and get the Z-stream update approved for 0day. My interpretation of the required process steps is to remove the 0day indication from the developer whiteboard on this bug and set the 7.8.z flag. When that Z-stream bug is approved, we will then add 0day to the developer whiteboard on the Z-stream bug. If the above is incorrect, please illustrate the intended process to follow.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (rdma-core bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3870
Description of problem:
It works with ibacm-22.3-1.el7.x86_64 but fails with ibacm-22.4-1.el7.x86_64 on the same test bed (rdma-perf-00/01). It might be caused by a recent change in rdma-core. I hit this segfault only on non-IB hardware. rdma-perf-01 has both an mlx4 IB device and an mlx4 RoCE device.
=============
[root@rdma-perf-01 ~]$ rpm -q ibacm rdma-core
ibacm-22.3-1.el7.x86_64
rdma-core-22.3-1.el7.x86_64
[root@rdma-perf-01 ~]$ systemctl restart ibacm
[root@rdma-perf-01 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2020-01-21 10:34:51 EST; 8s ago
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
Main PID: 850 (ibacm)
Tasks: 4
CGroup: /system.slice/ibacm.service
└─850 /usr/sbin/ibacm --systemd
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
============
[root@rdma-perf-01 ~]$ rpm -q ibacm rdma-core
ibacm-22.4-1.el7.x86_64
rdma-core-22.4-1.el7.x86_64
[root@rdma-perf-01 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:ibacm
file:/etc/rdma/ibacm_opts.cfg
Jan 21 10:29:55 rdma-perf-01.lab.bos.redhat.com systemd[1]: Unit ibacm.service entered failed state.
Jan 21 10:29:55 rdma-perf-01.lab.bos.redhat.com systemd[1]: ibacm.service failed.
Jan 21 10:33:48 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:33:48 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopping InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopped InfiniBand Address Cache Manager Daemon.
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 10:34:51 rdma-perf-01.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
Jan 21 10:35:47 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopping InfiniBand Address Cache Manager Daemon...
Jan 21 10:35:47 rdma-perf-01.lab.bos.redhat.com systemd[1]: Stopped InfiniBand Address Cache Manager Daemon.
[root@rdma-perf-01 ~]$ systemctl start ibacm
Job for ibacm.service failed because a fatal signal was delivered to the control process. See "systemctl status ibacm.service" and "journalctl -xe" for details.
[root@rdma-perf-01 ~]$ dmesg
[20013.737212] ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000]
==========
Actual results:
ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000] on a machine with non-IB hardware.
Expected results:
No segfault on non-IB hardware.
Additional info:
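The kernel message in the actual results can be read field by field. On x86 the `error` value is a page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 set means a write, and bit 2 set means the fault happened in user mode. So `error 4` is a user-mode read of a page that is not present, and `segfault at 0` puts the faulting address at 0; together these are consistent with ibacm reading through a NULL pointer. The sketch below is a hypothetical helper (`decode_segfault` and its field names are illustrative, not part of rdma-core) that parses such a line with Python 3's `re` module, assuming the x86-64 message layout shown in this report.

```python
import re

# x86 page-fault error-code bits, as printed after "error" in the kernel message.
PF_PROT = 0x1    # set: protection violation; clear: page not present
PF_WRITE = 0x2   # set: write access; clear: read access
PF_USER = 0x4    # set: fault raised in user mode; clear: kernel mode
PF_INSTR = 0x10  # set: fault was an instruction fetch

# Hypothetical pattern for the x86-64 "segfault at" line quoted in this report.
SEGV_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Parse a dmesg segfault line and decode its fields (illustrative only)."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not an x86 'segfault at' kernel message")
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    base = int(m.group("base"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": addr,
        # Heuristic: the first page is normally left unmapped, so a fault here
        # usually means a NULL pointer (possibly plus a small field offset).
        "near_null": addr < 0x1000,
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
        "ifetch": bool(err & PF_INSTR),
        "image": m.group("image"),
        "offset": ip - base,  # faulting ip relative to the start of the mapping
    }


line = ("[20013.737212] ibacm[1221]: segfault at 0 ip 0000000000404f3e "
        "sp 00007ffe54e819a0 error 4 in ibacm[400000+e000]")
info = decode_segfault(line)
print(info["access"], info["mode"], info["cause"], hex(info["offset"]))
# → read user page not present 0x4f3e
```

The mapping `ibacm[400000+e000]` starts at 0x400000, the usual load address of a non-PIE x86-64 executable, so the raw `ip` (0x404f3e) can probably be resolved to a source line with `addr2line -e /usr/sbin/ibacm 0x404f3e` once matching debuginfo is installed. For a PIE binary or a shared library, the `offset` field is the better starting point.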