Bug 1793585
| Field | Value |
|---|---|
| Summary | [RHEL-7][Regression] ibacm[1221]: segfault at 0 ip 0000000000404f3e sp 00007ffe54e819a0 error 4 in ibacm[400000+e000] |
| Product | Red Hat Enterprise Linux 7 |
| Component | rdma-core |
| Version | 7.8 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | zguo <zguo> |
| Assignee | Honggang LI <honli> |
| QA Contact | zguo <zguo> |
| CC | honli, linville, mschmidt, rdma-dev-team, wchadwic |
| Target Milestone | rc |
| Keywords | Regression, ZStream |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | rdma-core-22.4-2.el7 |
| Clones | 1793736, 1812444 (view as bug list) |
| Bug Blocks | 1793736, 1812444 |
| Type | Bug |
| Last Closed | 2020-09-29 19:25:25 UTC |
Description (zguo, 2020-01-21 15:46:38 UTC)
It works well on cxgb4 with ibacm-22.4-1.el7.x86_64, so the scenario in the bug description is the only case where I have hit this issue so far.

[root@rdma-dev-13 ~]$ rpm -q ibacm
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-13 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2020-01-21 11:02:16 EST; 1min 32s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
  Process: 10487 ExecStart=/usr/sbin/ibacm --systemd (code=exited, status=255)
 Main PID: 10487 (code=exited, status=255)

Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service: main process exited, code=exited, status=255/n/a
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Failed to start InfiniBand Address Cache Manager Daemon.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: Unit ibacm.service entered failed state.
Jan 21 11:02:16 rdma-dev-13.lab.bos.redhat.com systemd[1]: ibacm.service failed.

This issue does not occur on rdma-perf-03 with mlx5 IB and mlx5 RoCE.
[root@rdma-perf-03 ~]$ rpm -q ibacm rdma-core
ibacm-22.4-1.el7.x86_64
rdma-core-22.4-1.el7.x86_64
[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
[root@rdma-perf-03 ~]$ systemctl start ibacm
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ dmesg
[root@rdma-perf-03 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-01-21 21:26:55 EST; 8s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
 Main PID: 11676 (ibacm)
   CGroup: /system.slice/ibacm.service
           └─11676 /usr/sbin/ibacm --systemd

Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:26:55 rdma-perf-03.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.
ip a | egrep 'roce|ib'
5: mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mqprio state UP group default qlen 1000
    inet 172.31.40.183/24 brd 172.31.40.255 scope global noprefixroute dynamic mlx5_roce
7: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:10:8a:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.0.183/24 brd 172.31.0.255 scope global noprefixroute dynamic mlx5_ib0
8: mlx5_roce.45@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 172.31.45.183/24 brd 172.31.45.255 scope global noprefixroute dynamic mlx5_roce.45
9: mlx5_roce.43@mlx5_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    inet 172.31.43.183/24 brd 172.31.43.255 scope global noprefixroute dynamic mlx5_roce.43
10: mlx5_ib0.8008@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:11:37:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:08:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.8.183/24 brd 172.31.8.255 scope global noprefixroute dynamic mlx5_ib0.8008
11: mlx5_ib0.8002@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:11:b9:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:02:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.2.183/24 brd 172.31.2.255 scope global noprefixroute dynamic mlx5_ib0.8002
12: mlx5_ib0.8006@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:12:3b:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:06:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.6.183/24 brd 172.31.6.255 scope global noprefixroute dynamic mlx5_ib0.8006
13: mlx5_ib0.8004@mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:12:bd:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:19:6c brd 00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.4.183/24 brd 172.31.4.255 scope global noprefixroute dynamic mlx5_ib0.8004

This issue does not occur on the following test environment either:

[root@rdma-dev-15 ~]$ rpm -q rdma-core ibacm
rdma-core-22.4-1.el7.x86_64
ibacm-22.4-1.el7.x86_64
[root@rdma-dev-15 ~]$ ip a | egrep 'opa|roce'
3: bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
14: hfi1_opa0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast state UP group default qlen 256
17: hfi1_opa0.8024@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
18: hfi1_opa0.8022@hfi1_opa0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state LOWERLAYERDOWN group default qlen 256
23: bnxt_roce.43@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
24: bnxt_roce.45@bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
[root@rdma-dev-15 ~]$ systemctl status ibacm
● ibacm.service - InfiniBand Address Cache Manager Daemon
   Loaded: loaded (/usr/lib/systemd/system/ibacm.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-01-21 21:44:42 EST; 1min 12s ago
     Docs: man:ibacm
           file:/etc/rdma/ibacm_opts.cfg
 Main PID: 10550 (ibacm)
   CGroup: /system.slice/ibacm.service
           └─10550 /usr/sbin/ibacm --systemd

Jan 21 21:44:41 rdma-dev-15.lab.bos.redhat.com systemd[1]: Starting InfiniBand Address Cache Manager Daemon...
Jan 21 21:44:42 rdma-dev-15.lab.bos.redhat.com systemd[1]: Started InfiniBand Address Cache Manager Daemon.

A clone is needed for this BZ, approved for 7.8 0day.

OK -- the intent is to approve this for 7.8.z and get the Z-stream update approved for 0day.
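As background on the summary line: "segfault at 0 ... error 4" is the kernel's page-fault report, where "at 0" is a NULL fault address and "error 4" is the x86 page-fault error-code bitmask. A small decoder for that bitmask (illustrative only, not part of ibacm; the bit meanings follow the standard x86 convention):

```python
def decode_pf_error(code: int) -> str:
    """Decode the x86 page-fault error code printed in kernel
    'segfault at ... error N' lines.
    Bit 0: 0 = page not present, 1 = protection violation
    Bit 1: 0 = read access,      1 = write access
    Bit 2: 0 = kernel mode,      1 = user mode
    """
    mode = "user" if code & 4 else "kernel"
    access = "write" if code & 2 else "read"
    cause = "protection violation" if code & 1 else "page not present"
    return f"{mode}-mode {access}, {cause}"

# error 4 from the summary: a user-space read of an unmapped page,
# which together with "at 0" means a NULL-pointer dereference.
print(decode_pf_error(4))  # → user-mode read, page not present
```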
My interpretation of the required process steps is to remove the 0day indication from the developer whiteboard on this bug and set the 7.8.z flag. When that Z-stream bug is approved, we will then add 0day to the developer whiteboard on the Z-stream bug. If the above is incorrect, please describe the intended process to follow.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (rdma-core bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3870
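The "Fixed In Version" field names rdma-core-22.4-2.el7, so whether an installed build carries the fix comes down to RPM version ordering (the affected builds above are all 22.4-1.el7). Below is a simplified sketch of RPM's segment-wise version comparison; the name mirrors librpm's rpmvercmp but this is not its actual implementation, and epoch, tilde, and caret handling are omitted:

```python
import re

def rpmvercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: split each string into runs of digits or
    letters, then compare segment by segment. Digit runs compare
    numerically, letter runs lexically, and a digit run sorts after a
    letter run. Returns -1, 0, or 1."""
    sa = re.findall(r"\d+|[a-zA-Z]+", a)
    sb = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip(sa, sb):
        xd, yd = x.isdigit(), y.isdigit()
        if xd != yd:                 # numeric segments beat alpha segments
            return 1 if xd else -1
        if xd:                       # compare digit runs as integers
            x, y = int(x), int(y)
        if x != y:
            return 1 if x > y else -1
    # All shared segments equal: the longer version string is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))

# The affected build predates the fixed build from this erratum.
print(rpmvercmp("22.4-1.el7", "22.4-2.el7"))  # → -1
```

Note that real RPM comparisons treat epoch, version, and release as separate fields; the sketch flattens them, which is sufficient for ordering the two builds discussed here.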