Bug 1634159

Summary: Enable support for Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD on power
Product: Red Hat Enterprise Linux 7
Reporter: David J. Wilder <wilder>
Component: openvswitch2.10
Assignee: Timothy Redaelli <tredaelli>
Status: CLOSED CURRENTRELEASE
QA Contact: Ping Zhang <pizhang>
Severity: medium
Priority: medium
Version: 7.5
CC: ahleihel, atragler, ctrautma, kzhang, linville, mleitner, noas, qding, rkhan, tredaelli, wilder
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openvswitch2.10-2.10.0-10.el7fdb.1
Last Closed: 2020-07-20 15:29:03 UTC
Type: Bug
Attachments:
- Patch to work around the lack of an off_t definition for mlx5dv.h.
- ovs-vswitchd.log, from the ovs-vswitchd crash.

Description David J. Wilder 2018-09-28 20:26:17 UTC
Created attachment 1488228 [details]
Patch to work around the lack of an off_t definition for mlx5dv.h.

Description of problem:

Please enable support for the Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD in the DPDK config for Power.

Version-Release number of selected component (if applicable):
openvswitch 2.10 with dpdk 17.11

How reproducible:
Run testpmd with a CX4 or CX5 adapter on any Power system.
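For reference, a minimal testpmd invocation along these lines would exercise the PMD; the core list, channel count, and queue counts are illustrative placeholders, and the PCI address is taken from the ovs-vsctl output later in this report (DPDK 17.11 EAL syntax):

# testpmd -l 0-3 -n 4 -w 0000:01:00.0 -- -i --rxq=2 --txq=2

With the MLX5 PMD disabled in the build, no driver matches the ConnectX-4/5 device and the port cannot be initialized.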

Actual results:
The MLX5 PMD is not enabled in the build, so the adapter is not supported.

Please update the file ppc_64-power8-linuxapp-gcc-config to enable this support:
<....>
# Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=y
CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
<....>

Apply the attached patch to correct a build issue on Power when MLX5 support is enabled.

Comment 2 Rashid Khan 2018-10-01 15:57:13 UTC
Hi Alaa,
Can you please work with David and Tim to resolve this?

Comment 3 Alaa Hleihel (NVIDIA Mellanox) 2018-10-01 17:26:23 UTC
Hi David,

(In reply to David J. Wilder from comment #0)
> Apply the attached patch to correct build issue on power when MLX support is
> enabled.

Which RHEL and rdma-core versions did you use?
This issue should not happen with rdma-core v17. It was already fixed (mlx5dv.h will include <sys/types.h>).

Thanks,
Alaa

Comment 4 David J. Wilder 2018-10-03 23:49:05 UTC
(In reply to Alaa Hleihel from comment #3)
> Hi David,
> 
> (In reply to David J. Wilder from comment #0)
> > Apply the attached patch to correct build issue on power when MLX support is
> > enabled.
> 
> Which RHEL and rdma-core versions did you use ?
> This issue should not happen with rdma-core v17. It was already fixed
> (mlx5dv.h will include <sys/types.h>).
> 
> Thanks,
> Alaa

Hi Alaa,
You are correct. I upgraded rdma-core to v17.2, and openvswitch built without my patch and with MLX5 enabled.

Thanks
David
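
For anyone double-checking a build environment, one quick way to confirm whether the installed rdma-core headers already carry this fix is to look for the include directly; the path below assumes the standard rdma-core-devel install location:

# grep -n 'sys/types.h' /usr/include/infiniband/mlx5dv.h

If this matches (as it should on rdma-core v17 and later), the off_t workaround patch is unnecessary; if it prints nothing, the patch attached to this bug is still needed.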

Comment 5 Noa Spanier 2018-10-15 14:11:45 UTC
Hi Anita,

Following our discussion, we will assign the BZ to Red Hat to make progress on this request.

Regards,
Noa

Comment 6 Marcelo Ricardo Leitner 2018-10-16 15:35:54 UTC
Tim, assigning to you as these are "just" config changes now, assuming David will help test the updated packages.
rdma-core on RHEL7 is already at the version mentioned in comment #4.
I don't have much experience with ppc64, but please let me know if I can help in any way. Thanks

Comment 9 David J. Wilder 2018-10-18 20:58:15 UTC
Thanks, Timothy. I will test it.

Comment 10 David J. Wilder 2018-10-25 00:14:44 UTC
I am hitting an ovs-vswitchd crash with openvswitch2.10 on a POWER9 system.
The problem happens when running the PVP test.

gdb -c core
....
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fff98210ed0 in mlx5_tx_complete (txq=0x7ff88d88c380)
    at /usr/src/debug/openvswitch-2.10.0/dpdk-17.11/drivers/net/mlx5/mlx5_rxtx.h:481
481					free[blk_n++] = m;

(gdb) list
476		/* Free buffers. */
477		while (elts_free != elts_tail) {
478			m = rte_pktmbuf_prefree_seg((*txq->elts)[elts_free++ & elts_m]);
479			if (likely(m != NULL)) {
480				if (likely(m->pool == pool)) {
481					free[blk_n++] = m;
482				} else {
483					if (likely(pool != NULL))
484						rte_mempool_put_bulk(pool,
485								     (void *)free,

(gdb) print blk_n
$1 = 3241
(gdb) print elts_free
$2 = 916
(gdb) print elts_tail
$3 = 58585
(gdb) print elts_m
$4 = <optimized out>
(gdb) print *txq->elts
Cannot access memory at address 0x7ff88d88c408

I am working with Mellanox on another issue in the same area of the code, seen when running testpmd on the host; it might be related. I will do more debugging to see whether the two problems match.

Here is some configuration data.

# rpm -qa | grep openvswitch
openvswitch2.10-2.10.0-10.el7fdb.1.ppc64le
openvswitch2.10-debuginfo-2.10.0-10.el7fdb.1.ppc64le
[root@ltc17u31 /]# rpm -qa | grep rdma-core
rdma-core-15-7.el7_5.ppc64le
rdma-core-devel-15-7.el7_5.ppc64le
[root@ltc17u31 /]# rpm -qa | grep ibacm
ibacm-15-7.el7_5.ppc64le
[root@ltc17u31 /]# rpm -qa | grep libibcm
libibcm-15-7.el7_5.ppc64le
[root@ltc17u31 /]# rpm -qa | grep libibumad
libibumad-15-7.el7_5.ppc64le
[root@ltc17u31 /]# rpm -qa | grep libibverbs
libibverbs-15-7.el7_5.ppc64le
libibverbs-utils-15-7.el7_5.ppc64le
[root@ltc17u31 /]# rpm -qa | grep librdmacm
librdmacm-15-7.el7_5.ppc64le

[root@ltc17u31 /]# uname -a
Linux ltc17u31 4.14.0-49.el7a.ppc64le #1 SMP Wed Mar 14 13:58:40 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

[root@ltc17u31 /]# ovs-vsctl show
2adc7953-e1bf-4ec7-b39b-42797a141e72
    Bridge "ovs_pvp_br0"
        fail_mode: secure
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0", n_rxq="2"}
        Port "ovs_pvp_br0"
            Interface "ovs_pvp_br0"
                type: internal
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuserclient
                options: {n_rxq="2", vhost-server-path="/tmp/vhost-sock0"}
    ovs_version: "2.10.0"
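
For context, a topology like the one shown above would typically be built with commands along the following lines; this is a reconstruction from the ovs-vsctl output, not the exact commands used, and it assumes the userspace (netdev) datapath that DPDK ports require:

# ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 datapath_type=netdev fail_mode=secure
# ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:01:00.0 options:n_rxq=2
# ovs-vsctl add-port ovs_pvp_br0 vhost0 -- set Interface vhost0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-sock0 options:n_rxq=2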

dmesg:
[ 5636.703686] pmd18[12905]: unhandled signal 11 at 00007fff9ef20000 nip 00007fffa9e60ed0 lr 00007fffa9b1df40 code 1

Comment 11 David J. Wilder 2018-10-25 00:16:05 UTC
Created attachment 1497254 [details]
ovs-vswitchd.log, from the ovs-vswitchd crash.

Comment 12 Marcelo Ricardo Leitner 2019-02-18 16:08:47 UTC
(In reply to David J. Wilder from comment #10)
> I am hitting a ovs-vswitchd crash with openvswitch2.10  on p9.
> The problem happens when running the PVP test.

Alaa, need your help here.

Comment 13 David J. Wilder 2019-02-18 18:19:13 UTC
This is an issue with DPDK and the Mellanox PMD, and it is being worked on by Mellanox. RHEL does not currently support DPDK on Power, so this issue is not blocking OVS.

Comment 14 Alaa Hleihel (NVIDIA Mellanox) 2019-02-19 14:40:46 UTC
(In reply to David J. Wilder from comment #13)
> This is an issue with dpdk and the Mellanox pmd. It is being worked by
> Mellanox.  RHEL is not currently supporting dpdk on power, so this issue is
> not blocking OVS.

Thanks David for the update.
BTW, what is the Mellanox support case number?

Regards,
Alaa

Comment 15 Timothy Redaelli 2020-07-20 15:29:03 UTC
FDB is not released using errata, so this bug is being closed as CURRENTRELEASE.