Bug 1330719 - dpdk_nic_bind --bind=vfio-pci failed to bind mlx4
Summary: dpdk_nic_bind --bind=vfio-pci failed to bind mlx4
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch-dpdk
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Thadeu Lima de Souza Cascardo
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-26 19:21 UTC by Jean-Tsung Hsiao
Modified: 2018-07-31 06:55 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-29 18:18:06 UTC
Target Upstream Version:


Attachments
lspci and dpdk_nic_bind -s (193.60 KB, text/plain)
2016-04-26 21:00 UTC, Jean-Tsung Hsiao
allows any network class device to be considered by dpdk_nic_bind (917 bytes, patch)
2016-04-27 19:42 UTC, Thadeu Lima de Souza Cascardo

Description Jean-Tsung Hsiao 2016-04-26 19:21:59 UTC
Description of problem: dpdk_nic_bind --bind=vfio-pci failed to bind mlx4

[root@netqe5 dpdk-multique-scripts]# ethtool -i p6p1
driver: mlx4_en
version: 2.2-1 (Feb 2014)
firmware-version: 2.32.5100
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@netqe5 dpdk-multique-scripts]# dpdk_nic_bind --bind=vfio-pci 0000:03:00.0
Unknown device: 0000:03:00.0. Please specify device in "bus:slot.func" format
[root@netqe5 dpdk-multique-scripts]# 


Version-Release number of selected component (if applicable):
[root@netqe5 dpdk-multique-scripts]# rpm -qa | grep dpdk
dpdk-tools-2.2.0-3.el7.x86_64
kernel-kernel-networking-dpdk-only-1.0-4.noarch
dpdk-2.2.0-3.el7.x86_64
openvswitch-dpdk-2.5.0-3.el7.x86_64
[root@netqe5 dpdk-multique-scripts]# uname -a
Linux netqe5.knqe.lab.eng.bos.redhat.com 3.10.0-382.el7.x86_64 #1 SMP Tue Apr 19 13:22:06 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@netqe5 dpdk-multique-scripts]#

How reproducible: reproducible


Steps to Reproduce:
1. Install dpdk-2.2.0-3
2. modprobe vfio-pci
3. dpdk_nic_bind --bind=vfio-pci <mlx4 PCI bus addr>

Actual results:
Failed; see the command output in the description above.

Expected results:
should succeed

Additional info:

Comment 1 Thadeu Lima de Souza Cascardo 2016-04-26 19:35:28 UTC
Hi, Jean-Tsung.

What is the output of dpdk_nic_bind -s?

Thanks.
Cascardo.

Comment 2 Aaron Conole 2016-04-26 20:00:16 UTC
Please include the output of the following commands:

lspci
lspci -vt
dmesg
dpdk_nic_bind -s

Thanks.

Comment 4 Jean-Tsung Hsiao 2016-04-26 21:00:16 UTC
Created attachment 1151087
lspci and dpdk_nic_bind -s

See the attached log for the lspci and "dpdk_nic_bind -s" output.

Comment 5 Aaron Conole 2016-04-26 21:05:29 UTC
Agh, I shouldn't have even needed that information. Sorry.

We don't ship Mellanox with DPDK 2.2, because at that point in time it required non-upstream library changes. I don't know if that is still the case; I will ask and get back to you.

Comment 6 Panu Matilainen 2016-04-27 06:52:01 UTC
That still seems to be the case: neither mlx4 nor mlx5 comes anywhere near compiling against libibverbs 1.2.0, which is supposed to be the latest version.

Comment 7 Panu Matilainen 2016-04-27 09:20:58 UTC
...but actually, whether the PMD is shipped or not doesn't even come into play at this stage: dpdk_nic_bind knows nothing about the actual DPDK-side driver.

The actual catch here is that dpdk_nic_bind thinks the Mellanox device doesn't even exist, or at least is not a NIC at all. From our POV it doesn't matter because it wouldn't work anyway, but it does suggest there is a bug, perhaps in dpdk_nic_bind.

OTOH, if you use driverctl instead of dpdk_nic_bind, such issues won't come into play because it doesn't try to be overly clever.
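
For reference, a minimal sketch of the driverctl route, assuming the driverctl package is installed and the command is run as root. The helper name driverctl_override_cmd and the address check are illustrative assumptions; set-override is the driverctl subcommand that overrides the driver for a device.

```python
import re
import subprocess

# Full PCI address: domain:bus:slot.func, e.g. 0000:03:00.0
PCI_ADDR_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]")


def driverctl_override_cmd(pci_addr, driver="vfio-pci"):
    """Build the driverctl command line that overrides the driver of one device.

    Hypothetical helper: it only validates the address format and returns the
    argument list; it does not check the PCI class of the device.
    """
    if not PCI_ADDR_RE.fullmatch(pci_addr):
        raise ValueError("expected domain:bus:slot.func, got %r" % pci_addr)
    return ["driverctl", "set-override", pci_addr, driver]


if __name__ == "__main__":
    cmd = driverctl_override_cmd("0000:03:00.0")
    print(" ".join(cmd))
    # Uncomment to apply the override (needs root and the driverctl package):
    # subprocess.run(cmd, check=True)
```

Unlike dpdk_nic_bind, nothing here filters on device class, which is why the Mellanox card would not be rejected as unknown.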

Comment 8 Thadeu Lima de Souza Cascardo 2016-04-27 19:42:30 UTC
Created attachment 1151603
allows any network class device to be considered by dpdk_nic_bind

This is just to show that we can make dpdk_nic_bind accept other devices as well. In this case, any network device would be included. On a laptop, this would include a Wifi PCI board, for example. The Mellanox card is a single PCI function that also supports other functions like RoCE, so its PCI class is not Ethernet. This patch should work for it too.
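
To illustrate the distinction, here is a minimal sketch of the two matching rules, not the attached patch itself. The constants follow the PCI class code layout (base class 0x02 is network controller, subclass 0x00 is Ethernet). The function names and the sample values are assumptions for illustration, with the class given as the four hex digits that "lspci -n" prints.

```python
ETHERNET_CLASS = "0200"     # base class 02 (network), subclass 00 (Ethernet)
NETWORK_BASE_CLASS = "02"   # any network controller, whatever the subclass


def is_ethernet_only(pci_class):
    """Strict rule: accept only devices whose class is exactly Ethernet."""
    return pci_class == ETHERNET_CLASS


def is_network_class(pci_class):
    """Relaxed rule: accept any device in the network base class."""
    return pci_class[:2] == NETWORK_BASE_CLASS


if __name__ == "__main__":
    # 0280 is "other network controller"; 0300 is a VGA display controller.
    for cls in ("0200", "0280", "0300"):
        print(cls, is_ethernet_only(cls), is_network_class(cls))
```

Under the relaxed rule a device reporting 0280 is listed, while non-network devices such as 0300 are still filtered out.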

As Panu has argued, there is not much point in preventing devices from being bound to vfio-pci. dpdk_nic_bind is just a nice wrapper to verify and change which driver a device is bound to. Binding it to vfio-pci could be done manually as well.

But, please, try this patch and see if it fixes the problem. Maybe it's something DPDK upstream would accept on the basis that the mlx4 device requires it.
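
A rough sketch of what binding manually could look like through sysfs, assuming a kernel that exposes the driver_override attribute and that the vfio-pci module is already loaded. The helper names manual_bind_plan and apply_plan are hypothetical, and this is one possible sequence rather than what dpdk_nic_bind does internally.

```python
import os

SYSFS_PCI = "/sys/bus/pci"


def manual_bind_plan(pci_addr, driver="vfio-pci", sysfs=SYSFS_PCI):
    """Return the (path, value) writes that rebind one PCI device."""
    dev = os.path.join(sysfs, "devices", pci_addr)
    return [
        # 1. Detach the device from whatever driver currently owns it.
        (os.path.join(dev, "driver", "unbind"), pci_addr),
        # 2. Tell the kernel which driver may claim this device next.
        (os.path.join(dev, "driver_override"), driver),
        # 3. Ask the PCI core to probe the device again.
        (os.path.join(sysfs, "drivers_probe"), pci_addr),
    ]


def apply_plan(plan):
    """Perform the writes; needs root. Skips unbind if no driver is bound."""
    for path, value in plan:
        if path.endswith("/driver/unbind") and not os.path.exists(path):
            continue
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run: print the equivalent shell commands instead of writing to sysfs.
    for path, value in manual_bind_plan("0000:03:00.0"):
        print("echo %s > %s" % (value, path))
```

Running the script only prints the planned writes; calling apply_plan on the result is what would actually change the binding.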

Cascardo.

Comment 9 Thadeu Lima de Souza Cascardo 2016-04-27 19:43:39 UTC
Hi, Jean-Tsung.

Can you apply the attached patch to the installed version of dpdk_nic_bind and see if that works for you? It's just two lines, so you can edit it by hand as well.

Thanks.
Cascardo.

Comment 10 Jean-Tsung Hsiao 2016-04-28 13:41:43 UTC
(In reply to Thadeu Lima de Souza Cascardo from comment #9)
> Hi, Jean-Tsung.
> 
> Can you apply the attached patch to the installed version of dpdk_nic_bind
> and see if that works for you? It's just two lines, so you can edit it by
> hand as well.
> 
> Thanks.
> Cascardo.

Hi Cascardo,

Yes, the patch works. But, as with bnx2x, its DPDK port is rejected by the ovs-dpdk bridge:

Network devices using DPDK-compatible driver
============================================
0000:03:00.0 'MT27520 Family [ConnectX-3 Pro]' drv=vfio-pci unused=

ovs-vsctl: Error detected while setting up 'dpdk0'.  See ovs-vswitchd log for details.
029d107c-e529-48cf-bca5-36b70b8e3eb8
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (No such device)"
        Port "int0"
            Interface "int0"
                type: internal
    ovs_version: "2.5.0"
OFPST_PORT reply (xid=0x2): 2 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=2, bytes=150, drop=2, errs=0, coll=0
  port  1: rx pkts=2, bytes=132, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0

Comment 11 Thadeu Lima de Souza Cascardo 2016-04-28 17:32:45 UTC
As Aaron and Panu have pointed out, we don't ship the mlx4 driver because it requires unreleased software.

Panu, do you think this patch could fly upstream? Or do you suggest we ignore it and just recommend driverctl?

Cascardo.

Comment 12 Panu Matilainen 2016-04-29 07:49:16 UTC
Regardless of what we recommend, upstream ought to be interested in the patch because dpdk_nic_bind is preventing binding to an otherwise supported (I guess) adapter. Even if others don't care, Mellanox should!

Whether it's acceptable as is, or as an additional option to display all network class adapters instead of just Ethernet ones, I dunno; both seem quite reasonable to me.

Comment 13 Thadeu Lima de Souza Cascardo 2016-05-06 18:29:31 UTC
Submitted upstream.

http://dpdk.org/ml/archives/dev/2016-May/038562.html

Comment 14 Flavio Leitner 2016-09-29 18:18:06 UTC
Hi,

My understanding is that driverctl can handle this correctly and that Thadeu's patch fixing dpdk_nic_bind has been merged upstream, so it will land in RHEL at some point.
However, we can't enable the mlx driver at this point, so I am going to close this bug, as I don't see anything else left for us to help with.

If any of you disagree, please re-open it.
Thanks,
fbl

