Description Timothy Redaelli
2020-06-26 09:34:23 UTC
+++ This bug was initially created as a clone of Bug #1851170 +++
+++ This bug was initially created as a clone of Bug #1851169 +++
+++ This bug was initially created as a clone of Bug #1850163 +++
Description of problem:
Unable to run testpmd inside a container with the latest DPDK version, 18.11.
Interface details:
ethtool -i ens1f0
driver: i40e
version: 2.8.20-k
firmware-version: 6.00 0x800036cb 1.1747.0
expansion-rom-version:
bus-info: 0000:12:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
12:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
12:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
lspci -v -nn -mm -s 0000:12:02.0
Slot: 12:02.0
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Hewlett Packard Enterprise [1590]
SDevice: Device [0000]
Rev: 02
NUMANode: 0
running the testpmd application
strace -ff -o /tmp/testpmd.strace testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 8 on socket 0
EAL: Detected lcore 6 as core 9 on socket 0
EAL: Detected lcore 7 as core 10 on socket 0
EAL: Detected lcore 8 as core 11 on socket 0
EAL: Detected lcore 9 as core 12 on socket 0
EAL: Detected lcore 10 as core 16 on socket 0
EAL: Detected lcore 11 as core 17 on socket 0
EAL: Detected lcore 12 as core 18 on socket 0
EAL: Detected lcore 13 as core 19 on socket 0
EAL: Detected lcore 14 as core 20 on socket 0
EAL: Detected lcore 15 as core 24 on socket 0
EAL: Detected lcore 16 as core 25 on socket 0
EAL: Detected lcore 17 as core 26 on socket 0
EAL: Detected lcore 18 as core 27 on socket 0
EAL: Detected lcore 19 as core 28 on socket 0
EAL: Detected lcore 20 as core 0 on socket 1
EAL: Detected lcore 21 as core 1 on socket 1
EAL: Detected lcore 22 as core 2 on socket 1
EAL: Detected lcore 23 as core 3 on socket 1
EAL: Detected lcore 24 as core 4 on socket 1
EAL: Detected lcore 25 as core 8 on socket 1
EAL: Detected lcore 26 as core 9 on socket 1
EAL: Detected lcore 27 as core 10 on socket 1
EAL: Detected lcore 28 as core 11 on socket 1
EAL: Detected lcore 29 as core 12 on socket 1
EAL: Detected lcore 30 as core 16 on socket 1
EAL: Detected lcore 31 as core 17 on socket 1
EAL: Detected lcore 32 as core 18 on socket 1
EAL: Detected lcore 33 as core 19 on socket 1
EAL: Detected lcore 34 as core 20 on socket 1
EAL: Detected lcore 35 as core 24 on socket 1
EAL: Detected lcore 36 as core 25 on socket 1
EAL: Detected lcore 37 as core 26 on socket 1
EAL: Detected lcore 38 as core 27 on socket 1
EAL: Detected lcore 39 as core 28 on socket 1
EAL: Detected lcore 40 as core 0 on socket 0
EAL: Detected lcore 41 as core 1 on socket 0
EAL: Detected lcore 42 as core 2 on socket 0
EAL: Detected lcore 43 as core 3 on socket 0
EAL: Detected lcore 44 as core 4 on socket 0
EAL: Detected lcore 45 as core 8 on socket 0
EAL: Detected lcore 46 as core 9 on socket 0
EAL: Detected lcore 47 as core 10 on socket 0
EAL: Detected lcore 48 as core 11 on socket 0
EAL: Detected lcore 49 as core 12 on socket 0
EAL: Detected lcore 50 as core 16 on socket 0
EAL: Detected lcore 51 as core 17 on socket 0
EAL: Detected lcore 52 as core 18 on socket 0
EAL: Detected lcore 53 as core 19 on socket 0
EAL: Detected lcore 54 as core 20 on socket 0
EAL: Detected lcore 55 as core 24 on socket 0
EAL: Detected lcore 56 as core 25 on socket 0
EAL: Detected lcore 57 as core 26 on socket 0
EAL: Detected lcore 58 as core 27 on socket 0
EAL: Detected lcore 59 as core 28 on socket 0
EAL: Detected lcore 60 as core 0 on socket 1
EAL: Detected lcore 61 as core 1 on socket 1
EAL: Detected lcore 62 as core 2 on socket 1
EAL: Detected lcore 63 as core 3 on socket 1
EAL: Detected lcore 64 as core 4 on socket 1
EAL: Detected lcore 65 as core 8 on socket 1
EAL: Detected lcore 66 as core 9 on socket 1
EAL: Detected lcore 67 as core 10 on socket 1
EAL: Detected lcore 68 as core 11 on socket 1
EAL: Detected lcore 69 as core 12 on socket 1
EAL: Detected lcore 70 as core 16 on socket 1
EAL: Detected lcore 71 as core 17 on socket 1
EAL: Detected lcore 72 as core 18 on socket 1
EAL: Detected lcore 73 as core 19 on socket 1
EAL: Detected lcore 74 as core 20 on socket 1
EAL: Detected lcore 75 as core 24 on socket 1
EAL: Detected lcore 76 as core 25 on socket 1
EAL: Detected lcore 77 as core 26 on socket 1
EAL: Detected lcore 78 as core 27 on socket 1
EAL: Detected lcore 79 as core 28 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.1
EAL: Mem event callback 'MLX4_MEM_EVENT_CB:(nil)' registered
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.1
EAL: Registered [vmbus] bus.
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.2
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: VFIO PCI modules not loaded
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized
EAL: Ask a virtual area of 0x2e000 bytes
EAL: Virtual area found at 0x100000000 (size = 0x2e000)
EAL: Setting up physically contiguous memory...
EAL: Setting maximum number of open files to 1048576
EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824
EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824
EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x10002e000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x140000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x940000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x980000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x1180000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x11c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x19c0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x1a00000000 (size = 0x800000000)
EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2200000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2240000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2a40000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2a80000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3280000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x32c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3ac0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x3b00000000 (size = 0x800000000)
EAL: TSC frequency is ~2095071 KHz
EAL: Master lcore 10 is ready (tid=7fce1d7168c0;cpuset=[10])
EAL: lcore 11 is ready (tid=7fce1489c700;cpuset=[11])
EAL: lcore 50 is ready (tid=7fce1409b700;cpuset=[50])
EAL: lcore 51 is ready (tid=7fce1389a700;cpuset=[51])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX4_MEM_EVENT_CB:(nil)'
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 1024MB
EAL: PCI device 0000:12:02.0 on NUMA socket 0
EAL: probe driver: 8086:154c net_i40e_vf
EAL: using IOMMU type 1 (Type 1)
EAL: Mem event callback 'vfio_mem_event_clb:(nil)' registered
EAL: Installed memory event callback for VFIO
EAL: VFIO reports MSI-X BAR as mappable
EAL: PCI memory mapped at 0x4300000000
EAL: PCI memory mapped at 0x4300010000
i40evf_dev_init(): >>
i40e_set_mac_type(): i40e_set_mac_type
i40e_set_mac_type(): i40e_set_mac_type found mac: 2, returns: 0
Bus error (core dumped)
versions:
yum list installed | grep dpdk
dpdk.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms
dpdk-devel.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms
dpdk-tools.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms
--- Additional comment from Maxime Coquelin on 2020-06-24 17:06:21 CEST ---
Backtrace without the debuginfo installed:
Thread 1 "testpmd" received signal SIGBUS, Bus error.
0x00007ffff78f6a27 in i40evf_check_vf_reset_done.isra () from /lib64/librte_pmd_i40e.so.2
(gdb) bt
#0 0x00007ffff78f6a27 in i40evf_check_vf_reset_done.isra ()
from /lib64/librte_pmd_i40e.so.2
#1 0x00007ffff78f809e in i40evf_dev_init () from /lib64/librte_pmd_i40e.so.2
#2 0x00007ffff79292b5 in eth_i40evf_pci_probe () from /lib64/librte_pmd_i40e.so.2
#3 0x00007ffff44195f2 in pci_probe_all_drivers.cold () from /lib64/librte_bus_pci.so.2
#4 0x00007ffff441cfd2 in rte_pci_probe () from /lib64/librte_bus_pci.so.2
#5 0x00007ffff4a4e773 in rte_bus_probe () from /lib64/librte_eal.so.9
#6 0x00007ffff4a38dd4 in rte_eal_init.cold () from /lib64/librte_eal.so.9
#7 0x00005555555a2103 in main ()
--- Additional comment from Maxime Coquelin on 2020-06-24 17:07:55 CEST ---
(gdb) disas
Dump of assembler code for function i40evf_check_vf_reset_done.isra.6:
0x00007ffff78f6a18 <+0>: push %rbp
0x00007ffff78f6a19 <+1>: mov $0x14,%ebp
0x00007ffff78f6a1e <+6>: push %rbx
0x00007ffff78f6a1f <+7>: push %rcx
0x00007ffff78f6a20 <+8>: mov 0x60(%rdi),%rbx
0x00007ffff78f6a24 <+12>: mov (%rbx),%rax
=> 0x00007ffff78f6a27 <+15>: mov 0x8800(%rax),%eax
0x00007ffff78f6a2d <+21>: and $0x3,%eax
0x00007ffff78f6a30 <+24>: dec %eax
0x00007ffff78f6a32 <+26>: cmp $0x1,%eax
0x00007ffff78f6a35 <+29>: jbe 0x7ffff78f6a4e <i40evf_check_vf_reset_done.isra.6+54>
0x00007ffff78f6a37 <+31>: mov 0x277542(%rip),%rax # 0x7ffff7b6df80
0x00007ffff78f6a3e <+38>: mov $0xc350,%edi
0x00007ffff78f6a43 <+43>: callq *(%rax)
0x00007ffff78f6a45 <+45>: dec %ebp
0x00007ffff78f6a47 <+47>: jne 0x7ffff78f6a24 <i40evf_check_vf_reset_done.isra.6+12>
0x00007ffff78f6a49 <+49>: or $0xffffffff,%eax
0x00007ffff78f6a4c <+52>: jmp 0x7ffff78f6a5f <i40evf_check_vf_reset_done.isra.6+71>
0x00007ffff78f6a4e <+54>: andw $0xfffd,0xa58(%rbx)
0x00007ffff78f6a56 <+62>: xor %eax,%eax
0x00007ffff78f6a58 <+64>: movb $0x0,0xa4c(%rbx)
0x00007ffff78f6a5f <+71>: pop %rdx
0x00007ffff78f6a60 <+72>: pop %rbx
0x00007ffff78f6a61 <+73>: pop %rbp
0x00007ffff78f6a62 <+74>: retq
End of assembler dump.
(gdb) info registers
rax 0x4300000000 287762808832
rbx 0x17ffb1440 6442128448
rcx 0x7fffffffe430 140737488348208
rdx 0x0 0
rsi 0x25 37
rdi 0x17ffb3600 6442137088
rbp 0x14 0x14
rsp 0x7fffffffe510 0x7fffffffe510
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x17ffb1440 6442128448
r13 0x7ffff7b6fd44 140737349352772
r14 0x7ffff441e968 140737291348328
r15 0x7ffff4cf6430 140737300620336
rip 0x7ffff78f6a27 0x7ffff78f6a27 <i40evf_check_vf_reset_done.isra+15>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
k0 0x0 0
k1 0x0 0
k2 0x0 0
k3 0x0 0
k4 0x0 0
k5 0x0 0
k6 0x0 0
k7 0x0 0
--- Additional comment from Maxime Coquelin on 2020-06-24 18:58:35 CEST ---
It crashes while trying to access the PCI memory at 0x4300000000.
With GDB we can confirm that this memory area, and also the other one at 0x4300010000, are not accessible.
We also tried adding CAP_SYS_ADMIN:
sh-4.4# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_admin,cap_sys_resource+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_admin,cap_sys_resource
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=
but it still fails.
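For reference, the faulting function in the disassembly above can be read back into C roughly as follows. This is a reconstruction from the assembly only, not the DPDK source: the function name, the macro names and the optional delay callback are made up for illustration, while the constants (0x14 tries, offset 0x8800, mask 0x3, a 0xc350 microsecond delay) come straight from the dump. The 32-bit load at BAR0 + 0x8800 corresponds to the instruction marked with "=>", with rax holding the BAR mapping at 0x4300000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants taken from the disassembly; the names are hypothetical. */
#define SKETCH_RESET_STATUS_OFFSET 0x8800u /* mov 0x8800(%rax),%eax */
#define SKETCH_RESET_STATUS_MASK   0x3u    /* and $0x3,%eax         */
#define SKETCH_RESET_MAX_TRIES     20      /* mov $0x14,%ebp        */
#define SKETCH_RESET_DELAY_US      50000u  /* mov $0xc350,%edi      */

typedef void (*sketch_delay_us_fn)(unsigned int us);

/*
 * Poll the reset status register in the mapped BAR.
 * Returns 0 once the masked status is 1 or 2, or -1 after all tries.
 * delay_us may be NULL, in which case no delay is inserted.
 */
int sketch_check_vf_reset_done(const volatile uint8_t *bar0,
                               sketch_delay_us_fn delay_us)
{
    assert(bar0 != NULL);

    for (int i = 0; i < SKETCH_RESET_MAX_TRIES; i++) {
        /* This MMIO read is where the process received SIGBUS. */
        uint32_t status = *(const volatile uint32_t *)
                              (bar0 + SKETCH_RESET_STATUS_OFFSET);

        status &= SKETCH_RESET_STATUS_MASK;
        /* dec %eax; cmp $0x1,%eax; jbe  <=>  status is 1 or 2 */
        if (status == 1 || status == 2)
            return 0;

        if (delay_us != NULL)
            delay_us(SKETCH_RESET_DELAY_US);
    }
    return -1;
}
```

Because the BAR mapping itself is not accessible, the very first load faults, so the retry loop never gets to run.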
--- Additional comment from Maxime Coquelin on 2020-06-24 19:29:32 CEST ---
We think we have found the reason for the failure.
The kernel has been updated recently to kernel-4.18.0-193.9.1.el8_2,
which contains a fix for CVE-2020-12888.
With this CVE fix, DPDK applications using VFIO will fail.
A fix has been posted upstream 3 days ago, but is not merged yet:
https://patchwork.dpdk.org/patch/71962/
I will prepare a scratch build for you with the DPDK patch backported.
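To illustrate the kind of change involved, here is a minimal sketch. It rests on an assumption that is not stated in this report: that the kernel fix makes vfio-pci block MMIO to the BARs while the Memory Space bit of the PCI command register is clear, and that the DPDK side has to set that bit through the device's config space before touching the BARs. The sketch is not the linked patch. The function name and the config_base parameter are hypothetical; on a real VFIO device fd, config_base would be the offset of the config region as reported by VFIO_DEVICE_GET_REGION_INFO for VFIO_PCI_CONFIG_REGION_INDEX.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() and pwrite() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Standard PCI config space layout; the macro names are local to this sketch. */
#define SKETCH_PCI_COMMAND_OFFSET 0x04   /* 16-bit command register */
#define SKETCH_PCI_COMMAND_MEMORY 0x0002 /* Memory Space enable bit */

/*
 * Set the Memory Space bit in the command register found at
 * config_base + 0x04 within fd. Returns 0 on success, -1 on I/O error.
 * The register is accessed in host byte order, which matches the
 * little-endian config space on x86.
 */
int sketch_enable_memory_space(int fd, off_t config_base)
{
    const off_t cmd_off = config_base + SKETCH_PCI_COMMAND_OFFSET;
    uint16_t cmd = 0;

    assert(fd >= 0);

    if (pread(fd, &cmd, sizeof(cmd), cmd_off) != (ssize_t)sizeof(cmd)) {
        perror("pread command register");
        return -1;
    }

    if (cmd & SKETCH_PCI_COMMAND_MEMORY)
        return 0; /* already enabled, nothing to write */

    cmd |= SKETCH_PCI_COMMAND_MEMORY;
    if (pwrite(fd, &cmd, sizeof(cmd), cmd_off) != (ssize_t)sizeof(cmd)) {
        perror("pwrite command register");
        return -1;
    }
    return 0;
}
```

The function takes a plain file descriptor and base offset so it can be exercised against an ordinary file standing in for config space; it does nothing VFIO-specific by itself.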
--- Additional comment from Maxime Coquelin on 2020-06-24 20:10:56 CEST ---
Scratch build available here for testing:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=29669855
--- Additional comment from Sebastian Scheinkman on 2020-06-24 20:31:15 CEST ---
After installing the scratch build, testpmd is working again.
dpdk-18.11.2-4.el8_1.bz1850163.x86_64
dpdk-devel-18.11.2-4.el8_1.bz1850163.x86_64
dpdk-tools-18.11.2-4.el8_1.bz1850163.x86_64
Complete!
sh-4.4# export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
sh-4.4# echo ${CPU}
8-9,48-49
sh-4.4# echo ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC}
0000:12:02.4
sh-4.4#
sh-4.4# testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --iova-mode=va -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:12:02.4 on NUMA socket 0
EAL: probe driver: 8086:154c net_i40e_vf
EAL: using IOMMU type 1 (Type 1)
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 3A:C9:AA:4B:E8:8C
Checking link statuses...
Done
testpmd>
testpmd>
testpmd> start
mac packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 9 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
mac packet forwarding packets/burst=32
nb forwarding cores=2 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=512 - RX free threshold=32
RX threshold registers: pthresh=8 hthresh=8 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=512 - TX free threshold=32
TX threshold registers: pthresh=32 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=32
testpmd>
testpmd>
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 13 RX-dropped: 0 RX-total: 13
TX-packets: 13 TX-dropped: 0 TX-total: 13
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 13 RX-dropped: 0 RX-total: 13
TX-packets: 13 TX-dropped: 0 TX-total: 13
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
testpmd>
testpmd>
testpmd> quit
Stopping port 0...
Stopping ports...
Done
Shutting down port 0...
Closing ports...
Done
Bye...
sh-4.4#
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2947