+++ This bug was initially created as a clone of Bug #1850163 +++

Description of problem:
unable to run testpmd inside a container with latest dpdk version 18.11

Interface details:

ethtool -i ens1f0
driver: i40e
version: 2.8.20-k
firmware-version: 6.00 0x800036cb 1.1747.0
expansion-rom-version:
bus-info: 0000:12:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

12:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
12:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)

lspci -v -nn -mm -s 0000:12:02.0
Slot: 12:02.0
Class: Ethernet controller [0200]
Vendor: Intel Corporation [8086]
Device: Ethernet Virtual Function 700 Series [154c]
SVendor: Hewlett Packard Enterprise [1590]
SDevice: Device [0000]
Rev: 02
NUMANode: 0

running the testpmd application

strace -ff -o /tmp/testpmd.strace testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 8 on socket 0
EAL: Detected lcore 6 as core 9 on socket 0
EAL: Detected lcore 7 as core 10 on socket 0
EAL: Detected lcore 8 as core 11 on socket 0
EAL: Detected lcore 9 as core 12 on socket 0
EAL: Detected lcore 10 as core 16 on socket 0
EAL: Detected lcore 11 as core 17 on socket 0
EAL: Detected lcore 12 as core 18 on socket 0
EAL: Detected lcore 13 as core 19 on socket 0
EAL: Detected lcore 14 as core 20 on socket 0
EAL: Detected lcore 15 as core 24 on socket 0
EAL: Detected lcore 16 as core 25 on socket 0
EAL: Detected lcore 17 as core 26 on socket 0
EAL: Detected lcore 18 as core 27 on socket 0
EAL: Detected lcore 19 as core 28 on socket 0
EAL: Detected lcore 20 as core 0 on socket 1
EAL: Detected lcore 21 as core 1 on socket 1
EAL: Detected lcore 22 as core 2 on socket 1
EAL: Detected lcore 23 as core 3 on socket 1
EAL: Detected lcore 24 as core 4 on socket 1
EAL: Detected lcore 25 as core 8 on socket 1
EAL: Detected lcore 26 as core 9 on socket 1
EAL: Detected lcore 27 as core 10 on socket 1
EAL: Detected lcore 28 as core 11 on socket 1
EAL: Detected lcore 29 as core 12 on socket 1
EAL: Detected lcore 30 as core 16 on socket 1
EAL: Detected lcore 31 as core 17 on socket 1
EAL: Detected lcore 32 as core 18 on socket 1
EAL: Detected lcore 33 as core 19 on socket 1
EAL: Detected lcore 34 as core 20 on socket 1
EAL: Detected lcore 35 as core 24 on socket 1
EAL: Detected lcore 36 as core 25 on socket 1
EAL: Detected lcore 37 as core 26 on socket 1
EAL: Detected lcore 38 as core 27 on socket 1
EAL: Detected lcore 39 as core 28 on socket 1
EAL: Detected lcore 40 as core 0 on socket 0
EAL: Detected lcore 41 as core 1 on socket 0
EAL: Detected lcore 42 as core 2 on socket 0
EAL: Detected lcore 43 as core 3 on socket 0
EAL: Detected lcore 44 as core 4 on socket 0
EAL: Detected lcore 45 as core 8 on socket 0
EAL: Detected lcore 46 as core 9 on socket 0
EAL: Detected lcore 47 as core 10 on socket 0
EAL: Detected lcore 48 as core 11 on socket 0
EAL: Detected lcore 49 as core 12 on socket 0
EAL: Detected lcore 50 as core 16 on socket 0
EAL: Detected lcore 51 as core 17 on socket 0
EAL: Detected lcore 52 as core 18 on socket 0
EAL: Detected lcore 53 as core 19 on socket 0
EAL: Detected lcore 54 as core 20 on socket 0
EAL: Detected lcore 55 as core 24 on socket 0
EAL: Detected lcore 56 as core 25 on socket 0
EAL: Detected lcore 57 as core 26 on socket 0
EAL: Detected lcore 58 as core 27 on socket 0
EAL: Detected lcore 59 as core 28 on socket 0
EAL: Detected lcore 60 as core 0 on socket 1
EAL: Detected lcore 61 as core 1 on socket 1
EAL: Detected lcore 62 as core 2 on socket 1
EAL: Detected lcore 63 as core 3 on socket 1
EAL: Detected lcore 64 as core 4 on socket 1
EAL: Detected lcore 65 as core 8 on socket 1
EAL: Detected lcore 66 as core 9 on socket 1
EAL: Detected lcore 67 as core 10 on socket 1
EAL: Detected lcore 68 as core 11 on socket 1
EAL: Detected lcore 69 as core 12 on socket 1
EAL: Detected lcore 70 as core 16 on socket 1
EAL: Detected lcore 71 as core 17 on socket 1
EAL: Detected lcore 72 as core 18 on socket 1
EAL: Detected lcore 73 as core 19 on socket 1
EAL: Detected lcore 74 as core 20 on socket 1
EAL: Detected lcore 75 as core 24 on socket 1
EAL: Detected lcore 76 as core 25 on socket 1
EAL: Detected lcore 77 as core 26 on socket 1
EAL: Detected lcore 78 as core 27 on socket 1
EAL: Detected lcore 79 as core 28 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.1
EAL: Mem event callback 'MLX4_MEM_EVENT_CB:(nil)' registered
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.1
EAL: Registered [vmbus] bus.
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.1
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.2
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.2
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: VFIO PCI modules not loaded
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized
EAL: Ask a virtual area of 0x2e000 bytes
EAL: Virtual area found at 0x100000000 (size = 0x2e000)
EAL: Setting up physically contiguous memory...
EAL: Setting maximum number of open files to 1048576
EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824
EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824
EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x10002e000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x140000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x940000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x980000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x1180000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x11c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x19c0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x1a00000000 (size = 0x800000000)
EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2200000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2240000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2a40000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2a80000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3280000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x32c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3ac0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x3b00000000 (size = 0x800000000)
EAL: TSC frequency is ~2095071 KHz
EAL: Master lcore 10 is ready (tid=7fce1d7168c0;cpuset=[10])
EAL: lcore 11 is ready (tid=7fce1489c700;cpuset=[11])
EAL: lcore 50 is ready (tid=7fce1409b700;cpuset=[50])
EAL: lcore 51 is ready (tid=7fce1389a700;cpuset=[51])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX4_MEM_EVENT_CB:(nil)'
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 1024MB
EAL: PCI device 0000:12:02.0 on NUMA socket 0
EAL: probe driver: 8086:154c net_i40e_vf
EAL: using IOMMU type 1 (Type 1)
EAL: Mem event callback 'vfio_mem_event_clb:(nil)' registered
EAL: Installed memory event callback for VFIO
EAL: VFIO reports MSI-X BAR as mappable
EAL: PCI memory mapped at 0x4300000000
EAL: PCI memory mapped at 0x4300010000
i40evf_dev_init(): >>
i40e_set_mac_type(): i40e_set_mac_type
i40e_set_mac_type(): i40e_set_mac_type found mac: 2, returns: 0
Bus error (core dumped)

versions:
yum list installed | grep dpdk
dpdk.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms
dpdk-devel.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms
dpdk-tools.x86_64 18.11.2-3.el8 @rhel-8-for-x86_64-appstream-rpms

--- Additional comment from Maxime Coquelin on 2020-06-24 17:06:21 CEST ---

Backtrace without the debuginfo installed:

Thread 1 "testpmd" received signal SIGBUS, Bus error.
0x00007ffff78f6a27 in i40evf_check_vf_reset_done.isra () from /lib64/librte_pmd_i40e.so.2
(gdb) bt
#0 0x00007ffff78f6a27 in i40evf_check_vf_reset_done.isra () from /lib64/librte_pmd_i40e.so.2
#1 0x00007ffff78f809e in i40evf_dev_init () from /lib64/librte_pmd_i40e.so.2
#2 0x00007ffff79292b5 in eth_i40evf_pci_probe () from /lib64/librte_pmd_i40e.so.2
#3 0x00007ffff44195f2 in pci_probe_all_drivers.cold () from /lib64/librte_bus_pci.so.2
#4 0x00007ffff441cfd2 in rte_pci_probe () from /lib64/librte_bus_pci.so.2
#5 0x00007ffff4a4e773 in rte_bus_probe () from /lib64/librte_eal.so.9
#6 0x00007ffff4a38dd4 in rte_eal_init.cold () from /lib64/librte_eal.so.9
#7 0x00005555555a2103 in main ()

--- Additional comment from Maxime Coquelin on 2020-06-24 17:07:55 CEST ---

(gdb) disas
Dump of assembler code for function i40evf_check_vf_reset_done.isra.6:
   0x00007ffff78f6a18 <+0>: push %rbp
   0x00007ffff78f6a19 <+1>: mov $0x14,%ebp
   0x00007ffff78f6a1e <+6>: push %rbx
   0x00007ffff78f6a1f <+7>: push %rcx
   0x00007ffff78f6a20 <+8>: mov 0x60(%rdi),%rbx
   0x00007ffff78f6a24 <+12>: mov (%rbx),%rax
=> 0x00007ffff78f6a27 <+15>: mov 0x8800(%rax),%eax
   0x00007ffff78f6a2d <+21>: and $0x3,%eax
   0x00007ffff78f6a30 <+24>: dec %eax
   0x00007ffff78f6a32 <+26>: cmp $0x1,%eax
   0x00007ffff78f6a35 <+29>: jbe 0x7ffff78f6a4e <i40evf_check_vf_reset_done.isra.6+54>
   0x00007ffff78f6a37 <+31>: mov 0x277542(%rip),%rax # 0x7ffff7b6df80
   0x00007ffff78f6a3e <+38>: mov $0xc350,%edi
   0x00007ffff78f6a43 <+43>: callq *(%rax)
   0x00007ffff78f6a45 <+45>: dec %ebp
   0x00007ffff78f6a47 <+47>: jne 0x7ffff78f6a24 <i40evf_check_vf_reset_done.isra.6+12>
   0x00007ffff78f6a49 <+49>: or $0xffffffff,%eax
   0x00007ffff78f6a4c <+52>: jmp 0x7ffff78f6a5f <i40evf_check_vf_reset_done.isra.6+71>
   0x00007ffff78f6a4e <+54>: andw $0xfffd,0xa58(%rbx)
   0x00007ffff78f6a56 <+62>: xor %eax,%eax
   0x00007ffff78f6a58 <+64>: movb $0x0,0xa4c(%rbx)
   0x00007ffff78f6a5f <+71>: pop %rdx
   0x00007ffff78f6a60 <+72>: pop %rbx
   0x00007ffff78f6a61 <+73>: pop %rbp
   0x00007ffff78f6a62 <+74>: retq
End of assembler dump.
(gdb) info registers
rax 0x4300000000 287762808832
rbx 0x17ffb1440 6442128448
rcx 0x7fffffffe430 140737488348208
rdx 0x0 0
rsi 0x25 37
rdi 0x17ffb3600 6442137088
rbp 0x14 0x14
rsp 0x7fffffffe510 0x7fffffffe510
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x17ffb1440 6442128448
r13 0x7ffff7b6fd44 140737349352772
r14 0x7ffff441e968 140737291348328
r15 0x7ffff4cf6430 140737300620336
rip 0x7ffff78f6a27 0x7ffff78f6a27 <i40evf_check_vf_reset_done.isra+15>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
k0 0x0 0
k1 0x0 0
k2 0x0 0
k3 0x0 0
k4 0x0 0
k5 0x0 0
k6 0x0 0
k7 0x0 0

--- Additional comment from Maxime Coquelin on 2020-06-24 18:58:35 CEST ---

It crashes while trying to access the PCI memory at 0x4300000000.
With GDB we can confirm that this memory area, and also the other one at 0x4300010000, are not accessible.

We also tried adding CAP_SYS_ADMIN:

sh-4.4# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_admin,cap_sys_resource+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_admin,cap_sys_resource
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=

but it still fails.

--- Additional comment from Maxime Coquelin on 2020-06-24 19:29:32 CEST ---

We think we have found the reason for the failure.
The kernel was recently updated to kernel-4.18.0-193.9.1.el8_2, which contains a fix for CVE-2020-12888.

With this CVE fix, DPDK applications using VFIO will fail.
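
For readers following the gdb output, the arithmetic behind "crashes while trying to access the PCI memory" can be sketched as below. This is only an illustration built from the values in this report. Two things are assumptions and not stated in the thread: that each mapping extends at least up to the next mapped base, and that the value loaded into %edi before the indirect call is a delay in microseconds. The helper names are made up for the example.

```python
# Values copied from the gdb session and EAL log in this report.
RAX = 0x4300000000                            # base pointer held in %rax at the fault
DISPLACEMENT = 0x8800                         # from: mov 0x8800(%rax),%eax
BAR_MAPPINGS = [0x4300000000, 0x4300010000]   # "EAL: PCI memory mapped at ..."

RETRIES = 0x14        # mov $0x14,%ebp (loop counter)
DELAY_US = 0xC350     # mov $0xc350,%edi (ASSUMED to be a microsecond delay)


def faulting_address(base: int, displacement: int) -> int:
    """Address read by the instruction marked '=>' in the disassembly."""
    return base + displacement


def containing_mapping(addr: int, mappings: list[int]):
    """Highest mapping base <= addr, or None.

    ASSUMPTION: the log does not print mapping sizes, so this presumes a
    mapping reaches at least as far as the next mapped base.
    """
    candidates = [m for m in mappings if m <= addr]
    return max(candidates) if candidates else None


def reset_done(reg_value: int) -> bool:
    """Mirror 'and $0x3 / dec / cmp $0x1 / jbe' with 32-bit unsigned wraparound."""
    return (((reg_value & 0x3) - 1) & 0xFFFFFFFF) <= 1


if __name__ == "__main__":
    addr = faulting_address(RAX, DISPLACEMENT)
    print(hex(addr), "falls in mapping", hex(containing_mapping(addr, BAR_MAPPINGS)))
    print("poll budget:", RETRIES, "tries x", DELAY_US, "us")
    print("register values treated as done:", [v for v in range(4) if reset_done(v)])
```

So the very first register read in the polling loop faults, before any retry happens, which is consistent with the whole mapped region being inaccessible and not with a problem specific to one register.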
A fix was posted upstream 3 days ago, but is not merged yet:
https://patchwork.dpdk.org/patch/71962/

I will prepare a scratch build for you with the DPDK patch backported.

--- Additional comment from Maxime Coquelin on 2020-06-24 20:10:56 CEST ---

Scratch build available here for testing:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=29669855

--- Additional comment from Sebastian Scheinkman on 2020-06-24 20:31:15 CEST ---

After installing the scratch build, testpmd is working again.

dpdk-18.11.2-4.el8_1.bz1850163.x86_64
dpdk-devel-18.11.2-4.el8_1.bz1850163.x86_64
dpdk-tools-18.11.2-4.el8_1.bz1850163.x86_64

Complete!
sh-4.4# export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
sh-4.4# echo ${CPU}
8-9,48-49
sh-4.4# echo ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC}
0000:12:02.4
sh-4.4#
sh-4.4# testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --iova-mode=va -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:12:02.4 on NUMA socket 0
EAL: probe driver: 8086:154c net_i40e_vf
EAL: using IOMMU type 1 (Type 1)
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 3A:C9:AA:4B:E8:8C
Checking link statuses...
Done
testpmd>
testpmd>
testpmd> start
mac packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 9 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  mac packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=1
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x0
    RX queue: 0
      RX desc=512 - RX free threshold=32
      RX threshold registers: pthresh=8 hthresh=8 wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=512 - TX free threshold=32
      TX threshold registers: pthresh=32 hthresh=0 wthresh=0
      TX offloads=0x0 - TX RS bit threshold=32
testpmd>
testpmd>
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0 ----------------------
  RX-packets: 13 RX-dropped: 0 RX-total: 13
  TX-packets: 13 TX-dropped: 0 TX-total: 13
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 13 RX-dropped: 0 RX-total: 13
  TX-packets: 13 TX-dropped: 0 TX-total: 13
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
testpmd>
testpmd>
testpmd> quit

Stopping port 0...
Stopping ports...
Done

Shutting down port 0...
Closing ports...
Done

Bye...
sh-4.4#
* Thu Jun 25 2020 Timothy Redaelli <tredaelli> - 2.13.0-39 - bus/pci: fix VF memory access (#1851169) [2b22bcd9ad02d0180ad5c46a2cccf34a3afba600]
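
As a rough illustration of the kind of change the "fix VF memory access" entry points at: my understanding, which this thread does not spell out, is that after the CVE-2020-12888 kernel fix a VFIO user has to set the Memory Space Enable bit in the device's PCI command register before touching mapped BARs. The sketch below only models that read-modify-write on a file descriptor. The function name, the `config_base` parameter, and the use of a temporary file as a stand-in for config space are all hypothetical; the actual patch is C code in DPDK's PCI bus driver and is not reproduced here.

```python
import os
import struct
import tempfile

PCI_COMMAND = 0x04          # offset of the 16-bit command register (PCI spec)
PCI_COMMAND_MEMORY = 0x2    # Memory Space Enable bit (PCI spec)


def enable_bus_memory(fd: int, config_base: int = 0) -> int:
    """Set Memory Space Enable if it is clear; return the resulting value.

    `fd` is any descriptor supporting pread/pwrite. `config_base` is a
    HYPOTHETICAL parameter for where config space starts within that fd.
    """
    offset = config_base + PCI_COMMAND
    (cmd,) = struct.unpack("<H", os.pread(fd, 2, offset))
    if not cmd & PCI_COMMAND_MEMORY:
        cmd |= PCI_COMMAND_MEMORY
        os.pwrite(fd, struct.pack("<H", cmd), offset)
    return cmd


if __name__ == "__main__":
    # Stand-in for config space: 64 zero bytes in a temp file, not a real device.
    with tempfile.TemporaryFile() as fake_config:
        fake_config.write(bytes(64))
        fake_config.flush()
        print(hex(enable_bus_memory(fake_config.fileno())))
```

The read-modify-write keeps any other command bits that are already set, and the write is skipped when the bit is already enabled.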
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2948