Bug 1789352 - Unable to run a dpdk workload without privileged=true
Summary: Unable to run a dpdk workload without privileged=true
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: dpdk
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Marchand
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Duplicates: 1785933 (view as bug list)
Depends On: 1785933
Blocks: 1771572 1791410 1791411
 
Reported: 2020-01-09 12:28 UTC by Amnon Ilan
Modified: 2020-12-15 11:04 UTC (History)
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1785933
Clones: 1791410 1791411 (view as bug list)
Environment:
Last Closed: 2020-12-15 11:04:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Comment 5 David Marchand 2020-01-10 15:28:00 UTC
To reproduce this issue, you need a system with a Mellanox NIC.

Here are simplified steps that do not require a container setup.
Please note that changing the permissions below is invasive, so you might want to do this in a virtual machine.

Start from a fresh RHEL 8.1 system with the dpdk 19.11 package installed:
[root@wsfd-netdev11 ~]# echo options mlx4_core log_num_mgm_entry_size=-1 >> /etc/modprobe.d/mlx4.conf
[root@wsfd-netdev11 ~]# modprobe -r mlx4_ib mlx4_en mlx4_core
[root@wsfd-netdev11 ~]# modprobe mlx4_ib

[root@wsfd-netdev11 ~]# adduser testuser
[root@wsfd-netdev11 ~]# chown testuser /dev/hugepages

[root@wsfd-netdev11 ~]# chown testuser /dev/hugepages/rtemap* # only needed if testpmd or another dpdk application ran before

Then:

[root@wsfd-netdev11 ~]# yum remove -y dpdk; yum install -y 'http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/dpdk/19.11/1.el8/x86_64/dpdk-19.11-1.el8.x86_64.rpm'
...

# IMPORTANT: every time you install a dpdk package, you must set these capabilities again, because the testpmd binary shipped in the rpm does not carry them
[root@wsfd-netdev11 ~]# setcap cap_net_admin,cap_net_raw,cap_ipc_lock+ep $(which testpmd)
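Since the capabilities are wiped every time the rpm replaces the binary, the install and setcap steps can be bundled. This is a sketch, not part of the reproducer: the helper name and the getcap verification step are my additions (getcap comes from libcap, same package family as setcap).

```shell
# Hedged convenience wrapper: reinstalling the rpm replaces /usr/bin/testpmd,
# which drops its file capabilities, so re-apply them in the same step.
# Takes the rpm URL or path as its single argument. Must run as root.
reinstall_dpdk() {
    yum remove -y dpdk
    yum install -y "$1"
    setcap cap_net_admin,cap_net_raw,cap_ipc_lock+ep "$(which testpmd)"
    # Verify: getcap should print the three capabilities with "ep".
    getcap "$(which testpmd)"
}
```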

[root@wsfd-netdev11 ~]# sudo -u testuser testpmd -w 0000:02:00.0 -v -- -i
EAL: Detected 32 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
net_mlx5: cannot load glue library: /lib64/libmlx5.so.1: version `MLX5_1.10' not found (required by /usr/lib64/dpdk-pmds-glue/librte_pmd_mlx5_glue.so.19.08.0)
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
net_mlx4: Verbs external allocator is not supported
net_mlx4: Verbs external allocator is not supported
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=395456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=395456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
net_mlx4: 0x7f1effa7e100: failed to obtain UAR mmap offset
Port 0: E4:1D:2D:BF:F4:C0
Configuring Port 1 (socket 0)
net_mlx4: 0x7f1effa821c0: failed to obtain UAR mmap offset
Port 1: E4:1D:2D:BF:F4:C1
Checking link statuses...
Done
testpmd> 


For the record, here is what I see with the 18.11 package:
[root@wsfd-netdev11 ~]# yum remove -y dpdk; yum install -y 'http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/dpdk/18.11.2/4.el8/x86_64/dpdk-18.11.2-4.el8.x86_64.rpm'
...

[root@wsfd-netdev11 ~]# setcap cap_net_admin,cap_net_raw,cap_ipc_lock+ep $(which testpmd)

[root@wsfd-netdev11 ~]# sudo -u testuser testpmd -w 0000:02:00.0 -v -- -i
EAL: Detected 32 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 18.11.2'
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():
Cannot init EAL
5: [testpmd(_start+0x2e) [0x557cb2ba1b5e]]
4: [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7f29ce7e2873]]
3: [testpmd(+0x4ddf1) [0x557cb2ba0df1]]
2: [/lib64/librte_eal.so.9(__rte_panic+0xc1) [0x7f29cf9e81e6]]
1: [/lib64/librte_eal.so.9(rte_dump_stack+0x32) [0x7f29cf9f4e32]]
Aborted

Comment 6 David Marchand 2020-01-10 15:58:40 UTC
Note also that I did not have an 8.2 system at the time, but you must test on 8.2.

Comment 14 zenghui.shi 2020-02-05 14:13:36 UTC
*** Bug 1785933 has been marked as a duplicate of this bug. ***

Comment 15 Jean-Tsung Hsiao 2020-02-12 20:23:57 UTC
Hi David,
I got a segfault!
Please advise.
Thanks!
Jean


Installed:
  dpdk-19.11-1.el8.x86_64                                                                                                               

Complete!
[root@netqe7 ~]# setcap cap_net_admin,cap_net_raw,cap_ipc_lock+ep $(which testpmd)
[root@netqe7 ~]# sudo -u testuser testpmd -w 0000:03:00.0 -v -- -i
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=267456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Port 0 is now not stopped
Configuring Port 1 (socket 0)
Segmentation fault
[root@netqe7 ~]# rpm -q dpdk
dpdk-19.11-1.el8.x86_64
[root@netqe7 ~]# uname -r
4.18.0-167.el8.x86_64
[root@netqe7 ~]#

[root@netqe7 ~]# cat /etc/modprobe.d/mlx4.conf
# This file is intended for users to select the various module options
# they need for the mlx4 driver.  On upgrade of the rdma package,
# any user made changes to this file are preserved.  Any changes made
# to the libmlx4.conf file in this directory are overwritten on
# pacakge upgrade.
#
# Some sample options and what they would do
# Enable debugging output, device managed flow control, and disable SRIOV
#options mlx4_core debug_level=1 log_num_mgm_entry_size=-1 probe_vf=0 num_vfs=0
#
# Enable debugging output and create SRIOV devices, but don't attach any of
# the child devices to the host, only the parent device
#options mlx4_core debug_level=1 probe_vf=0 num_vfs=7
#
# Enable debugging output, SRIOV, and attach one of the SRIOV child devices
# in addition to the parent device to the host
#options mlx4_core debug_level=1 probe_vf=1 num_vfs=7
#
# Enable per priority flow control for send and receive, setting both priority
# 1 and 2 as no drop priorities
#options mlx4_en pfctx=3 pfcrx=3
options mlx4_core debug_level=1 log_num_mgm_entry_size=-1
options mlx4_core log_num_mgm_entry_size=-1
[root@netqe7 ~]#

Comment 16 David Marchand 2020-02-13 07:52:16 UTC
Can you retrieve the coredump or give me access to this system?
Thanks.

Comment 17 Jean-Tsung Hsiao 2020-02-13 13:57:15 UTC
(In reply to David Marchand from comment #16)
> Can you retrieve the coredump or give me access to this system?
> Thanks.

Sorry! I gave you the wrong test bed!

<jhsiao> HI
<jhsiao> Sorry! Wrong testbed
<jhsiao> Should be
<jhsiao> netqe7.knqe.lab.eng.bos.redhat.com
<jhsiao> 10.19.15.0/24 dev eno3 proto kernel scope link src 10.19.15.17 metric 100 
<jhsiao> same passwd

Comment 18 David Marchand 2020-02-13 14:43:04 UTC
This is a different issue, most likely an mlx4 setup issue.


Starting as root triggers the segfault on this system:

[root@netqe7 ~]# testpmd -w 0000:03:00.0 -v -- -i
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=267456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Port 0 is now not stopped
Port 1 is now not stopped
Please stop the ports first
Done
Segmentation fault (core dumped)

[root@netqe7 ~]# gdb $(which testpmd) # -w 0000:03:00.0 -v -- -i
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-8.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/testpmd...Reading symbols from /usr/lib/debug/usr/bin/testpmd-19.11-1.el8.x86_64.debug...done.
done.
(gdb) run -w 0000:03:00.0 -v --log-level *:debug -- -i
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/bin/testpmd -w 0000:03:00.0 -v --log-level *:debug -- -i
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
EAL: Detected lcore 0 as core 2 on socket 0
EAL: Detected lcore 1 as core 0 on socket 1
EAL: Detected lcore 2 as core 3 on socket 0
EAL: Detected lcore 3 as core 1 on socket 1
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 4 on socket 1
EAL: Detected lcore 6 as core 5 on socket 0
EAL: Detected lcore 7 as core 5 on socket 1
EAL: Detected lcore 8 as core 2 on socket 0
EAL: Detected lcore 9 as core 0 on socket 1
EAL: Detected lcore 10 as core 3 on socket 0
EAL: Detected lcore 11 as core 1 on socket 1
EAL: Detected lcore 12 as core 4 on socket 0
EAL: Detected lcore 13 as core 4 on socket 1
EAL: Detected lcore 14 as core 5 on socket 0
EAL: Detected lcore 15 as core 5 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.20.0
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.20.0
warning: Loadable section ".note.gnu.property" outside of ELF segments
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.20.0
EAL: Registered [vmbus] bus.
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.20.0
EAL: Ask a virtual area of 0x5000 bytes
EAL: Virtual area found at 0x100000000 (size = 0x5000)
[New Thread 0x7ffff01c0700 (LWP 15655)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[New Thread 0x7fffef9bf700 (LWP 15656)]
EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory)
EAL: VFIO PCI modules not loaded
EAL: Bus pci wants IOVA as 'DC'
EAL: Buses did not request a specific IOVA mode.
EAL: IOMMU is available, selecting IOVA as VA mode.
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
EAL: VFIO modules not loaded, skipping VFIO support...
EAL: Ask a virtual area of 0x2e000 bytes
EAL: Virtual area found at 0x100005000 (size = 0x2e000)
EAL: Setting up physically contiguous memory...
EAL: Setting maximum number of open files to 4096
EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824
EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824
EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x100033000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x140000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x940000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x980000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x1180000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x11c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x19c0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 0
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x1a00000000 (size = 0x800000000)
EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2200000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2240000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x2a40000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x2a80000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3280000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x32c0000000 (size = 0x800000000)
EAL: Ask a virtual area of 0x1000 bytes
EAL: Virtual area found at 0x3ac0000000 (size = 0x1000)
EAL: Memseg list allocated: 0x100000kB at socket 1
EAL: Ask a virtual area of 0x800000000 bytes
EAL: Virtual area found at 0x3b00000000 (size = 0x800000000)
EAL: TSC frequency is ~3500000 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fd7900;cpuset=[0])
[New Thread 0x7fffef1be700 (LWP 15657)]
EAL: lcore 1 is ready (tid=7fffef1be700;cpuset=[1])
[New Thread 0x7fffee9bd700 (LWP 15658)]
EAL: lcore 2 is ready (tid=7fffee9bd700;cpuset=[2])
[New Thread 0x7fffee1bc700 (LWP 15659)]
EAL: lcore 3 is ready (tid=7fffee1bc700;cpuset=[3])
[New Thread 0x7fffed9bb700 (LWP 15660)]
EAL: lcore 4 is ready (tid=7fffed9bb700;cpuset=[4])
[New Thread 0x7fffed1ba700 (LWP 15661)]
EAL: lcore 5 is ready (tid=7fffed1ba700;cpuset=[5])
[New Thread 0x7fffec9b9700 (LWP 15662)]
EAL: lcore 6 is ready (tid=7fffec9b9700;cpuset=[6])
[New Thread 0x7fffd7fff700 (LWP 15663)]
[New Thread 0x7fffd77fe700 (LWP 15664)]
EAL: lcore 7 is ready (tid=7fffd7fff700;cpuset=[7])
[New Thread 0x7fffd6ffd700 (LWP 15665)]
[New Thread 0x7fffd67fc700 (LWP 15666)]
EAL: lcore 9 is ready (tid=7fffd6ffd700;cpuset=[9])
EAL: lcore 10 is ready (tid=7fffd67fc700;cpuset=[10])
[New Thread 0x7fffd5ffb700 (LWP 15667)]
EAL: lcore 11 is ready (tid=7fffd5ffb700;cpuset=[11])
EAL: lcore 8 is ready (tid=7fffd77fe700;cpuset=[8])
[New Thread 0x7fffd57fa700 (LWP 15668)]
[New Thread 0x7fffd4ff9700 (LWP 15669)]
EAL: lcore 13 is ready (tid=7fffd4ff9700;cpuset=[13])
[New Thread 0x7fffbffff700 (LWP 15670)]
[New Thread 0x7fffbf7fe700 (LWP 15671)]
EAL: lcore 15 is ready (tid=7fffbf7fe700;cpuset=[15])
EAL: lcore 14 is ready (tid=7fffbffff700;cpuset=[14])
EAL: lcore 12 is ready (tid=7fffd57fa700;cpuset=[12])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 1024MB
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
EAL: Mem event callback 'MLX4_MEM_EVENT_CB:(nil)' registered
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=267456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX4_MEM_EVENT_CB:(nil)'
EAL: request: mp_malloc_sync
EAL: Heap on socket 1 was expanded by 1024MB
Port 0 is now not stopped
Port 1 is now not stopped
Please stop the ports first
Done

Thread 1 "testpmd" received signal SIGSEGV, Segmentation fault.
0x00007ffff27513cb in mlx4_flow_internal (priv=0x17ffe3180, error=0x7fffffffdd40) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/drivers/net/mlx4/mlx4_flow.c:1502
1502		while (flow && flow->internal) {
(gdb) l
1497			flow->select = 1;
1498		}
1499	error:
1500		/* Clear selection and clean up stale internal flow rules. */
1501		flow = LIST_FIRST(&priv->flows);
1502		while (flow && flow->internal) {
1503			struct rte_flow *next = LIST_NEXT(flow, next);
1504	
1505			if (!flow->select)
1506				claim_zero(mlx4_flow_destroy(ETH_DEV(priv), flow,
(gdb) bt full
#0  0x00007ffff27513cb in mlx4_flow_internal (priv=0x17ffe3180, error=0x7fffffffdd40) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/drivers/net/mlx4/mlx4_flow.c:1502
        attr = {group = 0, priority = 4095, ingress = 1, egress = 0, transfer = 0, reserved = 0}
        eth_spec = {dst = {addr_bytes = "\000\000i\t\201", <incomplete sequence \363>}, src = {addr_bytes = "\377\177\000\000\300\006"}, type = 62389}
        eth_mask = {dst = {addr_bytes = "\377\377\377\377\377\377"}, src = {addr_bytes = "\000\000\000\000\000"}, type = 0}
        eth_allmulti = {dst = {addr_bytes = "\001\000\000\000\000"}, src = {addr_bytes = "\000\000\000\000\000"}, type = 0}
        vlan_spec = {tci = 32767, inner_type = 0}
        vlan_mask = {tci = 65295, inner_type = 0}
        pattern = {{type = 4294967295, spec = 0x0, last = 0x0, mask = 0x0}, {type = RTE_FLOW_ITEM_TYPE_ETH, spec = 0x7fffffffdbf6, last = 0x0, mask = 0x7fffffffdc04}, {type = RTE_FLOW_ITEM_TYPE_END, 
            spec = 0x0, last = 0x0, mask = 0x0}, {type = RTE_FLOW_ITEM_TYPE_END, spec = 0x0, last = 0x0, mask = 0x0}}
        queues = <optimized out>
        queue = 0x7fffffffdb30
        action_rss = {func = RTE_ETH_HASH_FUNCTION_DEFAULT, level = 0, types = 0, key_len = 40, queue_num = 0, 
          key = 0x7ffff29630a0 <mlx4_rss_hash_key_default> ",Ɓ\321[\333\364\367\374\242\203\031\333\032>\224k\236\070\331,\234\003ѭ\231D\247\331V=Y\006<%\363\374\037\334*", queue = 0x7fffffffdb30}
        actions = {{type = RTE_FLOW_ACTION_TYPE_RSS, conf = 0x7fffffffdbc0}, {type = RTE_FLOW_ACTION_TYPE_END, conf = 0x0}}
        rule_mac = 0x7fffffffdbf6
        rule_vlan = 0x0
        vlan = <optimized out>
        flow = 0x7f5be01a9500
        i = <optimized out>
        err = 0
#1  0x00007ffff27516a5 in mlx4_flow_sync (priv=0x17ffe3180, error=error@entry=0x7fffffffdd40) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/drivers/net/mlx4/mlx4_flow.c:1549
        flow = <optimized out>
        ret = <optimized out>
#2  0x00007ffff274e0e3 in mlx4_rxmode_toggle (toggle=<optimized out>, dev=<optimized out>) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/drivers/net/mlx4/mlx4_ethdev.c:371
        priv = <optimized out>
        mode = 0x7ffff275d724 "promiscuous"
        error = {type = 1434613920, cause = 0x0, message = 0x1 <error: Cannot access memory at address 0x1>}
        ret = <optimized out>
#3  0x00007ffff54a749b in rte_eth_promiscuous_enable (port_id=port_id@entry=0) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/lib/librte_ethdev/rte_ethdev.c:2247
        dev = 0x7ffff56cb100 <rte_eth_devices>
        diag = 0
#4  0x00005555555a74fa in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/dpdk-19.11-1.el8.x86_64/app/test-pmd/testpmd.c:3492
        diag = <optimized out>
        port_id = 0
        count = <optimized out>
        ret = <optimized out>
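The gdb listing above shows the cleanup loop at mlx4_flow.c:1502 walking a BSD sys/queue LIST of flow rules and destroying the stale internal ones, capturing the successor with LIST_NEXT() before each destroy. As a minimal self-contained sketch of that traversal pattern (hypothetical names, not the DPDK sources; the crash would come from a bad `flow` pointer in this walk):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Sketch of the mlx4_flow_internal() cleanup pattern: walk the list,
 * free unselected internal entries, clear the select flag on the rest. */
struct flow {
    LIST_ENTRY(flow) next;
    int internal;
    int select;
};

LIST_HEAD(flow_list, flow);

static int sweep_stale(struct flow_list *flows)
{
    int destroyed = 0;
    struct flow *flow = LIST_FIRST(flows);

    while (flow && flow->internal) {
        /* Capture the successor before freeing, exactly as the DPDK loop
         * captures LIST_NEXT() before calling mlx4_flow_destroy(). */
        struct flow *nxt = LIST_NEXT(flow, next);

        if (!flow->select) {
            LIST_REMOVE(flow, next);
            free(flow);
            destroyed++;
        } else {
            flow->select = 0;
        }
        flow = nxt;
    }
    return destroyed;
}
```

The pattern itself is sound; the segfault at `while (flow && flow->internal)` therefore points at a corrupted or stale list head rather than a bug in the loop, consistent with the setup/firmware issue diagnosed below.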



I suppose this has to do with hardware/kernel module configuration.

I rebooted the system and the problem is gone.
testpmd starts fine as root and non root.


[root@netqe7 ~]# chown testuser /dev/hugepages/rtemap*
[root@netqe7 ~]# chown testuser /dev/hugepages
[root@netqe7 ~]# sudo -u testuser testpmd -w 0000:03:00.0 -v -- -i
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=267456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
net_mlx4: 0x7fcfeb50b100: cannot attach flow rules (code 95, "Operation not supported"), flow error type 2, cause 0x1598cea40, message: flow rule rejected by device
Fail to start port 0
Configuring Port 1 (socket 0)
net_mlx4: 0x7fcfeb50f1c0: cannot attach flow rules (code 95, "Operation not supported"), flow error type 2, cause 0x1598cc080, message: flow rule rejected by device
Fail to start port 1
Please stop the ports first
Done


I downgraded to dpdk 18.11.2 and the same error happens.

Can you double check this setup with 18.11 on RHEL 8.2, please?

Comment 19 Jean-Tsung Hsiao 2020-02-13 15:16:53 UTC
Hi David,

By 18.11, do you mean 18.11.5, or another 18.11.X where X is neither 2 nor 5?
Thanks!
Jean

Comment 20 Jean-Tsung Hsiao 2020-02-13 17:58:28 UTC
Hi David,

I don't know what's wrong with netqe7, so I moved to netqe8, a mirror of netqe7.

It's now working: dpdk-19.11-1 under RHEL-8.2.0.

Please check the log below.
Thanks!
Jean

[root@netqe8 ~]# sudo -u testuser testpmd -w 0000:03:00.0 -v -- -i
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: RTE Version: 'DPDK 19.11.0'
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1007 net_mlx4
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=267456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=267456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: E4:1D:2D:79:B3:11
Configuring Port 1 (socket 0)
Port 1: E4:1D:2D:79:B3:12
Checking link statuses...
Done
testpmd> quit

Stopping port 0...
Stopping ports...
Done

Stopping port 1...
Stopping ports...
Done

Shutting down port 0...
Closing ports...
Done

Shutting down port 1...
Closing ports...
Done

Bye...
[root@netqe8 ~]# uname -r
4.18.0-167.el8.x86_64
[root@netqe8 ~]# rpm -q dpdk
dpdk-19.11-1.el8.x86_64
[root@netqe8 ~]# 

So, I believe we have verified the fix with dpdk-19.11-1.

Comment 21 David Marchand 2020-02-13 19:46:08 UTC
Ok for me, thanks.

Comment 22 Jean-Tsung Hsiao 2020-02-13 20:14:26 UTC
Hi David,

I just found that netqe7 and netqe8 have different mlx4 firmware.
Please check below.
Not sure if that explains the different behaviour.
Thanks!
Jean

[root@netqe7 ~]# ethtool -i enp3s0
driver: mlx4_en
version: 4.0-0
firmware-version: 2.33.5100
expansion-rom-version: 
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@netqe7 ~]#

[root@netqe8 ~]# ethtool -i enp3s0
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5000
expansion-rom-version: 
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@netqe8 ~]#

Comment 23 Jean-Tsung Hsiao 2020-02-13 21:26:12 UTC
After configuring /etc/rdma/mlx4.conf with "0000:03:00.0 eth eth", the mlx4_en ports on netqe7 are now working.
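For reference, a sketch of the resulting file (the comment lines are mine; the port-type line is the one quoted above, forcing both ports of the adapter at 0000:03:00.0, from the ethtool output earlier, into Ethernet mode):

```
# /etc/rdma/mlx4.conf -- per-device port protocol selection for ConnectX-3
# Format: <pci-device> <port1-mode> <port2-mode>
0000:03:00.0 eth eth
```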

Comment 24 David Marchand 2020-02-14 09:37:28 UTC
I knew about this kind of configuration, but did not know we had to do this on RHEL.
Thanks for the tip.

