Bug 1785933
| Field | Value |
|---|---|
| Summary | Unable to run a dpdk workload without privileged=true |
| Product | OpenShift Container Platform |
| Reporter | Sebastian Scheinkman <sscheink> |
| Component | Networking |
| Networking sub component | SR-IOV |
| Assignee | zenghui.shi <zshi> |
| QA Contact | zhaozhanqi <zzhao> |
| Status | CLOSED DUPLICATE |
| Docs Contact | |
| Severity | unspecified |
| Priority | unspecified |
| CC | ailan, augol, bbennett, dmarchan, eparis, nhorman, zshi |
| Version | 4.3.0 |
| Target Milestone | --- |
| Target Release | 4.4.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1789352 (view as bug list) |
| Environment | |
| Last Closed | 2020-02-05 14:13:36 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1771572, 1789352, 1791410, 1791411 |
I presume you are using openvswitch 2.9.0 here (which builds with dpdk 17.11). If that's the case, it looks like upstream commit c2361bab70c56f64e50f07946b1b20bf688d782a may need to be backported, though I'm not sure it can be without additional backports. It may be that openvswitch needs to update to a later copy of dpdk (19.11 if possible).

Hi Neil,

We are testing a Mellanox VF in bifurcated driver mode; the VF is attached directly to the container without going through openvswitch.

To add more details on the test setup:

1) This uses a Mellanox VF in bifurcated driver mode inside the container. VFs are provisioned on the host and attached directly to the pod container via the SR-IOV CNI (which moves the VF interface into the container namespace).
2) DPDK packages are installed inside the container image. We have tried downstream dpdk packages with versions 18.11.3 and 19.11; both tests showed the same failure.
3) The Mellanox libibverbs package is installed inside the container image.
4) The vfio-pci module is loaded on the host, and intel_iommu=on iommu=pt are enabled on the kernel cmdline.
5) The container is created with the 'IPC_LOCK' capability, NOT privileged=true.

What works: running testpmd inside the container with virtual address (VA) IOVA mode:

```
# testpmd -l 22,24,26 -w 0000:19:01.0 --iova-mode=va -- -i
```

What fails:

```
# testpmd -l 22,24,26 -w 0000:19:01.0 (--iova-mode=pa) -- -i
```

Interesting findings: in a replicated environment, if an Intel VF is bound to the vfio-pci driver with the commands below, then testpmd in the above container runs successfully with the default iova-mode=pa.

```
sh-4.4# echo 0000:87:0a.0 > /sys/bus/pci/drivers/iavf/unbind
sh-4.4# echo vfio-pci > /sys/bus/pci/devices/0000:87:0a.0/driver_override
sh-4.4# echo 0000:87:0a.0 > /sys/bus/pci/drivers/vfio-pci/bind   <=== I have a Mellanox pod running. After executing this step, re-running the testpmd cmd worked.
```

Note: 0000:87:0a.0 is an Intel VF on the same host as the Mellanox card under test.

19.11 should not have had the same failure; in fact it could not have, given that the first error message below:

```
EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
```

doesn't exist in that version. If you are using 19.11, please post the error log from that version of the library.

Dpdk pkg version:

```
# rpm -qa | grep dpdk
dpdk-tools-19.11-1.el8.x86_64
dpdk-19.11-1.el8.x86_64
```

Testpmd log with iova-mode=pa:

```
# testpmd -l 54,55,56 -w 0000:af:01.0 --iova-mode=pa -- -i
EAL: Detected 72 lcore(s)
EAL: Detected 2 NUMA nodes
net_mlx5: cannot load glue library: /lib64/libmlx5.so.1: version `MLX5_1.10' not found (required by /usr/lib64/dpdk-pmds-glue/librte_pmd_mlx5_glue.so.19.08.0)
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Error - exiting with code: 1
  Cause: Cannot init EAL: Invalid argument
```

Testpmd log without specifying iova-mode:

```
# testpmd -l 54,55,56 -w 0000:af:01.0 -- -i
EAL: Detected 72 lcore(s)
EAL: Detected 2 NUMA nodes
net_mlx5: cannot load glue library: /lib64/libmlx5.so.1: version `MLX5_1.10' not found (required by /usr/lib64/dpdk-pmds-glue/librte_pmd_mlx5_glue.so.19.08.0)
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: cannot open VFIO container, error 2 (No such file or directory)
EAL: VFIO support could not be initialized
testpmd: No probed ethernet devices
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=163456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Done
testpmd> quit
Bye...
```

That's a very different error, one that requires the independent installation of the mlx5 rdma libraries. Please install the libibverbs package and attempt to run the command again.

(In reply to Sebastian Scheinkman from comment #0)
> testpmd -l <cpu-on-the-same-numa> -w <vf-pci-address> -- -i
>
> Actual results:
> testpmd -l 22,24,26 -w 0000:19:01.2 -- -i
> EAL: Detected 80 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Probing VFIO support...
> EAL: Cannot obtain physical addresses: No such file or directory. Only vfio
> will function.
> error allocating rte services array
> EAL: FATAL: rte_service_init() failed
> EAL: rte_service_init() failed
> PANIC in main():
> Cannot init EAL
> 5: [testpmd() [0x42e657]]
> 4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f044e1363d5]]
> 3: [testpmd(main+0xbd7) [0x42e5b7]]
> 2: [/lib64/librte_eal.so.9(__rte_panic+0xbd) [0x7f044f3396bd]]
> 1: [/lib64/librte_eal.so.9(rte_dump_stack+0x2d) [0x7f044f34503d]]
> Aborted (core dumped)

I would suspect a permission issue on the hugepages. Could you try with debug logs?

```
testpmd -l 22,24,26 -w 0000:19:01.2 --log-level=*:debug -- -i
```

I expect something like:

```
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: get_seg_fd(): open failed: Permission denied
EAL: Couldn't get fd on hugepage file
EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():
Cannot init EAL
```

(In reply to Neil Horman from comment #1)
> I presume you are using openvswitch 2.9.0 here (which builds with dpdk
> 17.11). If that's the case, it looks like upstream commit
> c2361bab70c56f64e50f07946b1b20bf688d782a may need to be backported, though
> I'm not sure it can be without additional backports. It may be that
> openvswitch needs to update to a later copy of dpdk (19.11 if possible)

17.11 should not be affected. 18.11 would be (I did the backports in ovs 2.11, https://bugzilla.redhat.com/show_bug.cgi?id=1711739), but first, let's figure out what is wrong with the rte_service_init failure.

Ok, I thought again about the trace, and the permission error should have shown up. I reproduced Sebastian's issue. The debug logs should look like:

```
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: alloc_seg(): can't get IOVA addr   <=========
EAL: Ask a virtual area of 0x40000000 bytes
EAL: Virtual area found at 0x140000000 (size = 0x40000000)
EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
```

It is indeed the IOVA backports that Neil pointed at that are missing in the dpdk 18.11 downstream package.
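[Editor's note] A minimal triage sketch for distinguishing the two failure signatures discussed in the comments above (hugepage permission denied vs. physical addresses unavailable in an unprivileged container). The core list and VF PCI address are the ones from this report, the log path and the `echo quit` trick are assumptions, and the grep patterns are taken verbatim from the logs quoted above.

```bash
#!/bin/bash
# Hypothetical triage helper; adjust the cores and VF PCI address for your pod.
set -u
CORES=22,24,26
PCI=0000:19:01.2
LOG=/tmp/testpmd-debug.log

# 'echo quit' feeds the interactive prompt so a successful run terminates;
# on the failures discussed here, testpmd exits before reaching the prompt.
echo quit | testpmd -l "$CORES" -w "$PCI" --log-level='*:debug' -- -i >"$LOG" 2>&1 || true

# Classify the failure using the two signatures quoted in the comments above.
if grep -q "get_seg_fd(): open failed: Permission denied" "$LOG"; then
    echo "Hugepage permission problem: check the hugepage mount and pod securityContext."
elif grep -q "can't get IOVA addr" "$LOG"; then
    echo "Physical addresses unavailable in the unprivileged container:"
    echo "retry with --iova-mode=va, or use a dpdk build carrying the IOVA backports."
else
    echo "No known signature matched; full debug log is in $LOG."
fi
```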
Prepared a scratch build of the 18.11 downstream package for rhel8:
http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/dpdk/18.11.2/4.bz1785933.el8/x86_64/

Just to clarify on the dpdk packages:
- dpdk 17.11 should be fine,
- dpdk 18.11 will have the reported issue, which can be worked around by setting the --iova-mode=va option, and requires the backports,
- dpdk 19.11 should be fine.

For OVS packages:
- ovs 2.9 (dpdk 17.11) should be fine,
- ovs 2.11 and ovs 2.12 (dpdk 18.11) should be fine, since the fixes have been backported,
- ovs master, which will become 2.13 (dpdk 19.11), should be fine.

(In reply to David Marchand from comment #8)
> Prepared a scratch build of the 18.11 downstream package for rhel8:
> http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/dpdk/18.11.2/4.bz1785933.el8/x86_64/

Sebastian, can you have a try with this scratch build?
Thanks.

(In reply to David Marchand from comment #11)
> (In reply to David Marchand from comment #8)
> > Prepared a scratch build of the 18.11 downstream package for rhel8:
> > http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/dpdk/18.11.2/4.bz1785933.el8/x86_64/
>
> Sebastian, can you have a try with this scratch build?
> Thanks.

I ran a test with this scratch build; it worked without specifying the iova mode. From the log messages, it detects the IOVA mode automatically.

```
[root@testpod1 /]# testpmd -l 18,19 -w 0000:af:00.6 -- -i
EAL: Detected 72 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:af:00.6 on NUMA socket 1
EAL:   probe driver: 15b3:1016 net_mlx5
net_mlx5: flow rules relying on switch offloads will not be supported: netlink: failed to remove ingress qdisc: Operation not permitted
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=155456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 1)
Port 0: CA:FE:C0:FF:EE:01
Checking link statuses...
Done
testpmd> show port info all

********************* Infos for port 0 *********************
MAC address: CA:FE:C0:FF:EE:01
Device name: 0000:af:00.6
Driver name: net_mlx5
Devargs:
Connect to socket: 1
memory allocation on the socket: 1
Link status: up
Link speed: 25000 Mbps
Link duplex: full-duplex
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 128
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip off filter off qinq(extend) off
Hash key size in bytes: 40
Redirection table size: 1
Supported RSS offload flow types:
  ipv4  ipv4-frag  ipv4-tcp  ipv4-udp  ipv4-other
  ipv6  ipv6-frag  ipv6-tcp  ipv6-udp  ipv6-other
  user defined 15  user defined 16  user defined 17
Minimum size of RX buffer: 32
Maximum configurable length of RX packet: 65536
Current number of RX queues: 1
Max possible RX queues: 65535
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 65535
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1
Switch name: 0000:af:00.6
Switch domain Id: 1
Switch Port Id: 65535
testpmd> quit

Stopping port 0...
Stopping ports...
Done

Shutting down port 0...
Closing ports...
Done

Bye...
```
```
[root@testpod1 /]# rpm -qa | grep -i dpdk
dpdk-tools-18.11.2-4.bz1785933.el8.x86_64
dpdk-18.11.2-4.bz1785933.el8.x86_64
dpdk-devel-18.11.2-4.bz1785933.el8.x86_64
```

Thanks for the test. Just want to be sure: you ran this test as the root user inside an unprivileged container, is that correct?

(In reply to David Marchand from comment #15)
> Thanks for the test. Just want to be sure: you ran this test as the root
> user inside an unprivileged container, is that correct?

Yes, as the root user inside a container with the IPC_LOCK capability assigned.

User:

```
# oc exec -it testpod1 bash
[root@testpod1 /]# whoami
root
```

Container spec:

```
spec:
  containers:
  - name: appcntr1
    image: quay.io/zshi/ubi8-dpdk
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        hugepages-1Gi: 4Gi
        cpu: '4'
        memory: 1000Mi
      limits:
        hugepages-1Gi: 4Gi
        cpu: '4'
        memory: 1000Mi
```

(In reply to zenghui.shi from comment #4)
> Dpdk pkg version:
>
> # rpm -qa | grep dpdk
> dpdk-tools-19.11-1.el8.x86_64
> dpdk-19.11-1.el8.x86_64
>
> Testpmd log with iova-mode=pa:
>
> # testpmd -l 54,55,56 -w 0000:af:01.0 --iova-mode=pa -- -i
> EAL: Detected 72 lcore(s)
> EAL: Detected 2 NUMA nodes
> net_mlx5: cannot load glue library: /lib64/libmlx5.so.1: version `MLX5_1.10'
> not found (required by
> /usr/lib64/dpdk-pmds-glue/librte_pmd_mlx5_glue.so.19.08.0)
> net_mlx5: cannot initialize PMD due to missing run-time dependency on
> rdma-core libraries (libibverbs, libmlx5)

Zenghui, this is not relevant to the IOVA issue, but coming back on this warning: it appears because you installed the 19.11 dpdk package on an 8.1 system. The 19.11 package was built on RHEL 8.2 and needs newer rdma-core/libibverbs packages than the ones in 8.1.

(In reply to David Marchand from comment #11)
> (In reply to David Marchand from comment #8)
> > Prepared a scratch build of the 18.11 downstream package for rhel8:
> > http://brew-task-repos.usersys.redhat.com/repos/scratch/dmarchan/dpdk/18.11.2/4.bz1785933.el8/x86_64/
>
> Sebastian, can you have a try with this scratch build?
> Thanks.

Hi David, I tried this and it works. Thanks!

*** Bug 1783763 has been marked as a duplicate of this bug. ***
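[Editor's note] A short hedged sketch of an in-pod sanity check for the prerequisites discussed in this thread: the IPC_LOCK capability, hugepages requested through pod resources, and the kernel driver owning the VF. The VF PCI address is the one from the test above and will differ per allocation; only standard procfs/sysfs paths are used.

```bash
#!/bin/bash
# Hedged in-pod prerequisite check; 0000:af:00.6 is the VF from the test above.
PCI=0000:af:00.6

# Capability sets of the container's PID 1; IPC_LOCK must be present so DPDK
# can mlock hugepage memory without privileged=true.
grep ^Cap /proc/1/status

# Hugepages granted to the pod should be visible here.
grep -i huge /proc/meminfo

# Kernel driver currently bound to the VF; a Mellanox VF in bifurcated mode
# stays on mlx5_core rather than vfio-pci.
readlink /sys/bus/pci/devices/"$PCI"/driver
```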
Description of problem:

I am unable to run a dpdk workload without privileged=true.

Version-Release number of selected component (if applicable):
openshift 4.3

How reproducible:
100%

Steps to Reproduce:
1. Deploy the SR-IOV operator.
2. Configure the SR-IOV interface and policy.
3. Patch the node kernel parameters to enable "intel_iommu=on and iommu=pt" (see the sketch at the end of this report).
4. Configure hugepages on the system.
5. Deploy a testpmd pod using the following yaml.
6. Exec into the pod and start the testpmd application:

```
testpmd -l <cpu-on-the-same-numa> -w <vf-pci-address> -- -i
```

Actual results:

```
testpmd -l 22,24,26 -w 0000:19:01.2 -- -i
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():
Cannot init EAL
5: [testpmd() [0x42e657]]
4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f044e1363d5]]
3: [testpmd(main+0xbd7) [0x42e5b7]]
2: [/lib64/librte_eal.so.9(__rte_panic+0xbd) [0x7f044f3396bd]]
1: [/lib64/librte_eal.so.9(rte_dump_stack+0x2d) [0x7f044f34503d]]
Aborted (core dumped)
```

Expected results:

```
testpmd -l 22,24,26 -w 0000:19:01.0 -- -i
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:19:01.0 on NUMA socket 0
EAL:   probe driver: 15b3:1016 net_mlx5
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 42:E4:4B:F5:1E:9B
Checking link statuses...
Done
```

Additional info:

testpmd pod yaml with privileged=true:

```
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: appcntr1
    image: quay.io/mmirecki/ds-testpmd
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
      privileged: true
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        hugepages-1Gi: 4Gi
        cpu: '6'
        memory: 1000Mi
      limits:
        hugepages-1Gi: 4Gi
        cpu: '6'
        memory: 1000Mi
    volumeMounts:
    - mountPath: /mnt/huge
      name: hugepage
      readOnly: False
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```

testpmd pod yaml without privileged=true:

```
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: appcntr1
    image: quay.io/mmirecki/ds-testpmd
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        hugepages-1Gi: 4Gi
        cpu: '6'
        memory: 1000Mi
      limits:
        hugepages-1Gi: 4Gi
        cpu: '6'
        memory: 1000Mi
    volumeMounts:
    - mountPath: /mnt/huge
      name: hugepage
      readOnly: False
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```
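[Editor's note] For step 3 of the reproduction, here is a hedged sketch of one way to set the kernel arguments on OpenShift worker nodes via a MachineConfig. The object name is made up for illustration, the ignition version may need adjusting to the cluster release, and this is not necessarily how the reporter applied the change; applying it triggers a rolling reboot of the worker pool.

```bash
# Hedged sketch of step 3: add intel_iommu=on iommu=pt to worker nodes.
# The MachineConfig name (99-worker-dpdk-iommu) is hypothetical.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-dpdk-iommu
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 2.2.0
  kernelArguments:
    - intel_iommu=on
    - iommu=pt
EOF

# After the MachineConfigPool rolls out, verify on a node:
#   oc debug node/<node-name> -- chroot /host cat /proc/cmdline
```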