Description of problem:
With vfio and iommu set up and an 82576 gigabit nic bound to dpdk, ovs-vswitchd fails to start. There is an undefined symbol error and an IOMMU permissions error.

The output:

[root@puma10 ~]# dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
0000:05:00.1 '82576 Gigabit Network Connection' drv=vfio-pci unused=

Network devices using kernel driver
===================================
0000:04:00.0 'OneConnect 10Gb NIC' if=enp4s0f0 drv=be2net unused=vfio-pci *Active*
0000:04:00.1 'OneConnect 10Gb NIC' if=enp4s0f1 drv=be2net unused=vfio-pci
0000:05:00.0 '82576 Gigabit Network Connection' if=enp5s0f0 drv=igb unused=vfio-pci

Other network devices
=====================
<none>

[root@puma10 ~]# ovs-vswitchd --dpdk -d /usr/lib64/librte_pmd_e1000.so -l 1,2 -n 1 --socket-mem 1024,0 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall
2016-01-19T17:23:30Z|00001|dpdk|INFO|No -vhost_sock_dir provided - defaulting to /var/run/openvswitch
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 8 on socket 0
EAL: Detected lcore 4 as core 9 on socket 0
EAL: Detected lcore 5 as core 10 on socket 0
EAL: Detected lcore 6 as core 0 on socket 0
EAL: Detected lcore 7 as core 1 on socket 0
EAL: Detected lcore 8 as core 2 on socket 0
EAL: Detected lcore 9 as core 8 on socket 0
EAL: Detected lcore 10 as core 9 on socket 0
EAL: Detected lcore 11 as core 10 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 12 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x100000000 bytes
EAL: Virtual area found at 0x7fcac0000000 (size = 0x100000000)
EAL: Requesting 1 pages of size 1024MB from socket 0
EAL: TSC frequency is ~2400086 KHz
EAL: open shared lib /usr/lib64/librte_pmd_e1000.so
EAL: /usr/lib64/librte_pmd_e1000.so: undefined symbol: per_lcore__lcore_id
EAL: Master lcore 1 is ready (tid=cae52c00;cpuset=[1])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: lcore 2 is ready (tid=c6e99700;cpuset=[2])
EAL: PCI device 0000:05:00.0 on NUMA socket -1
EAL: probe driver: 8086:10c9 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:05:00.1 on NUMA socket -1
EAL: probe driver: 8086:10c9 rte_igb_pmd
EAL: cannot set IOMMU type, error 1 (Operation not permitted)
EAL: 0000:05:00.1 DMA remapping failed, error 1 (Operation not permitted)
EAL: Error - exiting with code: 1
Cause: Requested device 0000:05:00.1 cannot be used
Terry, Have you tried with UIO? Thanks, fbl
Drop "-d /usr/lib64/librte_pmd_e1000.so" from the command line. It is not needed with OVS anyway, and in this case it's actually harmful: openvswitch-dpdk is statically linked to DPDK 2.0, and you're trying to load a driver built from a differently configured DPDK into it. That mismatch is the reason for at least the undefined symbol error, and I wouldn't be surprised if the rest starts working too once that problem is out of the way.
fbl: UIO worked. panu: It did work with UIO with the -d. So you are saying to never pass the -d option to OVS? If so, all of our documentation is wrong and will need to be changed. Terry
Yes, you never need -d with OVS unless you have a 3rd party PMD. Also, openvswitch-dpdk does not need (and cannot use anything from) the dpdk package you apparently have installed, since /usr/lib64/librte_pmd_e1000.so exists. And yeah, looking closer at the startup log, it starts despite the -d because e1000 is the wrong driver for your NIC anyway; the NIC uses the igb driver. The IOMMU error is a separate issue related to IOMMU (which is required for VFIO but not UIO), possibly a hardware limitation. What kind of system is this?
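For anyone fixing up scripts or documentation that still pass -d, the advice above comes down to removing the "-d <library>" pair from the EAL part of the command line. A minimal sketch in Python, assuming the EAL options are everything before the standalone "--" as in the command from the report; the function name drop_pmd_option is hypothetical and not part of OVS or DPDK:

```python
import shlex


def drop_pmd_option(command: str) -> str:
    """Return the command with every '-d <library>' pair removed.

    Only arguments before the standalone '--' separator are touched,
    since that is where the EAL options sit in the reported command.
    """
    cleaned = []
    in_eal_args = True   # True until the standalone '--' is seen
    skip_next = False    # True when the previous argument was '-d'
    for arg in shlex.split(command):
        if skip_next:
            skip_next = False
            continue
        if arg == "--":
            in_eal_args = False
        if in_eal_args and arg == "-d":
            skip_next = True
            continue
        cleaned.append(arg)
    return shlex.join(cleaned)


if __name__ == "__main__":
    reported = (
        "ovs-vswitchd --dpdk -d /usr/lib64/librte_pmd_e1000.so -l 1,2 -n 1 "
        "--socket-mem 1024,0 -- unix:/var/run/openvswitch/db.sock "
        "-vconsole:emer -vsyslog:err -vfile:info --mlockall"
    )
    print(drop_pmd_option(reported))
```

This only edits the command string; it does not check whether a 3rd party PMD is actually required, in which case -d would still be needed.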
> What kind of system is this? The machine with the issue belongs to ekuris, so I'll add him as needinfo.
It is based on RHEL.
Eran, the question was about the system hardware, OS is rather obvious in this context.
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Stepping: 7
CPU MHz: 1199.953
BogoMIPS: 4609.25
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23

05:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
Does the error go away if you bind both NICs to vfio-pci? If not, please add the output of 'find /sys/kernel/iommu_groups/ -type l' from the system.
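The find command above lists the device symlinks under each IOMMU group. If a grouped view is easier to read, for example to see whether 0000:05:00.0 and 0000:05:00.1 share a group, a rough Python equivalent could look like this. It assumes the usual sysfs layout of <root>/<group>/devices/<PCI address>; the function name iommu_groups and the root parameter are illustrative only:

```python
from collections import defaultdict
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group name to the sorted device names it contains.

    Mirrors 'find <root> -type l' for the assumed layout
    <root>/<group>/devices/<device>, where each device is a symlink.
    """
    groups = defaultdict(list)
    for link in Path(root).glob("*/devices/*"):
        if link.is_symlink():
            # link.parent is '<group>/devices', so its parent is the group
            groups[link.parent.parent.name].append(link.name)
    return {group: sorted(devices) for group, devices in groups.items()}


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(group, " ".join(devices))
```

An empty result would suggest that no IOMMU groups are exposed at all, which would be worth mentioning along with the output.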
Redirecting the question to ekuris since these are his machines.
I have to check it again. I can answer that after I deploy the setup.
Been over half a year now, feel free to reopen if/when you can reproduce the issue. That said, I suspect the system log would contain something along the lines of: "vfio-pci 0000:05:00.1: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor." ...at the time of attempted use, which would be a limitation of the server and its configuration and not a bug. But obviously without hard data this is just a guess.
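If the issue is reproduced, the guess above could be checked by searching the system log for that message. A small sketch, assuming the log text has already been captured into a string (for instance from dmesg or the journal) and that the message wording matches the one quoted above; real kernel log lines may carry timestamps or other prefixes, which the pattern tolerates. The name rmrr_blocked_devices is hypothetical:

```python
import re

# Wording taken from the message quoted in the comment above; treat it as
# an assumption about what the kernel actually prints on a given system.
RMRR_PATTERN = re.compile(
    r"vfio-pci (\S+): Device is ineligible for IOMMU domain attach "
    r"due to platform RMRR requirement"
)


def rmrr_blocked_devices(log_text: str) -> list:
    """Return the PCI addresses named in RMRR 'ineligible' messages."""
    devices = []
    for line in log_text.splitlines():
        match = RMRR_PATTERN.search(line)
        if match and match.group(1) not in devices:
            devices.append(match.group(1))
    return devices


if __name__ == "__main__":
    import sys

    # Example use: dmesg | python3 this_script.py
    print(rmrr_blocked_devices(sys.stdin.read()))
```

An empty list would not rule out a platform limitation; it would only mean this particular message was not logged.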