Description of problem:

On an OpenStack-based VM running RHEL7.2 and provided with an SR-IOV-based NIC, we wanted to 1) install DPDK, 2) bind the SR-IOV NIC to a DPDK driver and 3) send traffic using testpmd. The problem is that using 'dpdk_nic_bind.py' to bind the SR-IOV NIC to either uio_pci_generic or vfio-pci fails. However, building igb_uio from source and using that works (this is also mentioned on e.g. http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html). More details and command lines are in the email excerpt below.

Our expectation was that installing 'dpdk' and 'dpdk-tools' would be sufficient for running DPDK-based applications and that building igb_uio ourselves would not be necessary. This might be a misinterpretation on our side; however, we have also not seen RH documentation on this (which is why this TR has been categorized as 'documentation').

Version-Release number of selected component (if applicable):
DPDK version 2.2.0 release 2.el7
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Additional info (excerpt from email thread):

From: Marcos Garcia <mgarciam>

Hi Michael

As you can guess, we didn't test a DPDK app on this environment (VMs using RHEL 7.2 with DPDK 2.0 or 2.2) with an SRIOV VF (I've seen tests using PCI passthrough). Thank you for your patience, given that I cannot work on this at the moment as I'm at a conference (maybe this weekend I can connect for a couple of hours).

The product manager suggested igb_uio instead of uio_generic. Can you quickly test that? However, you've tried VFIO, which he also believed could help here (given that we've enabled IOMMU on the host)... by the way, your Heat stack uses the nova flavor that we've enabled for NFV, right?

Also, the engineering team asked me to open a bugzilla to keep track of this issue. It is either a documentation gap, or maybe a bug with our version of DPDK with this model of hardware. They are very interested in this. Just go to https://bugzilla.redhat.com/ and create an account, it's free. So please open a bugzilla, add me as an observer, and the team will comment directly there. I'll put their comments there, but they will require us to upload files like a sosreport. So far, they've said the following:

http://dpdk.org/ml/archives/dev/2015-September/024064.html
(....) This means the PCI device has no LSI assigned to it. It will only be able to use MSI or MSI-X. uio_pci_generic has the limitation that it will only work with LSI. If your system has an IOMMU, you can try using vfio-pci. (....)

My quick technical take on this: try with igb_uio instead of uio_generic, this should fix the issue. However, this is not a long term solution; we need VFIO as the proper long term solution.
(....)
I don't know if there's still a question here, the link above seems to provide the answer. SR-IOV VFs, by definition, do not support legacy INTx signaling. UIO requires INTx, probably since legacy interrupt signaling is almost the only thing it provides to the user. I don't know that the pci-stub suggestion is really viable, since then we don't have a call to pci_enable_device() and may not have resources allocated to the device. Apparently dpdk-uio has only been used with PF devices, which may be a workaround for this case. vfio-noiommu solves this, but won't be available until 7.3 or a guest with a v4.5 based kernel (with vfio-noiommu enabled) and dpdk 16.04. (....)
vfio-noiommu will solve it, but note that this is an unsupported solution, since there is no IOMMU protection and a user process gets access to all of the guest memory. We are working on a full solution for it, targeting 7.3 (not sure we will make it on time, it may slip to 7.4).

On Wed, Apr 20, 2016 at 10:38 PM, Michael Vittrup Larsen <michael.larsen> wrote:

Hi,

I tried binding to both 'uio_pci_generic' as shown in the debug log and also vfio-pci, which resulted in a similar error and an 'EINVAL' error in dmesg.

Note that we are deliberately trying not to build DPDK from source but instead use only the 'standard' packages dpdk and dpdk-tools.

/Michael

On 20 April 2016 at 19:35, Marcos Garcia <mgarciam> wrote:

Hi Michael

I've asked the experts, as I have no time today to test it, to see if you have missed anything. The only thing I see is that you used UIO binding instead of VFIO, which I think is what you should have used (as we enabled IOMMU on the host). Instructions for VFIO are similar to OVS-DPDK, see section 1.4.2.1 here:
https://access.stage.redhat.com/documentation/en/red-hat-openstack-platform/version-8/configure-dpdk-for-openstack-networking/#neutron_configuration_configure_the_bridge_for_dpdk_traffic

Let me know if that works. I'll forward you the comments from our experts.

On Wed, Apr 20, 2016 at 6:11 AM, Michael Vittrup Larsen <michael.larsen> wrote:

Hi,

I was wondering if you have tested DPDK with the current OpenStack setup, because we seem to be unable to make it work on this installation. Maybe we are doing something wrong... Below is a summary of what we did; it basically ended up with a failed attempt to bind the SR-IOV NIC to the DPDK driver.

/Michael

------------------------

Spin up EPC VMs:

[tieto@centos-70 ~]$ source tietorc
[tieto@centos-70 ~]$ cd heat-templates/
[tieto@centos-70 ~]$ heat stack-create -f epc.yaml -e env.yaml epcstack
[tieto@centos-70 ~]$ cd ..
[tieto@centos-70 ~]$ ssh -i tietokey cloud-user.27.238
[cloud-user@loadgen1 ~]$ sudo subscription-manager register --username xxx --password yyy
[cloud-user@loadgen1 ~]$ sudo subscription-manager attach
[cloud-user@loadgen1 ~]$ sudo subscription-manager repos --enable rhel-7-server-extras-rpms

Enable hugepages:

[root@loadgen1 cloud-user]# cat /etc/default/grub |grep huge
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto no_timer_check net.ifnames=0 console=ttyS0,115200n8 default_hugepagesz=1G hugepagesz=1G hugepages=1"
[root@loadgen1 cloud-user]# grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot.

After the reboot the order of the interfaces has changed - the SR-IOV NIC is eth1:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1400
        inet 172.20.1.77  netmask 255.255.0.0  broadcast 172.20.255.255
        inet6 fe80::f816:3eff:fe1a:bff1  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:1a:bf:f1  txqueuelen 1000  (Ethernet)
        RX packets 176  bytes 32884 (32.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 276  bytes 27604 (26.9 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether fa:16:3e:b1:9c:85  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 48  bytes 7968 (7.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Install DPDK:

[cloud-user@loadgen1 ~]$ sudo yum install dpdk dpdk-tools driverctl

Investigate drivers for interfaces:

[cloud-user@loadgen1 ~]$ /usr/share/dpdk/tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=
0000:00:05.0 '82599 Ethernet Controller Virtual Function' if=eth1 drv=ixgbevf unused=

Other network devices
=====================
<none>

Use DPDK testpmd:

[root@loadgen1 cloud-user]# /usr/bin/testpmd -c 1 -n 1 -b 0000:00:03.0 -- -i
...
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 1af4:1000 rte_virtio_pmd
EAL: Device is blacklisted, not initializing
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL: probe driver: 8086:10ed rte_ixgbevf_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: No probed ethernet devices

Try binding the SR-IOV NIC to DPDK:

[root@loadgen1 cloud-user]# modprobe uio_pci_generic
[root@loadgen1 cloud-user]# /usr/share/dpdk/tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=uio_pci_generic
0000:00:05.0 '82599 Ethernet Controller Virtual Function' if=eth1 drv=ixgbevf unused=uio_pci_generic

Other network devices
=====================
<none>

The above confirms that the uio_pci_generic driver is listed as supported for PCI device 0000:00:05.0. Doing the binding fails:

[root@loadgen1 cloud-user]# /usr/share/dpdk/tools/dpdk_nic_bind.py --bind=uio_pci_generic 0000:00:05.0
Error: bind failed for 0000:00:05.0 - Cannot bind to driver uio_pci_generic
Error: unbind failed for 0000:00:05.0 - Cannot open /sys/bus/pci/drivers//unbind

Error in dmesg:

[  805.665372] uio_pci_generic 0000:00:05.0: No IRQ assigned to device: no support for interrupts?
[root@loadgen1 cloud-user]# lspci -s 0000:00:05.0 -vvv
00:05.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
        Subsystem: Intel Corporation Device 7b11
        Physical Slot: 5
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: Memory at fe000000 (64-bit, prefetchable) [size=16K]
        Region 3: Memory at fe004000 (64-bit, prefetchable) [size=16K]
        Capabilities: [70] MSI-X: Enable- Count=3 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express (v0) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

which confirms that the device has no IRQ (the VF exposes MSI-X but no legacy INTx interrupt).
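A couple of extra checks that line up with the above (a sketch, not a verbatim log from this VM; the sysfs paths and expected behaviour are assumptions based on the lspci output and the EINVAL reported earlier):

# The VF exposes no legacy INTx line, which is what uio_pci_generic requires;
# a value of 0 here means no legacy IRQ is assigned to the device.
[root@loadgen1 cloud-user]# cat /sys/bus/pci/devices/0000:00:05.0/irq

# Binding to vfio-pci fails as well (EINVAL in dmesg), presumably because no
# (v)IOMMU is exposed to the guest and the VF therefore has no IOMMU group:
[root@loadgen1 cloud-user]# modprobe vfio-pci
[root@loadgen1 cloud-user]# /usr/share/dpdk/tools/dpdk_nic_bind.py --bind=vfio-pci 0000:00:05.0
# The iommu_group link is expected to be absent when no IOMMU is visible in the guest.
[root@loadgen1 cloud-user]# ls /sys/bus/pci/devices/0000:00:05.0/iommu_group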
Upstream doc:
http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html#loading-modules-to-enable-userspace-io-for-dpdk

We need to map this to our RHEL VMs and the RHEL hypervisor (as deployed by OpenStack as the overcloud kernel).
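As a rough mapping of that upstream section to this RHEL 7.2 guest (a sketch only; it assumes a locally unpacked DPDK 2.2.0 source tree, kernel-devel for the running kernel, and the upstream default target name and build paths - none of this is shipped in the dpdk/dpdk-tools RPMs):

# Option 1: uio_pci_generic - shipped with the RHEL kernel, but needs legacy INTx,
# which this SR-IOV VF does not have (see the dmesg error above).
modprobe uio_pci_generic

# Option 2: vfio-pci - shipped with the RHEL kernel, but needs an IOMMU visible
# inside the guest (or vfio-noiommu, discussed further down).
modprobe vfio-pci

# Option 3: igb_uio - not packaged in RHEL; built from the DPDK source tree.
cd dpdk-2.2.0
make config T=x86_64-native-linuxapp-gcc
make
modprobe uio
insmod build/kmod/igb_uio.ko
/usr/share/dpdk/tools/dpdk_nic_bind.py --bind=igb_uio 0000:00:05.0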
We have observed another thing which you might consider worth a bug report. The issue is that DPDK takes over the management (non-SR-IOV) NIC even when it has not been bound to DPDK. This can easily be confirmed by booting a VM with SR-IOV and running the DPDK testpmd application - it will take over the management NIC and one loses network connectivity to the VM. Explicitly blacklisting the NIC when running testpmd, or removing the virtio driver from one's DPDK application, is a work-around for this. We have chosen the latter for our application.
This is a feature from the DPDK perspective, not a bug, see: http://dpdk.org/ml/archives/dev/2015-December/029102.html

The virtio PMD doesn't need any uio/vfio driver; it takes over all detected virtio interfaces. Nice from the DPDK perspective, but on the user side it requires blacklisting the virtio interfaces or disabling the virtio PMD. At least we could update the dpdk.org documentation to highlight this "feature"...
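In practice that means something like the following when starting testpmd (a sketch using the PCI addresses from this report; -b/--pci-blacklist and -w/--pci-whitelist are standard EAL options in DPDK 2.2):

# Blacklist the virtio management NIC so the virtio PMD leaves it alone ...
/usr/bin/testpmd -c 1 -n 1 -b 0000:00:03.0 -- -i

# ... or whitelist only the SR-IOV VF, which implicitly excludes everything else.
/usr/bin/testpmd -c 1 -n 1 -w 0000:00:05.0 -- -i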
Just FYI, in DPDK >= 16.04 this particular quirk is gone: it only takes over devices which are not bound to any driver at all.
See the following doc for DPDK with SR-IOV; it was done with RHOS-8: https://docs.google.com/document/d/1KQkISoqGtjuFlLCxAEnXu6V9tq4etVPzFIKUFVVNbYU/edit

Could you verify if it helps? We are also working on a document for RHOS-10; it is not ready yet.
This will be solved once VFIO is supported within the guest.

*** This bug has been marked as a duplicate of bug 1335808 ***
Please note that vfio-noiommu is no worse for security than any of the uio mechanisms, and it does add MSI support, which allows it to work better with SR-IOV VFs than uio_generic. vfio-noiommu does taint the kernel where it's used, but the fact that uio does not is really more down to uio never having been meant to support devices making use of DMA, which is exactly how DPDK uses it. With a VF assigned to a guest with vfio-pci in the host and to a user within the guest using vfio-noiommu, the exposure is to the guest; the host is fully isolated. This is the same degree of risk as if the guest were a bare metal system without an IOMMU.
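For reference, once a guest kernel with vfio no-IOMMU support is available (v4.5+, or RHEL 7.3 as mentioned above) together with DPDK >= 16.04, the binding would look roughly like this (a sketch; the module parameter is the upstream vfio no-IOMMU opt-in and has not been verified on a RHEL build here):

# Opt in to the unsafe no-IOMMU mode (taints the kernel; guest memory is not protected).
modprobe vfio enable_unsafe_noiommu_mode=1
modprobe vfio-pci

# Bind the VF; MSI/MSI-X interrupts still work, unlike with uio_pci_generic.
/usr/share/dpdk/tools/dpdk_nic_bind.py --bind=vfio-pci 0000:00:05.0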