Bug 1299662

Summary: VFIO: include no-IOMMU mode - not supported
Product: Red Hat Enterprise Linux 7
Reporter: Karen Noel <knoel>
Component: kernel
Assignee: Alex Williamson <alex.williamson>
Kernel sub component: Other
QA Contact: Pei Zhang <pezhang>
Status: CLOSED ERRATA
Docs Contact: Jiri Herrmann <jherrman>
Severity: unspecified
Priority: high
CC: alex.williamson, dhoward, dyuan, ferruh.yigit, huding, jen, jherrman, jishao, juzhang, lhuang, pasik, peterx, pezhang, qding, rsibley, wchadwic, xfu, xiywang
Version: 7.3
Keywords: ZStream
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kernel-3.10.0-377.el7
Doc Type: Technology Preview
Doc Text:
No-IOMMU mode for VFIO drivers
As a Technology Preview, this update adds No-IOMMU mode for virtual function I/O (VFIO) drivers. The No-IOMMU mode provides the user with full user-space I/O (UIO) access to a direct memory access (DMA)-capable device without an I/O memory management unit (IOMMU). Note that in addition to not being supported, using this mode is not secure due to the lack of I/O management provided by an IOMMU.
Story Points: ---
Clone Of:
Clones: 1301139 1377958 (view as bug list)
Environment:
Last Closed: 2016-11-03 15:06:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1322577
Bug Blocks: 1202600, 1301139, 1301141, 1301142, 1358853, 1377958, 1383822

Description Karen Noel 2016-01-18 23:25:54 UTC
VFIO: include no-IOMMU mode - not supported

Enable use of VFIO for DPDK instead of UIO. 

From https://lkml.org/lkml/2015/12/22/541

There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system.  There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines.  However, there are still those users
that want userspace drivers even under those conditions.  The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has.  In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.

This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
should make it very clear that this mode is not safe.  Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode.  Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver,
will taint the kernel and should therefore not be considered
supported.  This patch includes no-iommu support for the vfio-pci bus
driver only.
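
For reference, a minimal sketch of what enabling this mode looks like; the module parameter and the noiommu-$GROUP group naming come from the description above, and /proc/sys/kernel/tainted is the standard kernel taint interface:

# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
Y
# ls /dev/vfio/                   # after a device is bound, its group appears as noiommu-$GROUP
# cat /proc/sys/kernel/tainted    # non-zero once a no-IOMMU bind has tainted the kernel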

Comment 2 Rafael Aquini 2016-04-13 14:58:12 UTC
Patch(es) available on kernel-3.10.0-377.el7

Comment 5 FuXiangChun 2016-05-23 10:13:32 UTC
Alex,
I tested this bug inside a guest with the fixed kernel 3.10.0-408.el7.x86_64. Detailed steps are below.

1. Load the related vfio modules:
# modprobe -r vfio
# modprobe -r vfio_iommu_type1
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode 
Y

2. Bind the NICs to vfio-pci:
# lspci|grep Eth
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc Virtio network device

# dpdk_nic_bind --bind vfio-pci 00:02.0
# dpdk_nic_bind --bind vfio-pci 00:06.0

# dpdk_nic_bind --status

Network devices using DPDK-compatible driver
============================================
0000:00:02.0 'Virtio network device' drv=vfio-pci unused=
0000:00:06.0 'Virtio network device' drv=vfio-pci unused=

I have several questions about this bug. 

Q1. What is the purpose of this bug? Is it so that dpdk can use vfio instead of UIO?

Q2. I found the 2 related bugs below. Will this bug serve as a workaround until those 2 bugs are fixed?
Bug 1335808 - [RFE] [vIOMMU] Add Support for VFIO devices with vIOMMU present
Bug 1283262 - [RFE] IOMMU support in Vhost-user 

Q3. Besides the 2 points above, is there any other purpose?

Comment 6 Alex Williamson 2016-05-23 15:46:05 UTC
(In reply to FuXiangChun from comment #5)
> Q1. What is the purpose of this bug? Is it so that dpdk can use vfio
> instead of UIO?

Yes.  uio does not support MSI-X interrupts for the device, requiring the device to either operate in polled mode or support INTx.  SR-IOV VFs do not support INTx per the spec, so vfio no-iommu enables an interrupt model for such devices when an IOMMU is not available.
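
As a quick sanity check on a given device, something like the following shows whether it exposes MSI-X and whether an INTx pin is wired up (the PCI address is just an example):

# lspci -vv -s 00:02.0 | grep -E 'MSI-X|Interrupt:'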

> Q2. I found the 2 related bugs below. Will this bug serve as a
> workaround until those 2 bugs are fixed?
> Bug 1335808 - [RFE] [vIOMMU] Add Support for VFIO devices with vIOMMU present
> Bug 1283262 - [RFE] IOMMU support in Vhost-user 

Yes, no-iommu taints the kernel where it's used because it offers no DMA isolation for the userspace devices.  I believe both of the above solutions do offer isolation, but they may not be available until after RHEL7.3.

> Q3. Besides the 2 points above, is there any other purpose?

No, no-iommu support is really tailored for non-QEMU userspace drivers, specifically DPDK, when an IOMMU is not present.  It helps unify DPDK around a vfio userspace device model and, through kernel tainting, gives a clear indication for support purposes when unsafe DMA has been used.
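
A minimal way to see that indication, assuming the standard taint interfaces (the exact dmesg wording varies by kernel):

# cat /proc/sys/kernel/tainted    # non-zero after a no-iommu device bind
# dmesg | grep -i noiommu         # the bind should log a taint message for the noiommu group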

Comment 7 FuXiangChun 2016-05-26 06:00:45 UTC
Verified this bug with 3.10.0-410.rt56.293.el7.x86_64.

I used 2 scenarios to cover this bug.

S1: Both hosts use vfio in IOMMU mode; the guest uses vfio in no-IOMMU mode.

1. Set up OVS + DPDK + vhost-user + vfio (IOMMU mode) on host A:
# ovs-vsctl show
a5e1deb4-5cb9-4e7f-809f-e54ca109185e
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser

2. Boot a 7.3 guest from XML on host A.

3. Inside the guest, load the vfio modules without an IOMMU.

 3.1) Load the related vfio modules without an IOMMU:
 # modprobe -r vfio
 # modprobe -r vfio_iommu_type1
 # modprobe vfio enable_unsafe_noiommu_mode=Y
 # modprobe vfio-pci
 # cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode 
 Y

 3.2) Bind the NICs to vfio-pci:
 # dpdk_nic_bind --bind vfio-pci 00:02.0
 # dpdk_nic_bind --bind vfio-pci 00:06.0

 # dpdk_nic_bind --status

Network devices using DPDK-compatible driver
============================================
0000:00:02.0 'Virtio network device' drv=vfio-pci unused=
0000:00:06.0 'Virtio network device' drv=vfio-pci unused=

4. Run testpmd inside the guest:
#chrt -f 95 testpmd -d /usr/lib64/librte_pmd_virtio.so -l 1,2,3 --socket-mem 1024 -n 1 --proc-type auto --file-prefix pg -w 00:02.0 -w 00:06.0 -- --portmask=3 --disable-hw-vlan --disable-rss -i --rxq=1 --txq=1 --rxd=256 --txd=256 --auto-start --nb-cores=2

5. Load the vfio module (IOMMU mode) on host B.

6. Bind two 10-Gigabit NICs to vfio-pci on host B:
# dpdk_nic_bind --status

Network devices using DPDK-compatible driver
============================================
0000:01:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=vfio-pci unused=
0000:01:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=vfio-pci unused=

7. Run MoonGen on host B:
# chrt -f 95 ./MoonGen /home/l2-load-latency.lua 0 1 1

MoonGen can send and receive data, e.g.:
[Device: id=1] Received 1999 packets, current rate 0.00 Mpps, 0.00 MBit/s, 0.00 MBit/s wire rate.
[Device: id=1] Sent 214013984 packets, current rate 1.00 Mpps, 512.02 MBit/s, 672.02 MBit/s wire rate.
......

S2: vfio in no-IOMMU mode everywhere (host A, host B, and the guest).
Detailed steps are the same as above.

According to these test results, vfio in no-IOMMU mode works inside the guest.

Summary: Currently, QE will use the 2 scenarios above to cover vfio no-IOMMU mode in RHEL 7.3. As this bug is only a workaround, QE will re-test the vIOMMU mode inside the guest once the following 2 bugs are fixed:
Bug 1335808 - [RFE] [vIOMMU] Add Support for VFIO devices with vIOMMU present
Bug 1283262 - [RFE] IOMMU support in Vhost-user

Comment 8 Pei Zhang 2016-07-25 02:27:53 UTC
Hi Alex,

no-IOMMU doesn't work with device assignment. In the testing below, network devices in the guest can be bound to vfio-pci, but they don't actually work with dpdk testpmd. Is this expected, or should I file a new bug?

Steps:
1. Load no-IOMMU vfio module
# modprobe -r vfio
# modprobe -r vfio_iommu_type1
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode 
Y

2. Bind 2 network devices to vfio-pci. 
# lspci | grep Eth
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:08.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
00:09.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

# dpdk_nic_bind --bind=vfio-pci 00:08.0
# dpdk_nic_bind --bind=vfio-pci 00:09.0

# dpdk_nic_bind --status

Network devices using DPDK-compatible driver
============================================
0000:00:08.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=vfio-pci unused=ixgbe
0000:00:09.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=vfio-pci unused=ixgbe

Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=virtio_pci,vfio-pci 

Other network devices
=====================
<none>

3. Start dpdk testpmd in the guest; testpmd reports "EAL:   0000:00:08.0 not managed by VFIO driver, skipping".

# cat testpmd-passthrough.sh 
queues=1
cores=2
testpmd -l 0,1,2 -n 1 -d /usr/lib64/librte_pmd_ixgbe.so.1 \
-w 00:08.0 -w 00:09.0 \
-- \
--nb-cores=${cores} \
--disable-hw-vlan -i \
--disable-rss \
--rxq=${queues} --txq=${queues} \
--auto-start \
--rxd=256 --txd=256

# sh testpmd-passthrough.sh
...
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   0000:00:08.0 not managed by VFIO driver, skipping
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   0000:00:09.0 not managed by VFIO driver, skipping
EAL: No probed ethernet devices
...


Thank you,
Pei

Comment 9 Alex Williamson 2016-07-25 02:41:42 UTC
Your results directly contradict comment 7; perhaps consult with FuXiangChun.  What version of DPDK are you using?  Is it new enough to be aware of vfio no-iommu?
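
One quick thing to check from the guest, based on the group naming in the description: in no-iommu mode the group nodes are created as /dev/vfio/noiommu-$GROUP, so an EAL that predates that naming may not find the group and skip the device (this mechanism is my guess, not verified here):

# ls /dev/vfio/                   # in no-iommu mode the groups show up as noiommu-$GROUP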

Comment 10 Pei Zhang 2016-07-25 03:01:39 UTC
(In reply to Alex Williamson from comment #9)
> Your results are in direct contradiction to comment 7, perhaps consult with
> FuXiangChun. 
In comment 7, the test scenario is a guest with vhost-user and OVS-DPDK, not device assignment.
The vhost-user and OVS-DPDK scenario also works in my environment.

> What version of DPDK are you using?  Is it new enough to be
> aware of vfio no-iommu?
dpdk-2.2.0-3.el7.x86_64. The same version as in comment 7.

Thank you,
Pei

Comment 11 Alex Williamson 2016-07-25 03:28:34 UTC
I don't know enough about dpdk to answer your question. Device assignment is absolutely in no way supported or intended to work using vfio no-iommu mode, but your example is still using some form of dpdk, which AFAIK does not do device assignment, i.e. expose a device to a virtual machine; it's simply a userspace driver.

Comment 14 juzhang 2016-07-26 03:13:58 UTC
Removing xfu's needinfo according to comment 10.

Comment 24 Alex Williamson 2016-07-29 13:01:14 UTC
*** Bug 1360834 has been marked as a duplicate of this bug. ***

Comment 28 errata-xmlrpc 2016-11-03 15:06:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2574.html