Bug 1259556

Summary: Allow VFIO devices on the same guest PHB as emulated devices
Product: Red Hat Enterprise Linux 7 Reporter: David Gibson <dgibson>
Component: qemu-kvm-rhev Assignee: David Gibson <dgibson>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 7.2 CC: abologna, gklein, hannsj_uhl, juzhang, lmiksik, michen, mrezanin, qzhang, virt-maint, zhengtli
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.3.0-29.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-04 16:55:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 825045, 1154205, 1172230, 1201513, 1261708, 1264728, 1277183, 1277184    

Description David Gibson 2015-09-03 03:29:30 UTC
Description of problem:

The current VFIO code for KVM on Power requires that a VFIO device passed to a guest must appear on a special guest PCI Host Bridge (PHB) dedicated to that device's IOMMU group.  Emulated PCI devices or VFIO devices from different IOMMU groups can't appear on the same PHB in the guest.

This causes problems for libvirt and RHEV, which don't expect to have to instantiate PCI host bridges specially in order to add VFIO devices.

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.3.0-21.el7.ppc64le

Additional info:

The dynamic DMA window patches allowed this as a side effect.  Alexey Kardashevskiy has now posted separate patches allowing this without the rest of the DDW stuff.

This is very late for RHEL 7.2, but it's probably still simpler than attempting to handle the problem in libvirt and/or oVirt.

Comment 2 David Gibson 2015-09-08 01:04:10 UTC
Draft build at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9803236

Comment 3 Zhengtong 2015-09-09 08:18:42 UTC
Hi, here is an interesting result from my test.

Host Kernel: 3.10.0-315.el7.ppc64le
qemu version: qemu-kvm-rhev-2.3.0-19.el7


1. Bind a USB controller (IOMMU group 2) and a network device (IOMMU group 1) to vfio-pci:

#echo "104c 8241" > /sys/bus/pci/drivers/vfio-pci/new_id
#echo 0003:03:00.0 > /sys/bus/pci/devices/0003\:03\:00.0/driver/unbind 
#echo 0003:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

#echo "14e4 1657"  > /sys/bus/pci/drivers/vfio-pci/new_id
#echo 0003:09:00.0 > /sys/bus/pci/devices/0003\:09\:00.0/driver/unbind
#echo 0003:09:00.1 > /sys/bus/pci/devices/0003\:09\:00.1/driver/unbind
#echo 0003:09:00.2 > /sys/bus/pci/devices/0003\:09\:00.2/driver/unbind
#echo 0003:09:00.3 > /sys/bus/pci/devices/0003\:09\:00.3/driver/unbind
#echo 0003:09:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.2 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.3 > /sys/bus/pci/drivers/vfio-pci/bind

2. Boot the guest with the command:
#qemu-kvm ... \
    -device spapr-pci-vfio-host-bridge,id=vfiohost,index=0x1,iommu=2 \
    -device vfio-pci,host=0003:09:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev \
... \

3. In the guest, the VFIO device (enP1p0s1) gets an IP address by DHCP and can be used for scp.

[root@localhost ~]# ifconfig
enP1p0s1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.106.4  netmask 255.255.255.0  broadcast 10.19.106.255
        inet6 fe80::9abe:94ff:fe01:754c  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:136a:9abe:94ff:fe01:754c  prefixlen 64  scopeid 0x0<global>
        ether 98:be:94:01:75:4c  txqueuelen 1000  (Ethernet)
        RX packets 150711  bytes 10900709 (10.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 765691  bytes 1104467243 (1.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 60  bytes 5484 (5.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 60  bytes 5484 (5.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


4. Copy a file to an external host with scp:
[root@localhost ~]# scp test root.67.19:/root/test_home/liuzt/
root.67.19's password: redhat

test                                          100% 1000MB  71.4MB/s   00:14    


I am confused as to why the NIC works, since the host bridge was given iommu=2 but the NIC is in IOMMU group 1.
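
To cross-check which IOMMU group a host device really belongs to before comparing it with the iommu= value given to the host bridge, a minimal sketch follows. It assumes the usual sysfs layout, in which each PCI device directory contains an iommu_group symlink whose target ends in the group number. The function name iommu_group_of and the SYSFS_ROOT override are hypothetical, introduced here only so the helper can be tried against a test tree.

```shell
# Print the IOMMU group number of a PCI device, given its full address
# (for example 0003:09:00.0).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
iommu_group_of() {
    dev_dir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1"
    if [ ! -L "$dev_dir/iommu_group" ]; then
        echo "no iommu_group link for $1" >&2
        return 1
    fi
    # The link target ends in the group number,
    # e.g. ../../../../kernel/iommu_groups/1
    basename "$(readlink "$dev_dir/iommu_group")"
}
```

On the host, running iommu_group_of 0003:09:00.0 and iommu_group_of 0003:03:00.0 would show whether the group numbers quoted in step 1 match what the kernel reports.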

Comment 4 Gil Klein 2015-09-10 07:58:38 UTC
I'm raising the priority and setting this one as a blocker because it is blocking the verification of RHEV RFE #825045.

Comment 6 David Gibson 2015-09-11 00:58:17 UTC
Zhengtong,

I am completely mystified as to how the NIC is working in that setup.  However, I think this is a distraction from actually verifying the bug.

I think the case to test is whether a VFIO device will work on an spapr-pci-host-bridge (as opposed to spapr-pci-vfio-host-bridge).  That's the case we really care about in applying this change.

Comment 7 Zhengtong 2015-09-11 05:13:42 UTC
Hi David,

As stated in comment #21 of bug 1250326, when I use spapr-pci-host-bridge, the NIC can't get an IP address by DHCP. So I think some functions of the VFIO device may be blocked.

Comment 8 David Gibson 2015-09-13 23:57:16 UTC
That's right: without the patches from this bug, VFIO devices won't fully work on the spapr-pci-host-bridge.

Comment 9 David Gibson 2015-09-16 07:30:38 UTC
Unfortunately the patches I posted really aren't ready to go.  Moving back to ASSIGNED until I figure out what to do next.

Comment 10 David Gibson 2015-09-18 02:34:09 UTC
Ok, I've reworked this to address the concerns that Alex W and I had.  I have a test build at https://brewweb.devel.redhat.com/taskinfo?taskID=9852214

Comment 11 David Gibson 2015-09-21 05:22:55 UTC
Rebased my backport on the latest downstream.  New test build at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9858618

Comment 14 Miroslav Rezanina 2015-10-08 07:53:26 UTC
Fix included in qemu-kvm-rhev-2.3.0-29.el7

Comment 16 Zhengtong 2015-10-12 10:56:59 UTC
Tested with one guest that has an emulated NIC and a VFIO NIC attached on the same spapr-pci-host-bridge bus. The VFIO NIC does not work with qemu-kvm-rhev-2.3.0-28 and works well with qemu-kvm-rhev-2.3.0-29, so this bug can be marked verified.

Details:
 Reproduced with qemu-kvm-rhev-2.3.0-28
====================================================================
 1. Boot the guest with an emulated NIC and a VFIO NIC
 #/usr/libexec/qemu-kvm \
 ...
    -device spapr-pci-host-bridge,id=vfiohost,index=0x1 \
    -device vfio-pci,host=0003:09:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:85,bus=vfiohost.0,addr=0x2 \
 ...
 2. In guest:
 # dhclient
 # ifconfig
enP1p0s1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 40:f2:e9:5d:ab:84  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  

enP1p0s2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.34  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::5054:ff:fec4:e785  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:c4:e7:85  txqueuelen 1000  (Ethernet)
        RX packets 63  bytes 6418 (6.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 77  bytes 8062 (7.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 # ethtool enP1p0s1
 ...
 	Current message level: 0x000000ff (255)
			       drv probe link timer ifdown ifup rx_err tx_err
	Link detected: no

#### The VFIO device can't get an IP address via dhclient, and ethtool reports the link as down.
====================================================================

 Verified with qemu-kvm-rhev-2.3.0-29
 1. Boot the guest with an emulated NIC and a VFIO NIC
 #/usr/libexec/qemu-kvm \
 ...
    -device spapr-pci-host-bridge,id=vfiohost,index=0x1 \
    -device vfio-pci,host=0003:09:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:85,bus=vfiohost.0,addr=0x2 \
 ...
 2. In guest:
  #dhclient
  #ifconfig
enP1p0s1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.112.8  netmask 255.255.248.0  broadcast 10.19.119.255
        inet6 fe80::42f2:e9ff:fe5d:ab84  prefixlen 64  scopeid 0x20<link>
        ether 40:f2:e9:5d:ab:84  txqueuelen 1000  (Ethernet)
        RX packets 57  bytes 5972 (5.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 2836 (2.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  

enP1p0s2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.34  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::5054:ff:fec4:e785  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:c4:e7:85  txqueuelen 1000  (Ethernet)
        RX packets 60  bytes 5857 (5.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 61  bytes 6389 (6.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 #ethtool enP1p0s1
 ...
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes
 
 3. Ping from an external host succeeds:
[liuzt@localhost script]$ ping 10.19.112.8
PING 10.19.112.8 (10.19.112.8) 56(84) bytes of data.
64 bytes from 10.19.112.8: icmp_seq=1 ttl=55 time=392 ms
64 bytes from 10.19.112.8: icmp_seq=2 ttl=55 time=405 ms
64 bytes from 10.19.112.8: icmp_seq=3 ttl=55 time=409 ms

[root@ibm-p8-rhevm-16 ~]# ping 192.168.122.34 -c 5
PING 192.168.122.34 (192.168.122.34) 56(84) bytes of data.
64 bytes from 192.168.122.34: icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from 192.168.122.34: icmp_seq=2 ttl=64 time=0.102 ms
64 bytes from 192.168.122.34: icmp_seq=3 ttl=64 time=0.099 ms
64 bytes from 192.168.122.34: icmp_seq=4 ttl=64 time=0.070 ms
64 bytes from 192.168.122.34: icmp_seq=5 ttl=64 time=0.069 ms

#### Both the VFIO NIC and the emulated NIC work well.

=====================================================================
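
The link check used in both runs above could be scripted roughly as follows. This is only a sketch: the function name link_is_up is hypothetical, and it relies solely on the "Link detected:" line that ethtool prints in the output shown above.

```shell
# Succeed (exit status 0) only if the ethtool output read from stdin
# contains the line "Link detected: yes".
link_is_up() {
    grep -q 'Link detected: yes'
}

# Hypothetical usage inside the guest:
#   ethtool enP1p0s1 | link_is_up && echo "VFIO NIC link is up"
```

With qemu-kvm-rhev-2.3.0-28 this check would be expected to fail for enP1p0s1, and with qemu-kvm-rhev-2.3.0-29 it would be expected to succeed, matching the ethtool output recorded in the two runs.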

Comment 18 errata-xmlrpc 2015-12-04 16:55:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html