Bug 1364035

Summary: [ppc64le][VFIO]Qemu complains:vfio_dma_map(0x10033d3a980, 0x1f34f0000, 0x10000, 0x3fff9a6d0000) = -6 (No such device or address)
Product: Red Hat Enterprise Linux 7
Reporter: Zhengtong <zhengtli>
Component: qemu-kvm-rhev
Assignee: David Gibson <dgibson>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.3
CC: hannsj_uhl, knoel, lagarcia, michen, qzhang, virt-maint, xuhan, zhengtli
Target Milestone: rc
Keywords: Regression
Target Release: 7.3
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-22.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-07 21:28:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1213667
Bug Blocks: 1288337, 1359843

Description Zhengtong 2016-08-04 10:56:50 UTC
Description of problem:
Qemu keeps printing errors like the following
"""
vfio_dma_map(0x10033d3a980, 0x1f34c0000, 0x10000, 0x3fff9a6a0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f34d0000, 0x10000, 0x3fff9a6b0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f34e0000, 0x10000, 0x3fff9a6c0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f34f0000, 0x10000, 0x3fff9a6d0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f3500000, 0x10000, 0x3fff9a6e0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f3510000, 0x10000, 0x3fff9a6f0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f3520000, 0x10000, 0x3fff9a700000) = -6 (No such device or address)
...
"""
for a long time (5-10 seconds) when I hot plug a VFIO device that I had hot unplugged earlier.
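
To see how many mappings fail and which address range they cover, the log can be summarized with a short script. The following is a minimal Python sketch, not part of qemu. The field labels (container, iova, size, vaddr) are an assumption about what the four printed arguments mean, and `parse_failures` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines of the form seen in the log excerpt:
# vfio_dma_map(<container>, <iova>, <size>, <vaddr>) = <ret> (<message>)
# The four field labels are assumptions, not taken from the report.
LINE_RE = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)"
    r" = (-?\d+) \((.*)\)"
)


def parse_failures(log_text):
    """Return one dict per vfio_dma_map line found in the log text."""
    failures = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skips lines such as "VFIO_MAP_DMA: -6"
        container, iova, size, vaddr, ret, message = match.groups()
        failures.append({
            "container": int(container, 16),
            "iova": int(iova, 16),
            "size": int(size, 16),
            "vaddr": int(vaddr, 16),
            "ret": int(ret),
            "message": message,
        })
    return failures


def summarize(failures):
    """Count failures per return code and compute the covered address range."""
    by_ret = Counter(f["ret"] for f in failures)
    lowest = min(f["iova"] for f in failures)
    highest = max(f["iova"] + f["size"] for f in failures)
    return by_ret, lowest, highest


SAMPLE = """\
vfio_dma_map(0x10033d3a980, 0x1f34c0000, 0x10000, 0x3fff9a6a0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10033d3a980, 0x1f34d0000, 0x10000, 0x3fff9a6b0000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
"""

if __name__ == "__main__":
    by_ret, lowest, highest = summarize(parse_failures(SAMPLE))
    print(dict(by_ret), hex(lowest), hex(highest))
```

Feeding the full qemu stderr output to `parse_failures` instead of `SAMPLE` would show whether the failures form one contiguous run of 0x10000-byte mappings, as the excerpt suggests.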

Version-Release number of selected component (if applicable):
Host kernel:3.10.0-481.el7.ppc64le
qemu-kvm-rhev-2.6.0-17.el7
Guest kernel:3.10.0-482.el7.ppc64le


How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with the VFIO device:
/usr/libexec/qemu-kvm \
...
-device vfio-pci,host=0003:09:00.0,id=pf1 \
...

2. After the guest boots, check the device and hot unplug the VFIO device:
[root@ibm-p8-rhevm-11 ~]# ping 10.16.67.19 -c 5
PING 10.16.67.19 (10.16.67.19) 56(84) bytes of data.
64 bytes from 10.16.67.19: icmp_seq=1 ttl=61 time=0.277 ms
64 bytes from 10.16.67.19: icmp_seq=2 ttl=61 time=0.182 ms
64 bytes from 10.16.67.19: icmp_seq=3 ttl=61 time=0.198 ms
64 bytes from 10.16.67.19: icmp_seq=4 ttl=61 time=0.211 ms
64 bytes from 10.16.67.19: icmp_seq=5 ttl=61 time=0.202 ms

{"execute":"device_del","arguments":{"id":"pf1"}}
{"return": {}}


3. Check device info:

(qemu) info pci
  Bus  0, device   0, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x80000000 [0x80ffffff].
      BAR2: 32 bit memory at 0xc0100000 [0xc0100fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device   3, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0001000 [0xc00010ff].
      id "usb1"
  Bus  0, device   4, function 0:
    SCSI controller: PCI device 1af4:1004
      IRQ 0.
      BAR0: I/O at 0x0040 [0x007f].
      BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
      id "virtio_scsi_pci0"

4. Hot plug the same vfio device:
{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0003:09:00.0","id":"pf1"}}

At this step, qemu prints the error messages quoted in the description.

5. Check device info:
(qemu) info pci
  Bus  0, device   0, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x80000000 [0x80ffffff].
      BAR2: 32 bit memory at 0xc0100000 [0xc0100fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id ""
  Bus  0, device   1, function 0:
    Ethernet controller: PCI device 14e4:1657
      IRQ 0.
      BAR0: 64 bit prefetchable memory at 0x100000000 [0x10000ffff].
      BAR2: 64 bit prefetchable memory at 0x100010000 [0x10001ffff].
      BAR4: 64 bit prefetchable memory at 0x100020000 [0x10002ffff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0007fffe].
      id "pf1"
  Bus  0, device   3, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0001000 [0xc00010ff].
      id "usb1"
  Bus  0, device   4, function 0:
    SCSI controller: PCI device 1af4:1004
      IRQ 0.
      BAR0: I/O at 0x0040 [0x007f].
      BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
      id "virtio_scsi_pci0"

6. Check the guest network:
[root@ibm-p8-rhevm-11 ~]# ifconfig
eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 40:f2:e9:5d:ef:30  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 112  bytes 10304 (10.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 112  bytes 10304 (10.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

7. # dhclient eth0

dhclient cannot obtain an IP address, so the guest cannot ping outside.
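
The unplug and replug steps above can be scripted over QMP. The following is a minimal Python sketch under stated assumptions, not a tested reproducer: `QMPSession`, `replug`, and `replug_over_tcp` are hypothetical names, the TCP port is taken from the `-qmp tcp:0:4445,server,nowait` option in the guest command line, and the sketch assumes qemu emits a `DEVICE_DELETED` event carrying the device id once the guest has released the device.

```python
import json
import socket


class QMPSession:
    """Tiny line-based QMP client over a pair of text streams."""

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer
        self.events = []  # events received while waiting for replies
        self._read()      # consume the greeting the server sends first
        self.execute("qmp_capabilities")  # leave capabilities negotiation

    def _read(self):
        line = self.reader.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def execute(self, name, arguments=None):
        """Send one command and return its reply, queuing any events."""
        command = {"execute": name}
        if arguments is not None:
            command["arguments"] = arguments
        self.writer.write(json.dumps(command) + "\n")
        self.writer.flush()
        while True:
            msg = self._read()
            if "event" in msg:
                self.events.append(msg)
            else:
                return msg

    def wait_for_device_deleted(self, dev_id):
        """Return the DEVICE_DELETED event for dev_id, reading until it arrives."""
        while True:
            for i, event in enumerate(self.events):
                if (event["event"] == "DEVICE_DELETED"
                        and event.get("data", {}).get("device") == dev_id):
                    return self.events.pop(i)
            msg = self._read()
            if "event" in msg:
                self.events.append(msg)


def replug(session, host_addr, dev_id):
    """Hot unplug a vfio-pci device, wait for removal, then hot plug it again."""
    reply = session.execute("device_del", {"id": dev_id})
    if "error" in reply:
        raise RuntimeError(reply["error"])
    session.wait_for_device_deleted(dev_id)
    return session.execute(
        "device_add", {"driver": "vfio-pci", "host": host_addr, "id": dev_id}
    )


def replug_over_tcp(address=("localhost", 4445)):
    """Hypothetical entry point; the port assumes -qmp tcp:0:4445."""
    with socket.create_connection(address) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        writer = sock.makefile("w", encoding="utf-8")
        return replug(QMPSession(reader, writer), "0003:09:00.0", "pf1")
```

Calling `replug_over_tcp()` on the host while watching qemu's stderr should trigger the same `device_add` path as step 4. The sketch does not set any timeout, so it blocks if the guest never releases the device.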

Actual results:
Qemu prints the error messages, and the network device no longer works.

Expected results:
Qemu raises no errors, and the guest VFIO device works correctly.

Additional info:
Guest command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pseries  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_Mmc1en/monitor-qmpmonitor1-20160802-231736-WMlxeI9x,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_Mmc1en/monitor-catch_monitor-20160802-231736-WMlxeI9x,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_Mmc1en/serial-serial0-20160802-231736-WMlxeI9x,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device pci-ohci,id=usb1,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/RHEL-Server-7.3-ppc64le-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-kbd \
    -device usb-mouse \
    -device vfio-pci,host=0003:09:00.0,id=pf1 \
    -vnc :0  \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -qmp tcp:0:4445,server,nowait \
    -enable-kvm \
    -monitor stdio

Comment 1 Zhengtong 2016-08-04 11:08:12 UTC
Colleagues on the x86 side helped confirm that this issue cannot be reproduced on the x86 platform.


I have verified that this issue does not occur with qemu-kvm-rhev-2.6.0-14.el7 and first appears in qemu-kvm-rhev-2.6.0-15.el7, so this bug is a regression.

Comment 4 David Gibson 2016-08-11 01:19:46 UTC
What sort of device was passed through with VFIO?  Have you tried this with multiple types of device or just one?

Please include the output from lspci on the host.

This could be a bug in qemu, but it could also be a missing quirk specific to the device in the host kernel.

Comment 5 Zhengtong 2016-08-11 02:48:05 UTC
The device I passed to the guest is a BCM5719. In my previous tests I only used this device. I have just tried a USB controller and hit the same problem; qemu prints the error messages there too:

"
...
VFIO_MAP_DMA: -6
vfio_dma_map(0x10017a7c900, 0x1fda70000, 0x10000, 0x3fffa8080000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10017a7c900, 0x1fda80000, 0x10000, 0x3fffa8090000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
...
" 




lspci output on my host:
[root@ibm-p8-rhevm-11 ~]# lspci
0000:00:00.0 PCI bridge: IBM Device 03dc
0001:00:00.0 PCI bridge: IBM Device 03dc
0001:01:00.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:01.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:08.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:02:09.0 PCI bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:00:00.0 PCI bridge: IBM Device 03dc
0005:00:00.0 PCI bridge: IBM Device 03dc
0006:00:00.0 PCI bridge: IBM Device 03dc

Comment 6 David Gibson 2016-08-18 05:31:24 UTC
I've confirmed I can reproduce this with the latest downstream qemu, but not with upstream qemu.

I notice something a little different in the symptoms. The first batch of vfio_dma_map() calls fails with ENXIO (errno -6):

VFIO_MAP_DMA: -6
vfio_dma_map(0x10020acaf80, 0x40000000, 0x10000, 0x3fff3d660000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10020acaf80, 0x40010000, 0x10000, 0x3fff3d670000) = -6 (No such device or address)

However, later ones fail with errno -11 (EAGAIN on Linux, reported as "Resource temporarily unavailable"):

VFIO_MAP_DMA: -6
vfio_dma_map(0x10020acaf80, 0x403c0000, 0x10000, 0x3fff3da20000) = -6 (No such device or address)
VFIO_MAP_DMA: -6
vfio_dma_map(0x10020acaf80, 0x403d0000, 0x10000, 0x3fff3da30000) = -11 (Resource temporarily unavailable)


I think the "VFIO_MAP_DMA: -6" is reporting an ENXIO from the unmap when the qemu code attempts to roll back and remap on EBUSY.
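
The symbolic names behind the numeric return codes in these log lines can be double-checked with Python's standard `errno` module. This is only a small sketch; `decode` is a hypothetical helper, and the names it prints depend on the platform it runs on. On Linux, 6 is expected to decode as ENXIO and 11 as EAGAIN, while EBUSY has the value 16.

```python
import errno
import os


def decode(ret):
    """Map a negative return value such as -6 to (symbolic name, message)."""
    code = -ret
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


if __name__ == "__main__":
    # Return codes seen in the log excerpts in this report.
    for ret in (-6, -11):
        name, message = decode(ret)
        print(ret, name, message)
    # For comparison, the numeric value of EBUSY on this platform.
    print("EBUSY =", errno.EBUSY)
```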

Comment 7 David Gibson 2016-08-18 05:56:24 UTC
I've tracked this down to upstream commit d78c19b5cf4821d0c198f4132a085bdbf19dda4c, which is missing downstream.

Working brew build at: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11620440

Comment 8 Miroslav Rezanina 2016-08-22 18:26:51 UTC
Fix included in qemu-kvm-rhev-2.6.0-22.el7
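
Taken together with the regression range reported in comment 1, the affected builds can be expressed as a simple check. This is a hedged Python sketch: `release_number` and `likely_affected` are hypothetical helpers, and treating every 2.6.0 build from -15 up to (but not including) -22 as affected is an inference, since only -15 and -17 were actually reported as failing in this bug.

```python
import re

# Package names in this report look like qemu-kvm-rhev-2.6.0-17.el7.
NVR_RE = re.compile(r"^qemu-kvm-rhev-(\d+(?:\.\d+)*)-(\d+)\.el7$")

FIRST_BAD = 15    # first build where the reporter saw the failure
FIRST_FIXED = 22  # build that includes the fix


def release_number(nvr):
    """Extract the release number from a 2.6.0 qemu-kvm-rhev package name."""
    match = NVR_RE.match(nvr)
    if match is None or match.group(1) != "2.6.0":
        raise ValueError("not a qemu-kvm-rhev-2.6.0 el7 build: %r" % nvr)
    return int(match.group(2))


def likely_affected(nvr):
    """True if the build falls inside the assumed regression window."""
    return FIRST_BAD <= release_number(nvr) < FIRST_FIXED


if __name__ == "__main__":
    for nvr in ("qemu-kvm-rhev-2.6.0-14.el7",
                "qemu-kvm-rhev-2.6.0-17.el7",
                "qemu-kvm-rhev-2.6.0-22.el7"):
        print(nvr, likely_affected(nvr))
```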

Comment 10 Zhengtong 2016-08-23 08:45:28 UTC
Tested with the fixed version, qemu-kvm-rhev-2.6.0-22.el7; the result is good.

Boot the guest with the command line as in comment #c1.

1. In the guest, pinging outside succeeds.
2. Hot unplug the VFIO device (BCM5719):
{"execute":"device_del","arguments":{"id":"pf1"}}
...
In the guest, the target device is no longer present:
[root@ibm-p8-rhevm-13 ~]# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 76  bytes 6824 (6.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 76  bytes 6824 (6.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

3. Hot plug the VFIO device again:
{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0003:09:00.0","id":"pf1"}}
...


4. Check the output of the qemu monitor:
(qemu)
(qemu)..

No error messages are raised.

5. In the guest, check the network device:
[root@ibm-p8-rhevm-13 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 40:f2:e9:5d:9c:a8  txqueuelen 1000  (Ethernet)
        RX packets 91  bytes 7938 (7.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 76  bytes 6824 (6.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 76  bytes 6824 (6.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The device shows up.

6. In the guest, obtain an IP address for the device with dhclient and ping outside:

[root@ibm-p8-rhevm-13 ~]# ping 10.66.10.172 -c 5

PING 10.66.10.172 (10.66.10.172) 56(84) bytes of data.
64 bytes from 10.66.10.172: icmp_seq=1 ttl=54 time=308 ms
64 bytes from 10.66.10.172: icmp_seq=2 ttl=54 time=308 ms
64 bytes from 10.66.10.172: icmp_seq=3 ttl=54 time=310 ms
64 bytes from 10.66.10.172: icmp_seq=4 ttl=54 time=309 ms
64 bytes from 10.66.10.172: icmp_seq=5 ttl=54 time=309 ms


So the bug is fixed.
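
One visible difference between the failing and the fixed runs is the eth0 flags value printed by ifconfig: 4099 in the original report versus 4163 here. The following Python sketch decodes those values. The bit values are the standard Linux interface flags from `<linux/if.h>`; `decode_flags` and `is_running` are hypothetical helper names. Reading the missing RUNNING flag as "link not operational" is the usual interpretation, not something this report states explicitly.

```python
import re

# Subset of the Linux interface flag bits needed for the values in this report.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def is_running(ifconfig_line):
    """True if the flags=<n> field of an ifconfig header line has RUNNING set."""
    match = re.search(r"flags=(\d+)<", ifconfig_line)
    if match is None:
        raise ValueError("no flags field in line: %r" % ifconfig_line)
    return bool(int(match.group(1)) & 0x40)


if __name__ == "__main__":
    print(decode_flags(4099))  # eth0 after replug with the affected build
    print(decode_flags(4163))  # eth0 after replug with the fixed build
    print(is_running("eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500"))
```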

Comment 12 errata-xmlrpc 2016-11-07 21:28:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html