Bug 1250326 - vfio device can't be hot unplugged on powerpc guest
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: SLOF
Version: 7.2
Hardware: ppc64le Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assigned To: Laurent Vivier
QA Contact: Virtualization Bugs
Blocks: RHEV3.6PPC 1263795 1277183 1277184
Reported: 2015-08-05 03:37 EDT by Zhengtong
Modified: 2016-02-21 06:08 EST
Fixed In Version: SLOF-20150313-5.gitc89b0df.el7
Doc Type: Bug Fix
Clones: 1263795
Last Closed: 2015-11-19 04:20:53 EST
Type: Bug
Description Zhengtong 2015-08-05 03:37:07 EDT
Description of problem:

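A vfio-pci device attached to a spapr-pci-vfio-host-bridge can't be hot unplugged from a pseries guest: "device_del" has no visible effect.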

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-2.3.0-13.el7
Host:3.10.0-302.el7.ppc64le
Guest:3.10.0-229.12.1.el7.ppc64

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest with a vfio device:
/usr/libexec/qemu-kvm \
    -name vfio-liuzt-BE \
    -machine pseries,accel=kvm,usb=off \
    -m 4096 \
    -realtime mlock=off \
    -smp 4,sockets=2,cores=2,threads=1 \
    -uuid 5125cf27-4b01-4493-b46d-734d08becc6b \
    -no-user-config \
    -no-shutdown \
    -nodefaults \
    -chardev socket,id=charmonitor,path=monitor,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=utc \
    -boot strict=on \
    -device pci-ohci,id=usb,bus=pci.0,addr=0x1 \
    -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
    -drive file=/root/test_home/liuzt/vdisk/BE_guest.img,if=none,id=drive-scsi0-0-0-0,format=qcow2 \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
    -drive file=/root/test_home/liuzt/vdisk/RHEL-7.1-20150219.1-Server-ppc64-dvd1.iso,if=none,id=cd_rom,media=cdrom,format=raw \
    -device scsi-cd,bus=scsi0.0,drive=cd_rom,id=cd_rom1,bootindex=2 \
    -netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,ifname=vnetvfio \
    -device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:5d:c5:9e,reg=0x2000 \
    -chardev socket,id=charserial0,path=serial,server,nowait \
    -device spapr-vty,chardev=charserial0,reg=0x30000000 \
    -device spapr-pci-vfio-host-bridge,id=vfiohost,iommu=1,index=0x1 \
    -device vfio-pci,host=0002:01:00.3,bus=vfiohost.0,addr=0x1,id=vfio_dev \
    -device usb-kbd,id=input0 \
    -device usb-mouse,id=input1 \
    -vnc 0.0.0.0:16 \
    -k en-us \
    -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x3 \
    -global spapr-nvram.reg=0x3000 \
    -monitor stdio

2. After the guest has booted, hot unplug the vfio device:
(qemu) device_del vfio_dev

3. Check the device in HMP and in the guest:
(qemu) info pci
  Bus  0, device   1, function 0:
    Ethernet controller: PCI device 10df:e220
      IRQ 0.
      BAR0: 64 bit prefetchable memory at 0x100000000 [0x100007fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0007fffe].
      id "vfio_dev"
  Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0000000 [0xc00000ff].
      id "usb"
  Bus  0, device   2, function 0:
    SCSI controller: PCI device 1af4:1004
      IRQ 0.
      BAR0: I/O at 0x0040 [0x007f].
      BAR1: 32 bit memory at 0xc0001000 [0xc0001fff].
      id "scsi0"
  Bus  0, device   3, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x80000000 [0x80ffffff].
      BAR2: 32 bit memory at 0xc0002000 [0xc0002fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "video0"

In guest:
[root@dhcp71-17 ~]# ifconfig -a
enP1p0s1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        inet 10.16.71.115  netmask 255.255.248.0  broadcast 10.16.71.255
        ether 00:90:fa:74:02:88  txqueuelen 1000  (Ethernet)
        RX packets 3303  bytes 343504 (335.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 88  bytes 11477 (11.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.71.17  netmask 255.255.248.0  broadcast 10.16.71.255
        inet6 2620:52:0:1040:5054:ff:fe5d:c59e  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::5054:ff:fe5d:c59e  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:5d:c5:9e  txqueuelen 1000  (Ethernet)
        RX packets 8600  bytes 882699 (862.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 205  bytes 28343 (27.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 21  


Actual results:
The device is still present in the guest and in the HMP "info pci" output.

Expected results:
The device is removed and no longer appears in the guest or in the HMP "info pci" output.

Additional info:
Comment 2 Laurent Vivier 2015-08-28 10:28:49 EDT
Could you check if qemu-kvm-rhev-2.3.0-20.el7 fixes this issue ?
Comment 3 Zhengtong 2015-08-30 23:48:31 EDT
(In reply to Laurent Vivier from comment #2)
> Could you check if qemu-kvm-rhev-2.3.0-20.el7 fixes this issue ?

Hi Laurent,
I tried qemu-kvm-rhev-2.3.0-20.el7. The problem still exists; it made no difference.
Comment 5 Laurent Vivier 2015-09-01 11:11:36 EDT
Could you check if the rtas_errd daemon is running in the guest ?
Are there some error messages reported by journalctl or dmesg ?
Comment 6 Zhengtong 2015-09-01 22:25:44 EDT
(In reply to Laurent Vivier from comment #5)
> Could you check if the rtas_errd daemon is running in the guest ?
> Are there some error messages reported by journalctl or dmesg ?

rtas_errd daemon is running in the guest.
[root@dhcp71-32 ~]# ps aux | grep rtas
root       868  0.0  0.1   5568  4352 ?        Ss   10:09   0:00 /usr/sbin/rtas_errd
root      4014  0.0  0.0 111360  3264 pts/1    S+   10:16   0:00 grep --color=auto rtas


I didn't find any error messages related to vfio hotplug in dmesg or journalctl.

The dmesg output is identical before and after the hot-unplug attempt, and the same is true for journalctl.
Comment 7 Laurent Vivier 2015-09-07 10:54:19 EDT
As advised by David, I have checked with an available USB3 PCI card on ibm-p8-virt-01.

host kernel : 3.10.0-314.el7.ppc64le
guest kernel: 3.10.0-306.0.1.el7.ppc64le
qemu: qemu-kvm-rhev-2.3.0-22.el7.ppc64le

The qemu command line generated by virt-manager/virt-install differs from yours: the vfio card is plugged directly into the guest PCI bus:

  -device vfio-pci,host=0006:01:00.0,id=hostdev0,bus=pci.0,addr=0x4

In this case it works well (hotplug+hotunplug -> OK, coldplug+hotunplug -> OK).

If I use a spapr-pci-vfio-host-bridge like you are, the "device_del" in qemu console generates absolutely nothing...

  -device spapr-pci-vfio-host-bridge,id=vfiohost,iommu=5,index=0x1
  -device vfio-pci,host=0006:01:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev

But hotplug/unplug seems to not be implemented in spapr-pci-vfio-host-bridge.c, only in spapr-pci.c.

I think you should not use spapr-pci-vfio-host-bridge.

David, what is your opinion?
Comment 8 David Gibson 2015-09-07 21:04:08 EDT
With the current downstream qemu, vfio devices definitely can't work on the normal guest host bridge.  I think the reason it's appearing to work is that the problems will only appear once you start attempting DMA, which I don't imagine you've been able to do with nothing actually connected to the xhci.

I hadn't realised it would allow you to connect the device despite the fact that it can't fully work.  That fact is another reason to press forward with bug 1259556, which will allow vfio devices and emulated devices to share a standard guest host bridge.

Zhengtong, can you please retry with the qemu version at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9803236 which contains a draft port of the patches required for this.
Comment 9 Zhengtong 2015-09-07 22:28:47 EDT
(In reply to David Gibson from comment #8)
> With the current downstream qemu, vfio devices definitely can't work on the
> normal guest host bridge.  I think the reason it's appearing to work is that
> the problems will only appear once you start attempting DMA, which I don't
> imagine you've been able to do with nothing actually connected to the xhci.
> 
> I hadn't realised it would allow you to connect the device despite the fact
> that it can't fully work.  That fact is another reason to press forward with
> bug 1259556, which will allow vfio devices and emulated devices to share a
> standard guest host bridge.
> 
> Zhengtong, can you please retry with the qemu version at
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9803236 which contains
> a draft port of the patches required for this.

David, unfortunately, the draft port of the patches does not help with the problem.

[root@ibm-p8-rhevm-12 test]# /usr/libexec/qemu-kvm --version
QEMU emulator version 2.3.0 (qemu-kvm-rhev-2.3.0-22.el7.test), Copyright (c) 2003-2008 Fabrice Bellard


details:

(qemu) device_del vfio_dev 
(qemu) info pci
  Bus  0, device   1, function 0:
    Ethernet controller: PCI device 14e4:1657
      IRQ 0.
      BAR0: 64 bit prefetchable memory at 0x100000000 [0x10000ffff].
      BAR2: 64 bit prefetchable memory at 0x100010000 [0x10001ffff].
      BAR4: 64 bit prefetchable memory at 0x100020000 [0x10002ffff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0007fffe].
      id "vfio_dev"
  Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
...


and in guest:
[root@localhost ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
3: enP1p0s1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 98:be:94:01:75:4c brd ff:ff:ff:ff:ff:ff


Note: "enP1p0s1" is the vfio device.
Comment 10 David Gibson 2015-09-08 00:26:58 EDT
Ok, thanks for testing.

Since Laurent wasn't able to reproduce with the xhci, I'm going to guess that this bug only triggers on certain vfio host devices.  It looks like the device you're using is a Broadcom ethernet function.  We do have one of those on our test machine, so we can attempt to reproduce with that.

Couple of questions that might be relevant:
   * On your test system, is the host using one of the other PCI functions of the same Broadcom device?
   * Is any actual network connection plugged into the vfio network device?
      * If so, is the guest obtaining an IP address on the device?
Comment 11 Zhengtong 2015-09-08 05:34:39 EDT
(In reply to David Gibson from comment #10)
> Ok, thanks for testing.
> 
> Since Laurent wasn't able to reproduce with the xhci, I'm going to guess
> that this bug only triggers on certain vfio host devices.  It looks like the
> device you're using is a Broadcom ethernet function.  We do have one of
> those on our test machine, so we can attempt to reproduce with that.
> 
> Couple of questions that might be relevant:
>    * On your test system, is the host using one of the other PCI functions
> of the same Broadcom device?

No, I unbound all functions of the device from the host and bound them to vfio-pci.

On host:
#lspci
...
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
...

Commands run on the host:
#echo 0003:09:00.0 > /sys/bus/pci/devices/0003\:09\:00.0/driver/unbind
#echo 0003:09:00.1 > /sys/bus/pci/devices/0003\:09\:00.1/driver/unbind
#echo 0003:09:00.2 > /sys/bus/pci/devices/0003\:09\:00.2/driver/unbind
#echo 0003:09:00.3 > /sys/bus/pci/devices/0003\:09\:00.3/driver/unbind
#echo 0003:09:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.2 > /sys/bus/pci/drivers/vfio-pci/bind
#echo 0003:09:00.3 > /sys/bus/pci/drivers/vfio-pci/bind
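
The eight commands above follow one pattern per PCI function. As a minimal sketch, assuming a device with functions 0 to 3 and assuming vfio-pci already accepts the device ID (both taken from this setup, not general facts), they can be generated in a loop. The helper name vfio_rebind_cmds is hypothetical, and it only prints the commands rather than running them:

```shell
# Print (not run) the unbind/bind commands for functions 0..3 of one device.
# Dry run by design: review the output, then pipe it to "sh" as root.
vfio_rebind_cmds() {
    dev=$1   # domain:bus:device without the function, e.g. 0003:09:00
    for fn in 0 1 2 3; do
        echo "echo $dev.$fn > /sys/bus/pci/devices/$dev.$fn/driver/unbind"
    done
    for fn in 0 1 2 3; do
        echo "echo $dev.$fn > /sys/bus/pci/drivers/vfio-pci/bind"
    done
}

vfio_rebind_cmds 0003:09:00
```

Printing instead of executing keeps the sketch safe to try on a machine where the device is in use.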


>    * Is any actual network connection plugged into the vfio network device?
>       * If so, is the guest obtaining an IP address on the device?

Yes, function 0003:09:00.1 is cabled. The guest also obtained an IP address on it with dhclient, and pinging out succeeds.
Comment 12 Laurent Vivier 2015-09-08 05:59:11 EDT
(In reply to David Gibson from comment #10)
> Ok, thanks for testing.
> 
> Since Laurent wasn't able to reproduce with the xhci, I'm going to guess

I'm able to reproduce it with spapr-pci-vfio-host-bridge (cannot unplug). It works with the standard PCI bridge (can unplug), but as you said, I didn't test DMA.

> (In reply to Zhengtong from comment #9)
> (In reply to David Gibson from comment #8)
> > 
> > Zhengtong, can you please retry with the qemu version at
> > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=9803236 which contains
> > a draft port of the patches required for this.
> 
> David, unfortunately, the draft ported patched seems no help with the
> problem.

I've tested the RPM David has provided and the result is the same.

But what I don't understand is whether these patches allow:

1- to have DMA on standard guest bridge with VFIO
2- to unplug cards on spapr-pci-vfio-host-bridge

I think Zhengtong has tested [2], and I think it cannot work, as I didn't find any code in papr/spapr-dev to unplug cards on spapr-pci-vfio-host-bridge. Is it shared with spapr-pci-host-bridge?
Comment 13 Laurent Vivier 2015-09-08 07:48:41 EDT
With the package provided by David, I've been able to use a VFIO ethernet card in my guest on the standard guest PCI bus (no spapr-pci-vfio-host-bridge parameter) and to unplug it. I've been able to connect through ssh directly to my guest from my laptop. With the official build, the guest wasn't able to obtain an address by DHCP on that interface.
Comment 14 David Gibson 2015-09-08 23:24:34 EDT
Hm, so Zhengtong is still seeing the bug, but Laurent is not, with what appears to be the same hardware passed through.  I can't see any obvious difference between the situations which might cause this.

Could both of you please confirm which host and guest kernel versions and which qemu version you're using?

Zhengtong,

Does using the spapr-pci-host-bridge or spapr-pci-vfio-host-bridge device make any difference?  With the qemu I linked to, they should just be aliases to the same device.

Laurent,

I believe spapr-pci-vfio-host-bridge was always supposed to support hotplug, inheriting the relevant methods from spapr-pci-host-bridge.  However, because it was bound to a specific IOMMU group, which wasn't known until the device was present, I think there may have been bugs with its hotplug handling.  With the unified handling spapr-pci-vfio-host-bridge should be essentially just an alias to spapr-pci-host-bridge.  Does that help clarify?
Comment 16 Laurent Vivier 2015-09-10 11:09:49 EDT
(In reply to David Gibson from comment #14)
> Could both of you please confirm which host and guest kernel versions and 
> which qemu version you're using?

kernel version: 3.10.0-315.el7.ppc64le
qemu version: the one you've provided.

> Laurent,
> 
> I believe spapr-pci-vfio-host-bridge was always supposed to support hotplug,
> inheriting the relevant methods from spapr-pci-host-bridge.  However,
> because it was bound to a specific IOMMU group, which wasn't known until the
> device was present, I think there may have been bugs with its hotplug
> handling.  With the unified handling spapr-pci-vfio-host-bridge should be
> essentially just an alias to spapr-pci-host-bridge.  Does that help clarify?

Thank you for the details.

You're right, there are some bugs in hotplug handling:

1- the bug can be reproduced without VFIO interface, but using the spapr-pci-vfio-host-bridge with a virtual PCI device, for instance:

    -device spapr-pci-vfio-host-bridge,id=vfiohost,index=0x1 \
    -device virtio-net-pci,bus=vfiohost.0,id=net1 \

2- the bug appears only on coldplugged devices, not on hotplugged ones

So a hotplugged device on spapr-pci-vfio-host-bridge can be unplugged.

I've added some traces, and it appears the device is not unplugged because the RTAS interface receives a different sequence of events:

A- working case (hotplugged):

    rtas_set_indicator(type=RTAS_SENSOR_TYPE_DR, index=0x40010008, state=1)
    rtas_set_indicator(type=RTAS_SENSOR_TYPE_ISOLATION_STATE, index=0x40010008,
                       state=0)

B- faulty case (coldplugged):

    rtas_set_indicator(type=RTAS_SENSOR_TYPE_DR, index=0x40010000, state=1)
    rtas_set_indicator(type=RTAS_SENSOR_TYPE_DR, index=0x40010000, state=0)

In case [B], set_isolation_state() is not called, nor is detach(), and thus the device is not unplugged.

I'm using:

qemu-kvm-rhev-2.3.0-22.el7 +
spapr_pci: Remove constraints about VFIO-PCI devices
spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge
spapr_iommu: Make spapr_tce_find_by_liobn() public
spapr_pci: Define default DMA window size as a macro
Comment 17 Laurent Vivier 2015-09-10 14:46:05 EDT
The problem seems to be in the device-tree:

# drmgr -c pci -r -s 0x40010008 -n
There is no configured card to remove from the specified PCI slot.

In fact, when the card is hotplugged in the standard pci bus, it appears under /proc/device-tree/pci@800000020000000 as pci@1

whereas when it is hotplugged in pci-vfio, it doesn't appear under /proc/device-tree/pci@800000020000001.

This is why drmgr is not able to unplug it.
Comment 18 Laurent Vivier 2015-09-10 15:22:32 EDT
(In reply to Laurent Vivier from comment #17)
> The problem seems to be in the device-tree:
> 
> # drmgr -c pci -r -s 0x40010008 -n
> There is no configured card to remove from the specified PCI slot.
> 
> In fact, when the card is hotplugged in the standard pci bus, it appears
> under /proc/device-tree/pci@800000020000000 as pci@1
> 
> whereas when it is hotplugged in pci-vfio, it doesn't appear under
> /proc/device-tree/pci@800000020000001.
> 
> This is why drmgr is not able to unplug it.

This is wrong: I have this error because I've already tried to remove the card.
The device tree entry is correctly created in all cases.
Comment 19 Laurent Vivier 2015-09-10 15:59:41 EDT
The device tree is not created in the case of _coldplugged_ device in spapr-pci-vfio-host-bridge. This is why drmgr cannot unplug it.
Comment 20 David Gibson 2015-09-10 20:51:57 EDT
So.. you're saying that the DT entry isn't created for coldplugged devices on spapr-pci-vfio-host-bridge (but not spapr-pci-host-bridge) even after the patches which are supposed to unify spapr-pci-host-bridge and spapr-pci-vfio-host-bridge?

If so, that's a bug in the patches we need to fix - spapr-pci-host-bridge and spapr-pci-vfio-host-bridge should behave identically afterwards.

Zhengtong,

If you plug the devices onto an spapr-pci-host-bridge (and don't create any spapr-pci-vfio-host-bridge devices) can you reproduce the original problem with the brewed qemu?
Comment 21 Zhengtong 2015-09-11 01:10:06 EDT
(In reply to David Gibson from comment #20)
> So.. you're saying that the DT entry isn't created for coldplugged devices
> on spapr-pci-vfio-host-bridge (but not spapr-pci-host-bridge) even after the
> patches which are supposed to unify spapr-pci-host-bridge and
> spapr-pci-vfio-host-bridge?
> 
> If so, that's a bug in the patches we need to fix - spapr-pci-host-bridge
> and spapr-pci-vfio-host-bridge should behave identically afterwards.
> 
> Zhengtong,
> 
> If you plug the devices onto an spapr-pci-host-bridge (and don't create any
> spapr-pci-vfio-host-bridge devices) can you reproduce the original problem
> with the brewed qemu?

Hi David, I tested with spapr-pci-host-bridge as the bus for the vfio device. Unplug failed with this bridge as well.

Details:

version:
qemu-kvm-rhev-2.3.0-22.el7

command:
..
    -device spapr-pci-host-bridge,id=vfiohost,index=0x1 \
    -device vfio-pci,host=0003:09:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev \
...

No change for "info pci" after "device_del vfio_dev" in hmp.



What's more, with spapr-pci-host-bridge I can't get an IP address for the vfio NIC by DHCP. So a vfio device attached to this kind of bridge may have functional problems as well.
Comment 22 Laurent Vivier 2015-09-11 03:12:20 EDT
(In reply to Zhengtong from comment #21)
> 
> what's more, with spapr-pci-host-bridge. I can't get IP for the vfio nic
> device by dhcp. So, for the vfio device attached to this kind of bridge, the
> function may also has some problem.

Did you test with the package provided by David or the standard one?

With the one from David, you should be able to get an IP address and send/receive data.

With the standard package, unplug should work, but remember that the rtas_errd daemon must be running in the guest, and wait at least 2 seconds between "device_del" and "info pci" to allow the daemon to switch off the card and QEMU to detach it from the bus.
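
To avoid checking too early, the guest-side wait can be scripted. This is a minimal sketch, assuming the passed-through device is a network interface visible under /sys/class/net in the guest; the function name wait_iface_gone and the 10-second default are hypothetical choices, not part of any package:

```shell
# Wait until a network interface disappears from the guest, or time out.
# Returns 0 once /sys/class/net/<ifname> is gone, 1 on timeout.
wait_iface_gone() {
    ifname=$1
    timeout=${2:-10}   # seconds to wait before giving up
    while [ -e "/sys/class/net/$ifname" ]; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

After issuing "device_del" in the monitor, running `wait_iface_gone enP1p0s1 10` in the guest reports through its exit status whether the interface went away within the allowed time.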
Comment 23 Zhengtong 2015-09-11 07:21:51 EDT
HI,
The results in comment 21 were indeed obtained with the standard qemu package, qemu-kvm-rhev-2.3.0-22.

About the package provided by David, I will try later.
Comment 24 Laurent Vivier 2015-09-11 08:47:27 EDT
For the second bus, the DRC index is not valid:

1st bus (spapr-pci-host-bridge):

# for file in /proc/device-tree/pci@800000020000000/*@?/ibm,my-drc-index; do
    echo $file;
    hexdump -C $file;
   done

/proc/device-tree/pci@800000020000000/communication-controller@2/ibm,my-drc-index
00000000  40 00 00 10                                       |@...|
00000004
/proc/device-tree/pci@800000020000000/unknown-legacy-device@3/ibm,my-drc-index
00000000  40 00 00 18                                       |@...|
00000004
/proc/device-tree/pci@800000020000000/usb@0/ibm,my-drc-index
00000000  40 00 00 00                                       |@...|
00000004

2nd bus (spapr-pci-vfio-host-bridge):

# for file in /proc/device-tree/pci@800000020000001/*@?/ibm,my-drc-index; do
    echo $file;
    hexdump -C $file;
  done

/proc/device-tree/pci@800000020000001/ethernet@0/ibm,my-drc-index
00000000  40 00 00 00                                       |@...|
00000004

It should be 0x40010000, as we can see with a hotplugged device:

/proc/device-tree/pci@800000020000001/pci@1/ibm,my-drc-index
00000000  40 01 00 08                                       |@...|
00000004
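
The dumps above show the coldplugged ethernet@0 on the second bus carrying the same index as usb@0 on the first. A guest-side check for such collisions could look like the sketch below. The function name dup_drc_indexes is hypothetical, and the layout (pci@*/<node>/ibm,my-drc-index holding one 4-byte big-endian cell) is assumed from the paths shown in this comment:

```shell
# Report every ibm,my-drc-index value that is used by more than one node.
# ROOT defaults to /proc/device-tree; the argument exists so the function
# can also be pointed at a copy of the tree.
dup_drc_indexes() {
    root=${1:-/proc/device-tree}
    for f in "$root"/pci@*/*/ibm,my-drc-index; do
        [ -f "$f" ] || continue
        # Print the 4 property bytes as 8 hex digits, followed by the path.
        printf '%s %s\n' "$(od -An -tx1 "$f" | tr -d ' \n')" "$f"
    done | sort | awk '
        { count[$1]++; paths[$1] = paths[$1] " " $2 }
        END { for (i in count) if (count[i] > 1) print "0x" i ":" paths[i] }'
}
```

Called without arguments in the guest, it prints one line per duplicated index together with the nodes sharing it, and nothing when all indexes are unique.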

So when QEMU asks to remove the device 0x40000000, drmgr will not remove the device from the 2nd bus but from the first bus (the first device with the given DRC index), which is a USB PCI card and not the ethernet PCI card we have chosen:

# drmgr -d 15 -c pci -s 0x40000000 -r -n
Enabling RTAS debug
drmgr: -d 15 -c pci -s 0x40000000 -r -n 
Retrieving hotplug nodes
[...]
0x40000000 =? 0x40000005
0x40000000 =? 0x40000004
0x40000000 =? 0x40000003
0x40000000 =? 0x40000002
0x40000000 =? 0x40000001
0x40000000 =? 0x40000000
Found drc_name C0
found node: drc name=C0, index=0x40000000, path=/proc/device-tree/pci@800000020000000
RTAS call args.token = 8218
RTAS call args.ninputs = 3
RTAS call args.nret = 1
RTAS call input[0] = 0x2a230000 (BE)
RTAS call input[1] = 0x40 (BE)
RTAS call input[2] = 0x1000000 (BE)
RTAS call output[0] = 0x0 (BE)
librtas sc_set_indicator(): (9002, 1073741824, 1) = 0
Removing device-tree node /proc/device-tree/pci@800000020000000/usb@0
is calling rtas_set_indicator(ISOLATE index 0x40000000)
RTAS call args.token = 8218
RTAS call args.ninputs = 3
RTAS call args.nret = 1
RTAS call input[0] = 0x29230000 (BE)
RTAS call input[1] = 0x40 (BE)
RTAS call input[2] = 0x0 (BE)
RTAS call output[0] = 0x0 (BE)
librtas sc_set_indicator(): (9001, 1073741824, 0) = 0
is calling set_power(POWER_OFF index 0x40000000, power_domain 0xffffffff)
RTAS call args.token = 8219
RTAS call args.ninputs = 2
RTAS call args.nret = 2
RTAS call input[0] = 0xffffffff (BE)
RTAS call input[1] = 0x0 (BE)
RTAS call output[0] = 0x0 (BE)
RTAS call output[1] = 0x64000000 (BE)
librtas sc_set_power_level(): (-1, 0, 0x3fffe4994f94) = 0, 100
########## Sep 11 04:42:13 2015 ##########
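
The words tagged (BE) in the trace look byte-swapped relative to the values librtas prints on the following line. Assuming they are the raw big-endian argument cells shown as read on a little-endian guest (an inference from the numbers, not something the trace states), swapping the bytes recovers the decimal values. A small sketch with a hypothetical helper bswap32:

```shell
# Swap the byte order of a 32-bit value; print the result as hex and decimal.
bswap32() {
    v=$(( $1 ))
    r=$(( ((v & 0xff) << 24) | ((v & 0xff00) << 8) | ((v >> 8) & 0xff00) | ((v >> 24) & 0xff) ))
    printf '0x%08x %d\n' "$r" "$r"
}

bswap32 0x2a230000   # input[0] of the first call -> 0x0000232a 9002
bswap32 0x40         # input[1]                   -> 0x40000000 1073741824
bswap32 0x1000000    # input[2]                   -> 0x00000001 1
```

The three results match the librtas line "sc_set_indicator(): (9002, 1073741824, 1)", that is, indicator 9002 on DRC index 0x40000000, which the trace resolves to usb@0 on the first bus.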
Comment 25 Laurent Vivier 2015-09-11 09:08:53 EDT
For coldplugged devices, device tree is built by SLOF.

Apparently, the PCI bridge number is hardcoded to "0x40000000":

slof/fs/pci-properties.fs:
...
            \ QEMU uses static assignments for my-drc-index:
            \ 40000000h + $bus << 8 + $slot << 3
            dup dup pci-addr2bus 8 lshift
            swap pci-addr2dev 3 lshift or
            40000000 + encode-int s" ibm,my-drc-index" property
...

So change the component to "SLOF".
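
To make the arithmetic concrete, here is a sketch of the index computation in shell. The first function adds a PHB term; the shift by 16 is an assumption inferred from the values seen in this report (0x40010008 for slot 1 and 0x40010000 for slot 0 on the bridge with index=0x1), not taken from the QEMU or SLOF sources. The second mirrors the Forth snippet above, which has no PHB term. Both function names are hypothetical:

```shell
# DRC index including the PHB index (the "<< 16" term is inferred, see above).
drc_index() {
    phb=$1
    bus=$2
    slot=$3
    printf '0x%08x\n' $(( 0x40000000 | (phb << 16) | (bus << 8) | (slot << 3) ))
}

# What the quoted SLOF formula yields: 40000000h + bus << 8 + slot << 3,
# with no contribution from the PHB index.
slof_drc_index() {
    drc_index 0 "$1" "$2"
}

drc_index 1 0 0       # slot 0 on the second PHB       -> 0x40010000
slof_drc_index 0 0    # SLOF value for the same device -> 0x40000000
drc_index 1 0 1       # slot 1 on the second PHB       -> 0x40010008
```

The mismatch between the first two lines is the collision seen in comment 24: the coldplugged device on the second bridge ends up with the index of slot 0 on the first bridge.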
Comment 28 David Gibson 2015-09-13 19:54:27 EDT
Zhengtong,

Regarding comment 21.  Not getting an IP when the device is on spapr-pci-host-bridge is expected with the old qemu version - that's because of bug 1259556, which I believe you're also trying to reproduce.
Comment 29 David Gibson 2015-09-13 19:56:07 EDT
Laurent, nice work tracking this to SLOF.  As I replied on the upstream thread, I don't think your first draft is the right fix, but that's definitely where the problem is.
Comment 30 Miroslav Rezanina 2015-09-16 10:49:03 EDT
Fix included in SLOF-20150313-4.gitc89b0df.el7
Comment 31 Miroslav Rezanina 2015-09-18 06:37:22 EDT
Fix included in SLOF-20150313-5.gitc89b0df.el7
Comment 33 Zhengtong 2015-09-21 23:22:22 EDT
Tested with SLOF-20150313-5.gitc89b0df.el7 and qemu-kvm-rhev-2.3.0-24. Hot unplug of the vfio device succeeds.

Details:
1. Boot guest with vfio device:
/usr/libexec/qemu-kvm \
...
    -device spapr-pci-vfio-host-bridge,id=vfiohost,iommu=1,index=0x1 \
    -device vfio-pci,host=0003:09:00.0,bus=vfiohost.0,addr=0x1,id=vfio_dev \
...

2. Check the device in the HMP interface:
(qemu) info pci
  Bus  0, device   1, function 0:
    Ethernet controller: PCI device 14e4:1657
      IRQ 0.
      BAR0: 64 bit prefetchable memory at 0x100000000 [0x10000ffff].
      BAR2: 64 bit prefetchable memory at 0x100010000 [0x10001ffff].
      BAR4: 64 bit prefetchable memory at 0x100020000 [0x10002ffff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0007fffe].
      id "vfio_dev"
  Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0021000 [0xc00210ff].
      id "usb"
...

3. In the guest, obtain an IP address on the target device and ping that address from an external host.

4. Hot unplug the device
(qemu) device_del vfio_dev 

5. Check the result:

In the guest, the device has disappeared:
 [root@localhost ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00


In the HMP interface, the device no longer appears in the "info pci" output:
(qemu) info pci
 Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0021000 [0xc00210ff].
      id "usb"
...


So the bug can be marked verified.
Comment 35 errata-xmlrpc 2015-11-19 04:20:53 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2286.html
