Bug 1243721

Summary: After hot-unplugging a virtio device, the device still exists in info pci
Product: Red Hat Enterprise Linux 7
Reporter: Zhengtong <zhengtli>
Component: qemu-kvm-rhev
Assignee: Laurent Vivier <lvivier>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 7.2
CC: amit.shah, hhuang, knoel, lvivier, michen, mrezanin, ngu, qzhang, virt-maint, xuhan, xuma, zhengtli
Target Milestone: rc
Keywords: Reopened
Hardware: ppc64le
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.3.0-20.el7
Doc Type: Bug Fix
Last Closed: 2015-12-04 16:51:15 UTC
Type: Bug

Description Zhengtong 2015-07-16 07:19:49 UTC
Description of problem:

After hot-unplugging a virtio-serial-pci device and then migrating the guest, the destination qemu-kvm-rhev process fails to load the migration.


Version-Release number of selected component (if applicable):
 
Host kernel:
3.10.0-292.el7.ppc64le

Guest: RHEL7.2

qemu-kvm-rhev:
qemu-kvm-rhev-2.3.0-9.el7


How reproducible:
4/4

Steps to Reproduce:
1. Boot the guest on Host A with virtio-serial-pci and virtserialport devices:
/usr/libexec/qemu-kvm -name liuzt-RHEL-7.1-20150219.1_LE -machine pseries,accel=kvm,usb=off -m 32768 -realtime mlock=off -smp 64,sockets=1,cores=16,threads=4 \
-monitor stdio \
-monitor unix:tt,server,nowait \
-rtc base=localtime,clock=host \
-no-shutdown \
-boot strict=on \
-device usb-ehci,id=usb,bus=pci.0,addr=0x2 \
-device pci-ohci,id=usb1,bus=pci.0,addr=0x1 \
-device spapr-vscsi,id=scsi0,reg=0x1000 \
-drive file=/root/test_home/liuzt/vdisk/rhel_le.img,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-serial pty \
-device usb-kbd,id=input0 \
-device usb-mouse,id=input1 \
-device usb-tablet,id=input2 \
-vnc 0:16 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-msg timestamp=on \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:83,reg=0x2000 \
-qmp tcp:0:4444,server,nowait \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=channel0,name=org.linux-kvm.port.1,id=port1

2. Hot-unplug the serial devices:
(qemu)device_del port1
(qemu)virtio-serial0

3. Boot the destination guest on Host B:
/usr/libexec/qemu-kvm -name liuzt-RHEL-7.1-20150219.1_LE -machine pseries,accel=kvm,usb=off -m 32768 -realtime mlock=off -smp 64,sockets=1,cores=16,threads=4 \
-monitor stdio \
-rtc base=localtime,clock=host \
-no-shutdown \
-boot strict=on \
-device usb-ehci,id=usb,bus=pci.0,addr=0x2 \
-device pci-ohci,id=usb1,bus=pci.0,addr=0x1 \
-device spapr-vscsi,id=scsi0,reg=0x1000 \
-drive file=/root/test_home/liuzt/vdisk/rhel_le.img,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-serial pty \
-device usb-kbd,id=input0 \
-device usb-mouse,id=input1 \
-device usb-tablet,id=input2 \
-vnc 0:16 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x4 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
-msg timestamp=on \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:83,reg=0x2000 \
-qmp tcp:0:4444,server,nowait \
-chardev socket,path=/root/test_home/liuzt/Manuall_test/virtio-serail/serial-socket1,id=channel0,server,nowait \
-incoming tcp:0:5980 \

4. Migrate the guest:
(qemu)migrate -d tcp:10.16.67.19:5980

Actual results:

Migration fails with the following error on the destination:
(qemu) 2015-07-16T06:42:52.949559Z qemu-kvm: Unknown savevm section or instance 'pci@800000020000000:05.0/virtio-console' 0
2015-07-16T06:42:52.949732Z qemu-kvm: load of migration failed: Invalid argument


Expected results:
Migration succeeds without errors.


Additional info:

Comment 1 Zhengtong 2015-07-16 07:22:31 UTC
Step 2 was missing a device_del. The corrected step is:

2. Hot-unplug the serial devices:
(qemu) device_del port1
(qemu) device_del virtio-serial0

Comment 3 Zhengtong 2015-07-21 05:58:36 UTC
The bug also occurs when hot-unplugging a virtio-blk-pci device:
 
qemu-kvm 
...
-drive file=/root/test_home/liuzt/vdisk/test.img,format=qcow2,id=test30,if=none \
-device virtio-blk-pci,drive=test30,id=test3,bus=pci.0,addr=0x6 \
..

Hot-unplug the device:
{"execute":"device_del","arguments":{"id":"test3"}}

Then migrate; the destination reports this error:

(qemu) 2015-07-21T05:53:31.067278Z qemu-kvm: Unknown savevm section or instance 'pci@800000020000000:06.0/virtio-blk' 0
2015-07-21T05:53:31.067444Z qemu-kvm: load of migration failed: Invalid argument

Comment 4 Zhengtong 2015-07-21 08:01:01 UTC
After hot-unplugging the device, info pci still lists it. This may be the root cause of the migration failure, so I am changing the bug title.


(qemu) device_del virtio-serial1
(qemu) info pci
  Bus  0, device   0, function 0:
    Class 1920: PCI device 1af4:1003
      IRQ 0.
      BAR0: I/O at 0x0020 [0x003f].
      BAR1: 32 bit memory at 0xc0000000 [0xc0000fff].
      id "virtio-serial0"
  Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0001000 [0xc00010ff].
      id "usb1"
  Bus  0, device   2, function 0:
    USB controller: PCI device 8086:24cd
      IRQ 0.
      BAR0: 32 bit memory at 0xc0002000 [0xc0002fff].
      id "usb"
  Bus  0, device   3, function 0:
    Class 0255: PCI device 1af4:1002
      IRQ 0.
      BAR0: I/O at 0x0040 [0x005f].
      id "balloon0"
  Bus  0, device   4, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x80000000 [0x80ffffff].
      BAR2: 32 bit memory at 0xc0003000 [0xc0003fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "video0"
  Bus  0, device   5, function 0:
    Class 1920: PCI device 1af4:1003
      IRQ 0.
      BAR0: I/O at 0x0060 [0x007f].
      BAR1: 32 bit memory at 0xc0020000 [0xc0020fff].
      id "virtio-serial1"
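A simplified model of the suspected link between the lingering device and the migration failure, for illustration only. It assumes each PCI device contributes one savevm section named by its PCI path plus device type; the virtio-console name comes from the log in the description, and the balloon entry is a hypothetical second section. If the unplug never completes on the source, the source still sends that section, and the destination, started without the device, rejects it with an error of the same shape as the one reported. This is a Python sketch, not QEMU's migration code.

```python
# Toy model (hypothetical, not QEMU code) of why a device whose unplug never
# completed breaks migration: the source still sends its state section, and
# the destination, started without that device, has no handler for it.


def load_migration_stream(sent_sections, dest_sections):
    """Return None on success, or the error the destination would report."""
    known = set(dest_sections)
    for idstr, instance in sent_sections:
        if (idstr, instance) not in known:
            return f"Unknown savevm section or instance '{idstr}' {instance}"
    return None


# Source: device_del was issued, but the device is still registered.
source = [
    ("pci@800000020000000:03.0/virtio-balloon", 0),   # hypothetical section
    ("pci@800000020000000:05.0/virtio-console", 0),   # "unplugged" device
]
# Destination: started without the virtio-serial device.
destination = [("pci@800000020000000:03.0/virtio-balloon", 0)]

print(load_migration_stream(source, destination))
```

With the device really gone on the source, both sides register the same sections and the function returns None.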

Comment 5 Amit Shah 2015-07-22 14:04:56 UTC
Hot-unplug needs guest cooperation.  If the guest hasn't given up using the device, hot-unplug isn't deemed to have finished.  I don't see this as a bug.  If you believe otherwise, or have more data, please reopen.

Comment 6 Zhengtong 2015-07-23 08:55:24 UTC
(In reply to Amit Shah from comment #5)
> Hot-unplug needs guest cooperation.  If the guest hasn't given up using the
> device, hot-unplug isn't deemed to have finished.  I don't see this as a
> bug.  If you believe otherwise, or have more data, please reopen.

Hi, I think this bug needs to be reopened.

Here are additional results on virtio device hot-unplug.

After a virtio device is unplugged, the guest's view of its PCI devices and QEMU's view no longer agree.

For example, QEMU's "info pci" output does not reflect the device_del, although the device has disappeared in the guest:
(qemu) device_del balloon0
(qemu) info pci
  Bus  0, device   1, function 0:
    USB controller: PCI device 106b:003f
      IRQ 0.
      BAR0: 32 bit memory at 0xc0000000 [0xc00000ff].
      id "usb1"
  Bus  0, device   2, function 0:
    USB controller: PCI device 8086:24cd
      IRQ 0.
      BAR0: 32 bit memory at 0xc0001000 [0xc0001fff].
      id "usb"
  Bus  0, device   3, function 0:
    Class 0255: PCI device 1af4:1002
      IRQ 0.
      BAR0: I/O at 0x0020 [0x003f].
      id "balloon0"
  Bus  0, device   4, function 0:
    VGA controller: PCI device 1234:1111
      BAR0: 32 bit prefetchable memory at 0x80000000 [0x80ffffff].
      BAR2: 32 bit memory at 0xc0002000 [0xc0002fff].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
      id "video0"
  Bus  0, device   5, function 0:
    SCSI controller: PCI device 1af4:1001
      IRQ 0.
      BAR0: I/O at 0x0040 [0x007f].
      BAR1: 32 bit memory at 0xc0020000 [0xc0020fff].
      id "scsi0-0-0-0"


In the guest:

Before unplug:
[root@dhcp71-167 ~]# lspci
00:01.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:02.0 USB controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 10)
00:03.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
00:04.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device

[root@dhcp71-167 ~]# ls /sys/bus/pci/devices/
0000:00:01.0  0000:00:02.0  0000:00:03.0  0000:00:04.0  0000:00:05.0


After unplug:
[root@dhcp71-167 ~]# lspci
00:01.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:02.0 USB controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 10)
00:04.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device

[root@dhcp71-167 ~]# ls /sys/bus/pci/devices/
0000:00:01.0  0000:00:02.0  0000:00:04.0  0000:00:05.0

So I think the guest has honored the "device_del" request and unplugged the device, but QEMU has not.
If, from QEMU's point of view, the guest is still using the device, I think QEMU should print a message such as "the device is busy". I am not sure how this would be reported, as I am not familiar with the messaging mechanism between QEMU and the guest.
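The comparison above can be done mechanically. A Python sketch, assuming the HMP info pci text format shown in this comment (bus, device and function printed in decimal) and the guest's ls /sys/bus/pci/devices/ listing (domain:bus:slot.function in hexadecimal), and assuming a single PCI host bridge so that guest bus 00 is QEMU bus 0, as in this setup. The helper names are hypothetical.

```python
import re

# "  Bus  0, device   3, function 0:" and '      id "balloon0"'
BUS_LINE = re.compile(r"\s*Bus\s+(\d+), device\s+(\d+), function (\d+):")
ID_LINE = re.compile(r'\s*id "([^"]*)"')


def qemu_pci_ids(info_pci_text):
    """Map (bus, slot, function) -> qdev id from HMP 'info pci' text (decimal)."""
    devices = {}
    current = None
    for line in info_pci_text.splitlines():
        m = BUS_LINE.match(line)
        if m:
            current = tuple(int(x) for x in m.groups())
            devices[current] = ""          # device without an id
            continue
        m = ID_LINE.match(line)
        if m and current is not None:
            devices[current] = m.group(1)
    return devices


def guest_pci_addrs(sysfs_listing):
    """Parse 'ls /sys/bus/pci/devices/': domain:bus:slot.func, all hex."""
    addrs = set()
    for entry in sysfs_listing.split():
        _domain, bus, slot_func = entry.split(":")
        slot, func = slot_func.split(".")
        addrs.add((int(bus, 16), int(slot, 16), int(func, 16)))
    return addrs


def stale_in_qemu(info_pci_text, sysfs_listing):
    """Ids of devices that QEMU still lists but the guest no longer has."""
    qemu = qemu_pci_ids(info_pci_text)
    guest = guest_pci_addrs(sysfs_listing)
    return sorted(dev_id for addr, dev_id in qemu.items() if addr not in guest)
```

Fed the balloon0 and scsi0-0-0-0 entries of the info pci output above and the guest's "After unplug" sysfs listing, stale_in_qemu returns ["balloon0"].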

Comment 7 Qunfang Zhang 2015-08-11 08:35:34 UTC
Hi, Amit

This issue also happens with other PCI devices, e.g. the memory balloon. Boot a guest with virtio-balloon-pci and hot-remove the balloon after the guest has booted: the device disappears inside the guest but still shows up in the "info pci" output, which causes migration to fail.

Is there any chance of fixing it?

Thanks,
Qunfang

Comment 8 Amit Shah 2015-08-11 08:47:34 UTC
QEMU emits a QMP event, DEVICE_DELETED, when a device unplug has completed successfully.

Please check whether you receive the event, and whether the device is still present in QEMU after it. If so, this is a bug that needs fixing. Otherwise, comment 5 holds true.
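A minimal Python sketch of that check, assuming QMP is exposed over TCP as in the reporter's command line (-qmp tcp:0:4444,server,nowait). It performs the usual QMP handshake (read the greeting, send qmp_capabilities), issues device_del, then scans incoming messages for a DEVICE_DELETED event whose data.device matches the id. The helper names are hypothetical, and the timeout applies to each read, not to the whole wait.

```python
import json
import socket


def is_device_deleted_event(message, dev_id):
    """True if a decoded QMP message is the DEVICE_DELETED event for dev_id."""
    return (message.get("event") == "DEVICE_DELETED"
            and message.get("data", {}).get("device") == dev_id)


def first_deleted(lines, dev_id):
    """Scan QMP output lines and return the matching DEVICE_DELETED event.

    Command replies and unrelated events are skipped; a QMP error reply
    (e.g. device_del on an unknown id) is raised. Returns None if the input
    ends without the event.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "error" in msg:
            raise RuntimeError(msg["error"].get("desc", "QMP error"))
        if is_device_deleted_event(msg, dev_id):
            return msg
    return None


def unplug_and_wait(host, port, dev_id, timeout=10.0):
    """Send device_del over QMP and wait for DEVICE_DELETED.

    Returns the event, or None if no message arrives within `timeout`
    seconds of the previous one (a per-read timeout, not a total one).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rw", encoding="utf-8") as stream:
        stream.readline()  # greeting: {"QMP": {...}}
        for cmd in ({"execute": "qmp_capabilities"},
                    {"execute": "device_del", "arguments": {"id": dev_id}}):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
        try:
            return first_deleted(stream, dev_id)
        except socket.timeout:
            return None
```

For example, unplug_and_wait("127.0.0.1", 4444, "balloon0") returning None while info pci still lists balloon0 matches the situation reported in comment 10.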

Comment 9 Amit Shah 2015-08-11 08:48:50 UTC
I missed that this is marked for ppc64le.  Is it specific to that architecture?  In that case, it will have to be re-assigned to the ppc team.

Comment 10 Qunfang Zhang 2015-08-11 10:19:38 UTC
(In reply to Amit Shah from comment #8)
> There is a qmp event that is emitted that notifies when the device unplug is
> successful: DEVICE_DELETED.
> 
> Please check if you received the event, and even after that the device is
> available in qemu.  If that is the case, this is a bug that needs fixing. 
> Otherwise, comment 5 holds true.

No QMP event is emitted after hot-removing the balloon device ("device_del balloon0"). But if, as you said, the guest hasn't given up using the balloon, why does the device disappear inside the guest?

(1) Before hot unplug:
$ lspci
00:00.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:01.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
00:09.0 Unclassified device [00ff]: Red Hat, Inc Device 1045 (rev 01)

(2) After hot unplug:
$ lspci
00:00.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:01.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device


(In reply to Amit Shah from comment #9)
> I missed that this is marked for ppc64le.  Is it specific to that
> architecture?  In that case, it will have to be re-assigned to the ppc team.

This issue only happens on ppc64le. I just tested on an x86 host (qemu-kvm-rhev-2.3.0-16.el7.x86_64) and cannot reproduce it.

Comment 11 Amit Shah 2015-08-11 10:51:56 UTC
(In reply to Qunfang Zhang from comment #10)
> (In reply to Amit Shah from comment #8)
> > There is a qmp event that is emitted that notifies when the device unplug is
> > successful: DEVICE_DELETED.
> > 
> > Please check if you received the event, and even after that the device is
> > available in qemu.  If that is the case, this is a bug that needs fixing. 
> > Otherwise, comment 5 holds true.
> 
> No qmp even emitted after hot remove the balloon device ("device_del
> balloon0"). But if as you said guest hasn't given up to use the balloon, how
> come the device disappear inside guest? 

Then something is still keeping it busy in the guest. It might be a Linux issue rather than a QEMU one.

> This issue only happens on ppc64le. Just tested on x86 host
> (qemu-kvm-rhev-2.3.0-16.el7.x86_64), can not reproduce the issue.

Thanks, moving to David.

Comment 12 Laurent Vivier 2015-08-12 15:20:50 UTC
What happens:
- The QEMU unplug function defers the unplug because the device is not in the isolated state.
- QEMU asks the RTAS daemon to change the state to isolated.
- QEMU changes the state to isolated, as requested by the RTAS daemon, but cannot detach the device because the device is not configured.

It seems the device has never been set to the "configured" state.

"configured" is set to true only by "rtas_ibm_configure_connector", which is only called on hotplug; this is why a hotplugged device can be unplugged.

Should it be called by SLOF?

Upstream QEMU has the same behaviour.
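The sequence above can be reproduced with a toy state machine. This Python sketch is a hypothetical model of the dynamic-reconfiguration connector logic as described in this comment, not QEMU's code: the class and method names are invented, and only the behaviour listed above (defer until isolated, detach only if configured, configured set only by the configure-connector path) is modelled.

```python
from dataclasses import dataclass

ISOLATED = "isolated"
UNISOLATED = "unisolated"


@dataclass
class ToyDrc:
    """Toy model of a DR connector; names are illustrative only."""
    isolation_state: str = UNISOLATED
    configured: bool = False
    awaiting_release: bool = False
    attached: bool = True

    def configure_connector(self):
        # Stands in for rtas_ibm_configure_connector, which (per this
        # comment) is only reached for a hotplugged device.
        self.configured = True

    def request_unplug(self):
        # QEMU side of device_del: defer while the device is not isolated.
        if self.isolation_state != ISOLATED:
            self.awaiting_release = True
            return
        self._try_release()

    def guest_set_isolation(self, state):
        # The guest, through the RTAS daemon, changes the isolation state.
        self.isolation_state = state
        if state == ISOLATED and self.awaiting_release:
            self._try_release()

    def _try_release(self):
        if not self.configured:
            # The reported failure: a cold-plugged device was never marked
            # "configured", so the detach never happens.
            return
        self.attached = False
        self.awaiting_release = False


def unplug(drc):
    drc.request_unplug()
    drc.guest_set_isolation(ISOLATED)
    return drc


cold = unplug(ToyDrc())        # device given on the command line
hot = ToyDrc()
hot.configure_connector()      # what happens for a hotplugged device
hot = unplug(hot)
print("cold-plugged still attached:", cold.attached)  # True
print("hotplugged still attached:", hot.attached)     # False
```

In this model a cold-plugged connector stays attached with awaiting_release set forever, which matches the info pci output in the earlier comments; constructing it with configured=True removes the asymmetry, but the report does not say how the shipped fix addresses it.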

Comment 15 Yash Mankad 2015-08-27 20:09:52 UTC
Fix included in qemu-kvm-rhev-2.3.0-20.el7

Comment 16 Laurent Vivier 2015-08-28 10:34:56 UTC
Could you check whether the fix for this BZ also fixes BZ 1250326?
https://bugzilla.redhat.com/show_bug.cgi?id=1250326

It really looks like a duplicate.

Comment 17 Xujun Ma 2015-09-01 10:33:24 UTC
Verified the issue on the latest version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-img-rhev-2.3.0-21.el7.ppc64le

Steps to Reproduce:
1. Start a guest with the following command:
/usr/libexec/qemu-kvm \
 -m 8G -smp 4 -name testvm -monitor stdio -qmp tcp::8889,server,nowait  -vnc :20 -usb -device usb-tablet,id=tablet1 \
 -drive file=/home/xuma/img/img.raw,if=none,id=virtblk_drive,format=raw,cache=none \
 -device virtio-blk-pci,bus=pci.0,addr=0x6,ioeventfd=on,serial=xuma,event_idx=off,drive=virtblk_drive,scsi=off,bootindex=0,physical_block_size=512,logical_block_size=512,id=scsi0-0,disable-legacy=off,disable-modern=on  \
 -device spapr-vscsi,id=scsi0,reg=0x1000 \
 -drive file=/home/xuma/iso/RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso,if=none,id=cdrom,format=raw \
 -device scsi-cd,bus=scsi0.0,drive=cdrom,id=decdrom,bootindex=1 \
 -netdev tap,id=tap0,script=/etc/qemu-ifup \
 -device virtio-net-pci,netdev=tap0,id=net1,mac=00:54:5a:5f:5b:5c,disable-legacy=off,disable-modern=on,ctrl_mac_addr=off \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
 -drive file=/home/xuma/img/data.raw,if=none,id=drive1,format=raw,werror=stop,rerror=stop,readonly=on \
 -device virtio-blk,bus=pci.0,drive=drive1,id=virtio1,serial=xuma\

2. Hot-unplug the devices in the monitor:
device_del virtio1
device_del virtio-serial0
3. Check the devices in the monitor:
info pci


Results: The devices virtio1 and virtio-serial0 still cannot be hot-unplugged. They still appear in the "info pci" output.

Only devices specified on the command line cannot be hot-unplugged.
Could you help confirm whether the bug has been fixed?

Comment 18 Laurent Vivier 2015-09-01 14:12:30 UTC
I've tested it and it works for me.

Could you check:

- you have installed qemu-img-rhev-2.3.0-21.el7.ppc64le,
  but is qemu-kvm-rhev-2.3.0-21.el7.ppc64le installed?

- is "/usr/sbin/rtas_errd" running in the guest?
  (package ppc64-diag)

- after "device_del", do you wait at least 2 seconds before checking the result, to give the daemon enough time to reply to QEMU?
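Instead of a single check after a fixed delay, a test script can poll. A Python sketch, assuming the QMP query-pci command (the QMP counterpart of info pci), whose reply is a list of buses, each with a devices list whose entries carry a qdev_id (the id shown by info pci) and, for bridges, a nested pci_bridge.devices list. run_query_pci is a hypothetical caller-supplied function that returns that reply; the 10-second timeout and 0.5-second interval are arbitrary values comfortably above the 2 seconds suggested here.

```python
import time


def qdev_ids(devices):
    """Yield the qdev_id of each query-pci device, descending into bridges."""
    for dev in devices:
        yield dev.get("qdev_id", "")
        yield from qdev_ids(dev.get("pci_bridge", {}).get("devices", []))


def still_present(query_pci_return, qdev_id):
    """True if any bus in a query-pci reply still lists qdev_id."""
    return any(qdev_id in qdev_ids(bus.get("devices", []))
               for bus in query_pci_return)


def wait_until_gone(run_query_pci, qdev_id, timeout=10.0, interval=0.5,
                    sleep=time.sleep):
    """Poll until qdev_id disappears from query-pci or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if not still_present(run_query_pci(), qdev_id):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A False result after 10 seconds, with rtas_errd running in a fully booted guest, would point at the bug rather than at a slow daemon.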

Comment 19 Xujun Ma 2015-09-02 02:43:54 UTC
Hi Vivier,

I tried it again, and the devices can be hot-unplugged. Maybe the guest had not finished booting. I have tried it many times after the guest finished booting, and the devices were hot-unplugged every time.

Comment 20 Xujun Ma 2015-09-02 10:42:42 UTC
Reproduced the issue on old version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-9.ael7b.ppc64le.rpm


Steps to Reproduce:
1. Start a guest with the following command:
/usr/libexec/qemu-kvm -name xuma-vm -machine pseries,accel=kvm,usb=off -s -m 4G -smp 4,sockets=1,cores=4,threads=1\
 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc \
 -device virtio-scsi-pci,id=scsi0 \
 -drive file=/home/xuma/img/vm.qcow2,if=none,id=drive-0-0-0,format=qcow2,cache=none \
 -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-0-0-0,bootindex=1,id=scsi0-0-0-0  \
 -drive file=/home/xuma/iso/RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso,if=none,format=raw,id=scsicdrom \
 -device scsi-cd,bus=scsi0.0,drive=scsicdrom,bootindex=2,id=scsi0-0-1-0 \
 -vnc :24 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:5555,server,nowait \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -netdev tap,id=tap0,script=/etc/qemu-ifup \
 -device virtio-net-pci,netdev=tap0,id=net1,mac=00:54:5a:5f:5b:5c,disable-legacy=off,disable-modern=on,ctrl_mac_addr=off \
 -device usb-ehci,id=ehci0 \
 -device virtio-balloon-pci,id=balloon \

2. Hot-unplug the devices in the monitor:
device_del scsi0
device_del net1
device_del ehci0
device_del balloon
device_del virtio-serial0

3. Check the devices in the monitor:
info pci

Results: The devices scsi0, net1, ehci0, balloon, and virtio-serial0 still cannot be hot-unplugged. They still appear in the "info pci" output.

Verified the issue on the latest version:

Version-Release number of selected component (if applicable):
Qemu-kvm-rhev: qemu-img-rhev-2.3.0-21.el7.ppc64le

Steps to Reproduce:
1. Start a guest with the following command:
/usr/libexec/qemu-kvm -name xuma-vm -machine pseries,accel=kvm,usb=off -s -m 4G -smp 4,sockets=1,cores=4,threads=1\
 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc \
 -device virtio-scsi-pci,id=scsi0 \
 -drive file=/home/xuma/img/vm.qcow2,if=none,id=drive-0-0-0,format=qcow2,cache=none \
 -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-0-0-0,bootindex=1,id=scsi0-0-0-0  \
 -drive file=/home/xuma/iso/RHEL-LE-7.1-20150219.1-Server-ppc64le-dvd1.iso,if=none,format=raw,id=scsicdrom \
 -device scsi-cd,bus=scsi0.0,drive=scsicdrom,bootindex=2,id=scsi0-0-1-0 \
 -vnc :24 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:5555,server,nowait \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -netdev tap,id=tap0,script=/etc/qemu-ifup \
 -device virtio-net-pci,netdev=tap0,id=net1,mac=00:54:5a:5f:5b:5c,disable-legacy=off,disable-modern=on,ctrl_mac_addr=off \
 -device usb-ehci,id=ehci0 \
 -device virtio-balloon-pci,id=balloon \

2. Hot-unplug the devices in the monitor:
device_del scsi0
device_del net1
device_del ehci0
device_del balloon
device_del virtio-serial0

3. Check the devices in the monitor:
info pci

Results: The devices scsi0, net1, ehci0, balloon, and virtio-serial0 can be hot-unplugged.
Those devices can no longer be found in the guest.

Comment 22 errata-xmlrpc 2015-12-04 16:51:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html