Bug 1620373

Summary: Failed to do migration after hotplug and hotunplug the ivshmem device
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: qemu-kvm-rhev
Assignee: Markus Armbruster <armbru>
Status: CLOSED ERRATA
QA Contact: Yumei Huang <yuhuang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.6
CC: chayang, dgilbert, dyuan, fjin, hhuang, jinzhao, jiyan, juzhang, lmen, mrezanin, pezhang, quintela, qzhang, virt-maint, xianwang, xuzhang, yuhuang
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.12.0-19.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1688301 (view as bug list)
Environment:
Last Closed: 2019-08-22 09:18:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1651787, 1688301

Description yafu 2018-08-23 05:51:47 UTC
Description of problem:
Migration fails after hotplugging and then hotunplugging the shmem device.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Hotplug the shmem device to a running guest:
# virsh attach-device iommu1 shmem.xml
Device attached successfully

2. Check the shmem device in the live XML:
# virsh dumpxml iommu1
<shmem name='my_shmem0'>
      <model type='ivshmem-plain'/>
      <size unit='M'>4</size>
      <alias name='shmem0'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x03' function='0x0'/>
</shmem>

3. Check the shmem device in the guest OS:
# lspci | grep -i shared
0c:03.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)

4. Do migration (failure is the expected result at this point):
# virsh migrate iommu1 qemu+ssh:// --live --verbose --unsafe --p2p --tunnelled --persistent
error: Requested operation is not valid: migration with shmem device is not supported

5. Hotunplug the shmem device from the guest:
# virsh detach-device iommu1 shmem.xml
Device detached successfully

6. Check the shmem device in the guest OS:
# lspci | grep -i shared
no output

7. Do migration again:
# virsh migrate iommu1 qemu+ssh:// --live --verbose --unsafe
error: internal error: unable to execute QEMU command 'migrate': Migration is disabled when using feature 'peer mode' in device 'ivshmem'

Actual results:
Migration fails after hotplugging and then hotunplugging the shmem device.

Expected results:
Migration should succeed after the shmem device has been hotunplugged.

Additional info:

Comment 4 Martin Kletzander 2018-08-29 16:39:05 UTC
Looking at the logs it doesn't look like there's anything left.

Inspecting the QOM tree I found nothing (although I didn't know precisely where to look; qom-tree didn't work for me for some reason).

Is there anything more libvirt could do, actually?  Look at what we do on the monitor when I attach and then detach the device:

# We plug the backend:
reply={"return": {}, "id": "libvirt-13"}

# Then the device itself:
reply={"return": {}, "id": "libvirt-14"}

# And then check if it is plugged (it is, shmem0 is there):
reply={"return": [{"name": "rng0", "type": "child<virtio-rng-pci>"}, {"name": "sound0-codec0", "type": "child<hda-duplex>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "video0", "type": "child<qxl-vga>"}, {"name": "serial0", "type": "child<isa-serial>"}, {"name": "sound0", "type": "child<intel-hda>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "channel1", "type": "child<virtserialport>"}, {"name": "channel0", "type": "child<virtserialport>"}, {"name": "net0", "type": "child<virtio-net-pci>"}, {"name": "input0", "type": "child<usb-tablet>"}, {"name": "shmem0", "type": "child<ivshmem-plain>"}, {"name": "redir1", "type": "child<usb-redir>"}, {"name": "redir0", "type": "child<usb-redir>"}, {"name": "usb", "type": "child<ich9-usb-ehci1>"}, {"name": "type", "type": "string"}, {"name": "ide0-0-0", "type": "child<ide-cd>"}], "id": "libvirt-15"}

# When unplugging we remove the device:
reply={"return": {}, "id": "libvirt-16"}

# Then wait for the event:
event={"timestamp": {"seconds": 1535560504, "microseconds": 629824}, "event": "DEVICE_DELETED", "data": {"device": "shmem0", "path": "/machine/peripheral/shmem0"}}

# Remove the backend
reply={"return": {}, "id": "libvirt-17"}

# And check that it was removed:
reply={"return": [{"name": "rng0", "type": "child<virtio-rng-pci>"}, {"name": "sound0-codec0", "type": "child<hda-duplex>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "video0", "type": "child<qxl-vga>"}, {"name": "serial0", "type": "child<isa-serial>"}, {"name": "sound0", "type": "child<intel-hda>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "channel1", "type": "child<virtserialport>"}, {"name": "channel0", "type": "child<virtserialport>"}, {"name": "net0", "type": "child<virtio-net-pci>"}, {"name": "input0", "type": "child<usb-tablet>"}, {"name": "redir1", "type": "child<usb-redir>"}, {"name": "redir0", "type": "child<usb-redir>"}, {"name": "usb", "type": "child<ich9-usb-ehci1>"}, {"name": "type", "type": "string"}, {"name": "ide0-0-0", "type": "child<ide-cd>"}], "id": "libvirt-18"}

Comment 5 Hai Huang 2018-08-30 14:48:17 UTC
Dave and Juan,

As Martin commented above, libvirt seems to be cleaning up the 
ivshmem device properly after the device unplug.

However, as described in the initial problem report, migration
is still failing:

  7.Do migration again:
  # virsh migrate iommu1 qemu+ssh:// --live --verbose --unsafe
  error: internal error: unable to execute QEMU command 'migrate': Migration 
  is disabled when using feature 'peer mode' in device 'ivshmem'

Would it be possible for you to provide some information on the
"dirty state" that is causing the migration code to generate this error
message and thus fail the migration operation?


Comment 6 Dr. David Alan Gilbert 2018-08-30 15:00:30 UTC
It should work (TM).
I can see hw/misc/ivshmem.c registers a 'migration blocker' in its ivshmem_common_realize, and its ivshmem_exit does:

    if (s->migration_blocker) {

So all peachy in theory.

So I guess the question is whether ivshmem_exit is actually being called.

Comment 7 Markus Armbruster 2018-09-25 16:16:16 UTC
> so I guess the question is whether ivshmem_exit is being called

Indeed.  It isn't.  Upstream commit 2aece63c8a9 (v2.7.0) screwed it up.

Comment 8 Markus Armbruster 2018-09-26 17:02:37 UTC
We got that bug in the rebase to v2.9.0 for RHEL 7.4's qemu-kvm-rhev.
I'm pretty sure RHEL-7.3's qemu-kvm-rhev worked, i.e. this is a regression.
Not a recent one, though.

Comment 10 Markus Armbruster 2018-10-11 14:15:18 UTC
Fixed upstream in commit b266f1d1123396f9f5df865508f7555ab0c9582a.

Comment 12 Miroslav Rezanina 2018-11-21 15:13:53 UTC
Fix included in qemu-kvm-rhev-2.12.0-19.el7

Comment 14 Yumei Huang 2018-12-18 07:18:59 UTC

1. Boot a guest on the src host, hotplug an ivshmem device, then unplug it:

# /usr/libexec/qemu-kvm -m 1024,slots=4,maxmem=32G -smp 16,cores=8,threads=1,sockets=2 -numa node -vnc :0 -monitor stdio /home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 

(qemu) object_add memory-backend-file,id=mem1,size=512M,mem-path=/mnt/kvm_hugepage,share
(qemu) device_add ivshmem-plain,id=ivshmem1,memdev=mem1

(qemu) device_del ivshmem1   
(qemu) object_del mem1 

2. Boot guest on dst host with "-incoming tcp:0:1234"

3. Migrate guest from src host to dst host

(qemu) migrate -d tcp:xxxx:1234
Migration is disabled when using feature 'peer mode' in device 'ivshmem'

Migration failed with above error message. 


With the fixed build (qemu-kvm-rhev-2.12.0-19.el7), the same steps as above work well; migration succeeds.

(qemu) migrate -d tcp:xxx:1234
(qemu) info migrate
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off late-block-activate: off 
Migration status: active
total time: 2994 milliseconds
expected downtime: 300 milliseconds
setup: 22 milliseconds
transferred ram: 98361 kbytes
throughput: 268.58 mbps
remaining ram: 945040 kbytes
total ram: 1065800 kbytes
duplicate: 5660 pages
skipped: 0 pages
normal: 24530 pages
normal bytes: 98120 kbytes
dirty sync count: 1
page size: 4 kbytes

Comment 18 errata-xmlrpc 2019-08-22 09:18:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.