Bug 1831103 - QEMU crash on the second migration of a paused VM
Summary: QEMU crash on the second migration of a paused VM
Keywords:
Status: CLOSED DUPLICATE of bug 1713009
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.2
Assignee: Virtualization Maintenance
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks: 1819721
 
Reported: 2020-05-04 16:13 UTC by Milan Zamazal
Modified: 2020-05-12 01:34 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-07 15:17:52 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Example domain XML (1.64 KB, text/plain) - 2020-05-04 16:13 UTC, Milan Zamazal
qemu-core-dump-log (16.58 KB, text/plain) - 2020-05-07 15:04 UTC, Li Xiaohui

Description Milan Zamazal 2020-05-04 16:13:19 UTC
Created attachment 1684886 [details]
Example domain XML

Description of problem:

When a VM is started as paused from libvirt, migrated to another host and then migrated back to the original host, QEMU crashes with the following error:

 qemu-kvm: Failed to load virtio_pci/modern_queue_state:avail
 qemu-kvm: Failed to load virtio_pci/modern_state:vqs
 qemu-kvm: Failed to load virtio/extra_state:extra_state
 qemu-kvm: Failed to load virtio-rng:virtio
 qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.2:00.0/virtio-rng'
 qemu-kvm: load of migration failed: Input/output error
 shutting down, reason=crashed

Version-Release number of selected component (if applicable):

qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64
kernel-4.18.0-193.el8.x86_64
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

The bug also occurs for me on el7 with qemu-kvm-ev-2.12.0-44.1.el7_8.1.x86_64.

How reproducible:

100% (on RHV hosts)

Steps to Reproduce:

1. Take the attached domain XML and start the corresponding VM as paused using virsh:

# virsh create domain.xml --paused

2. Migrate the VM to another host:

# virsh migrate test qemu+tls://ANOTHER-HOST/system --live

3. Migrate the VM back to the original host from ANOTHER-HOST:

# virsh migrate test qemu+tls://ORIGINAL-HOST/system --live

Actual results:

The second migration fails with

error: operation failed: domain is not running

Expected results:

The migration succeeds.
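For reference, a quick way to check which host the domain ended up on and in what state after each migration attempt (a minimal sketch using standard virsh commands; "test" is the domain name from the steps above, and since the guest was created with --paused it is expected to report "paused" on the destination after a successful migration):

# virsh domstate test
paused
# virsh list --all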

Comment 1 Li Xiaohui 2020-05-07 02:37:08 UTC
I could reproduce this bz via libvirt; I am still trying to reproduce it from the qemu side.

Comment 2 Li Xiaohui 2020-05-07 09:00:48 UTC
Found two conditions needed to reproduce this bz:
1. It only reproduces when the firmware is OVMF; SeaBIOS is OK (a way to check the firmware is sketched below).
2. Ping-pong migration succeeds with OVMF + a running VM, but OVMF + a paused VM reproduces the bz.
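For reference, whether a guest uses OVMF or SeaBIOS can be checked in its domain XML; a minimal sketch ("test" is the domain name from the description, and the exact loader attributes and path may differ per host):

# virsh dumpxml test | grep -i -A1 '<loader'
  <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>

A SeaBIOS guest has no pflash <loader> element in its <os> section.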


Also reproduced on the qemu side; the steps are as follows:
1. Boot a VM with the command line [1] on the src host;
2. Boot a VM with the same command line, but with "-incoming defer" appended, on the dst host;
3. Execute QMP commands on the src & dst hosts:
(1)src qmp:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
(2)dst qmp:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}}
{"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:49152"}}
(3)src qmp:
{"execute": "migrate","arguments":{"uri": "tcp:10.73.130.69:49152"}}
{"execute":"query-migrate"}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}
4. After migrating the VM from the src to the dst host, quit qemu on the src host and start a VM there with the same command line [1] but with "-incoming defer" appended.
5. Execute QMP commands on the src & dst hosts to migrate the VM back to the src host (see the note on the pre-switchover state after the commands):
(1)dst qemu:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":false}]}}
(2)src qemu:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}}
{"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:49152"}}
(3)dst qemu
{"execute": "migrate","arguments":{"uri": "tcp:10.73.130.67:49152"}}
{"execute":"query-migrate"}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}


Actual Result:
After step 5-(3), qemu on the src host quits with the following errors, and qemu on the dst host crashes (core dump file to be provided later):
[root@hp-dl385g10-13 home]# sh libvirt2.sh 
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) 2020-05-07T08:53:21.238754Z qemu-kvm: Failed to load virtio_pci/modern_queue_state:avail
2020-05-07T08:53:21.238843Z qemu-kvm: Failed to load virtio_pci/modern_state:vqs
2020-05-07T08:53:21.238862Z qemu-kvm: Failed to load virtio/extra_state:extra_state
2020-05-07T08:53:21.238894Z qemu-kvm: Failed to load virtio-rng:virtio
2020-05-07T08:53:21.238920Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.2:00.0/virtio-rng'
2020-05-07T08:53:21.239074Z qemu-kvm: load of migration failed: Input/output error


Command line [1]:
/usr/libexec/qemu-kvm \
-name guest=test,debug-threads=on \
-S \
-blockdev node-name=libvirt-pflash0-storage,driver=file,auto-read-only=on,discard=unmap,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd \
-blockdev node-name=libvirt-pflash0-format,read-only=on,driver=raw,file=libvirt-pflash0-storage \
-blockdev node-name=libvirt-pflash1-storage,driver=file,auto-read-only=on,discard=unmap,filename=/tmp/OVMF_VARS.fd \
-blockdev node-name=libvirt-pflash1-format,read-only=off,driver=raw,file=libvirt-pflash1-storage \
-machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu qemu64 \
-m 1024 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid be5615db-b7fe-44f7-aacb-ce7ac05367ed \
-smbios type=1,manufacturer=oVirt,product=RHEL,version=8.2-1.0.el8,serial=5b34bd5b-2f90-4609-87c0-6f74ef4de39f,uuid=be5615db-b7fe-44f7-aacb-ce7ac05367ed,family=oVirt \
-display none \
-no-user-config -nodefaults \
-device sga \
-chardev socket,id=charmonitor,path=/home/hello1,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot menu=on,splash-time=30000,strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device qemu-xhci,id=usb,bus=pci.1,addr=0x0 \
-device virtio-serial-pci,id=ua-9c2cb3cc-b24f-4155-9eca-fc523fdee5d1,max_ports=16,bus=pci.5,addr=0x0 \
-chardev socket,id=charua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2,path=/home/hello2,server,nowait \
-device isa-serial,chardev=charua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2,id=ua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x0 \
-object rng-random,id=objua-40c09ebf-e493-4209-a7ff-860dd23b6081,filename=/dev/urandom \
-device virtio-rng-pci,rng=objua-40c09ebf-e493-4209-a7ff-860dd23b6081,id=ua-40c09ebf-e493-4209-a7ff-860dd23b6081,bus=pci.3,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on \
-monitor stdio \

Comment 3 Dr. David Alan Gilbert 2020-05-07 09:53:52 UTC
You say:

> Actual Result:
> After step 5-(3), qemu on src host will quit with following errors, and qemu on dst host will crash (will provide the core dump file later):
> [root@hp-dl385g10-13 home]# sh libvirt2.sh 
> QEMU 4.2.0 monitor - type 'help' for more information
> (qemu) 2020-05-07T08:53:21.238754Z qemu-kvm: Failed to load virtio_pci/modern_queue_state:avail
> 2020-05-07T08:53:21.238843Z qemu-kvm: Failed to load virtio_pci/modern_state:vqs
> 2020-05-07T08:53:21.238862Z qemu-kvm: Failed to load virtio/extra_state:extra_state
> 2020-05-07T08:53:21.238894Z qemu-kvm: Failed to load virtio-rng:virtio
> 2020-05-07T08:53:21.238920Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.2:00.0/virtio-rng'
> 2020-05-07T08:53:21.239074Z qemu-kvm: load of migration failed: Input/output error

That's the errors from the destination - what about the source side - what errors did it print?

(My guess is that this is a block assert, we've got a few relating to already paused migrations)

Comment 4 Li Xiaohui 2020-05-07 15:00:46 UTC
(In reply to Dr. David Alan Gilbert from comment #3)
> You say:
> 
> > Actual Result:
> > After step 5-(3), qemu on src host will quit with following errors, and qemu on dst host will crash (will provide the core dump file later):
> > [root@hp-dl385g10-13 home]# sh libvirt2.sh 
> > QEMU 4.2.0 monitor - type 'help' for more information
> > (qemu) 2020-05-07T08:53:21.238754Z qemu-kvm: Failed to load virtio_pci/modern_queue_state:avail
> > 2020-05-07T08:53:21.238843Z qemu-kvm: Failed to load virtio_pci/modern_state:vqs
> > 2020-05-07T08:53:21.238862Z qemu-kvm: Failed to load virtio/extra_state:extra_state
> > 2020-05-07T08:53:21.238894Z qemu-kvm: Failed to load virtio-rng:virtio
> > 2020-05-07T08:53:21.238920Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:01.2:00.0/virtio-rng'
> > 2020-05-07T08:53:21.239074Z qemu-kvm: load of migration failed: Input/output error
> 
> That's the errors from the destination - what about the source side - what
> errors did it print?
The errors on the source side:
(qemu) qemu-kvm: block.c:5659: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
libvirt.sh: line 39: 367426 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name guest=test,debug-threads=on -S -blockdev node-name=libvirt-pflash0-storage,driver=file,auto-read-only=on,discard=unmap,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd -blockdev node-name=libvirt-pflash0-format,read-only=on,driver=raw,file=libvirt-pflash0-storage -blockdev node-name=libvirt-pflash1-storage,driver=file,auto-read-only=on,discard=unmap,filename=/tmp/OVMF_VARS.fd -blockdev node-name=libvirt-pflash1-format,read-only=off,driver=raw,file=libvirt-pflash1-storage -machine pc-q35-rhel8.1.0,accel=kvm,usb=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format -cpu qemu64 -m 1024 -overcommit mem-lock=off -smp 1,sockets=1,cores=1,threads=1 -uuid be5615db-b7fe-44f7-aacb-ce7ac05367ed -smbios type=1,manufacturer=oVirt,product=RHEL,version=8.2-1.0.el8,serial=5b34bd5b-2f90-4609-87c0-6f74ef4de39f,uuid=be5615db-b7fe-44f7-aacb-ce7ac05367ed,family=oVirt -display none -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/home/hello1,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=on,splash-time=30000,strict=on -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 -device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 -device qemu-xhci,id=usb,bus=pci.1,addr=0x0 -device virtio-serial-pci,id=ua-9c2cb3cc-b24f-4155-9eca-fc523fdee5d1,max_ports=16,bus=pci.5,addr=0x0 -chardev socket,id=charua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2,path=/home/hello2,server,nowait -device isa-serial,chardev=charua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2,id=ua-41db8caa-b9e3-461c-b7d2-dc343a26a5b2 -device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x0 -object rng-random,id=objua-40c09ebf-e493-4209-a7ff-860dd23b6081,filename=/dev/urandom -device virtio-rng-pci,rng=objua-40c09ebf-e493-4209-a7ff-860dd23b6081,id=ua-40c09ebf-e493-4209-a7ff-860dd23b6081,bus=pci.3,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on -monitor stdio -incoming defer

Please see the attached qemu core dump log for the source side.
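For reference, a minimal sketch of how such a backtrace can be extracted, assuming systemd-coredump is collecting core files on the host (otherwise point gdb at the core file and /usr/libexec/qemu-kvm directly):

# coredumpctl list /usr/libexec/qemu-kvm
# coredumpctl gdb /usr/libexec/qemu-kvm
(gdb) thread apply all bt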
> 
> (My guess is that this is a block assert, we've got a few relating to
> already paused migrations)

Comment 5 Li Xiaohui 2020-05-07 15:04:05 UTC
Created attachment 1686213 [details]
qemu-core-dump-log

Comment 6 Dr. David Alan Gilbert 2020-05-07 15:17:52 UTC
Yeah, that's the assert I thought it would be; this is a dupe.

*** This bug has been marked as a duplicate of bug 1713009 ***

