Bug 1540003 - Postcopy migration failed with "Unreasonably large packaged state"
Summary: Postcopy migration failed with "Unreasonably large packaged state"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 7.5
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks: 1399177 1476742 1539427
 
Reported: 2018-01-30 03:35 UTC by xianwang
Modified: 2018-05-16 09:13 UTC
CC: 13 users

Fixed In Version: qemu-kvm-rhev-2.10.0-21.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-11 00:58:52 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2018-04-10 22:54:38 UTC
IBM Linux Technology Center 164469 None None None 2019-05-31 05:11:59 UTC

Description xianwang 2018-01-30 03:35:25 UTC
Description of problem:
When doing postcopy migration from P8 to P9, after switching to postcopy mode, QEMU on the source side quits automatically with "qemu-kvm: block/io.c:1557: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed."

Version-Release number of selected component (if applicable):
HostA p8 (RHEL7.5):
3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-5.git89f519f.el8.ppc64le

HostB p9 (RHEL7.5-alt):
4.14.0-33.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest:
Compose: RHEL-7.5-20171107.1
3.10.0-837.el7.ppc64le
virtio_scsi
virtio_net
vcpu:8
mem:8G

How reproducible:
5/5

Steps to Reproduce:
1. Boot a guest on the Power8 host; the detailed QEMU command line is in the additional info.
2. Launch listening mode on the Power9 host; the QEMU command line differs from the P8 one only in: -machine pseries-rhel7.5.0,max-cpu-compat=power8; -incoming tcp:0:5801
3. Do migration from P8 to P9 and switch to postcopy mode:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.19.19.17:5801
(qemu) migrate_start_postcopy


Actual results:
Migration failed; QEMU on both sides quit automatically with error messages.

on src:
(qemu) qemu-kvm: qemu_savevm_send_packaged: Unreasonably large packaged state: 20771356
qemu-kvm: block/io.c:1557: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
stable2_p8_p9.sh: line 61: 125579 Aborted                 /usr/libexec/qemu-kvm -name 'avocado-vt-vm1'...........

on dst:
(qemu) qemu-kvm: load of migration failed: Input/output error

Expected results:
Migration completes and the VM works well.

Additional info:
qemu cli of p9:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-machine pseries-rhel7.5.0,max-cpu-compat=power8 \
-nodefaults \
-vga std \
-device spapr-pci-host-bridge,index=1   \
-device virtio-scsi-pci,bus=pci.1,id=scsi0,addr=0x3 \
-drive  file=/home/xianwang/mount_point/d1.raw,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none  \
-device  scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,channel=0,scsi-id=0,lun=0   \
-device  virtio-serial-pci,disable-legacy=on,disable-modern=off,id=agent-virtio-serial0,max_ports=16,vectors=0,bus=pci.1,addr=0x4,ioeventfd=on  \
-chardev socket,id=charchannel1,path=/home/channel1,server,nowait  \
-device virtserialport,bus=agent-virtio-serial0.0,nr=3,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 \
-device spapr-pci-host-bridge,index=2  \
-drive  file=/home/xianwang/mount_point/d2.qcow2,id=drive_blk,format=qcow2,if=none,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_blk,id=device_blk,multifunction=on,bus=pci.2,addr=0x03.0 \
-device virtio-scsi-pci,id=scsi1,multifunction=on,bus=pci.2,addr=0x03.1 \
-drive  file=/home/xianwang/mount_point/d3.raw,if=none,id=drive-data0,format=raw,cache=none,aio=native  \
-device  scsi-hd,drive=drive-data0,id=data0,bus=scsi1.0,channel=0,scsi-id=0,lun=1 \
-device virtio-serial-pci,id=virtio-serial0,max_ports=32 \
-chardev socket,id=channel1,path=/tmp/helloworld0,server,nowait \
-device  virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port0 \
-chardev socket,id=serial_id_serial0,path=/tmp/console0,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
-object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC \
-device virtio-rng-pci,id=virtio-rng-pci-GVn8yzUA,rng=passthrough-rOXjKxaC,bus=pci.0,addr=0x04 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x05 \
-device pci-ohci,id=usb3,bus=pci.0,addr=0x06 \
-device pci-bridge,id=pci_bridge_1,bus=pci.0,addr=0xc,chassis_nr=1 \
-device pci-bridge,id=pci_bridge_2,bus=pci.0,addr=0xd,chassis_nr=2 \
-object iothread,id=iothread0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci_bridge_1,iothread=iothread0,addr=0x07 \
-device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x08 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/xianwang/mount_point/rhel75-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-drive file=/home/xianwang/mount_point/d4.qcow2,format=qcow2,if=none,cache=none,id=drive_plane,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_plane,id=plane,bus=pci_bridge_2,addr=0x09,iothread=iothread0 \
-drive file=/home/xianwang/mount_point/d5.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop \
-device spapr-vscsi,reg=0x1000,id=scsi3 \
-device scsi-hd,drive=drive-system-disk,id=system-disk,bus=scsi3.0,channel=0,scsi-id=0,lun=0 \
-drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/xianwang/mount_point/RHEL-7.5-20171215.0-Server-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=drive_cd1,bus=virtio_scsi_pci1.0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,netdev=idjlQN53,vectors=10,mq=on,status=on,bus=pci.0,addr=0xa \
-netdev tap,id=idjlQN53,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device spapr-vlan,mac=9a:4f:50:51:52:54,netdev=hostnet0,id=net0 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 4G,slots=4,maxmem=32G \
-smp 8,cores=4,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-mouse,id=input1,bus=usb1.0,port=2 \
-device usb-kbd,id=input2,bus=usb1.0,port=3 \
-vnc :2 \
-incoming tcp:0:5801 \
-qmp tcp:0:8881,server,nowait \
-monitor stdio \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=on,strict=on \
-enable-kvm \
-watchdog i6300esb \
-watchdog-action reset \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xb \

Comment 2 Laurent Vivier 2018-01-30 10:40:19 UTC
The real problem is the "Unreasonably large packaged state" error; the second error message is another bug triggered by the first one.

This problem is addressed by:

"migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32"
http://patchwork.ozlabs.org/patch/8670

Comment 3 Laurent Vivier 2018-01-30 10:41:34 UTC
This problem should not be related to P9. Could you test the migration between two P8 hosts (with the same command line)?

Comment 4 Daniel Henrique Barboza 2018-01-30 14:12:19 UTC
I've faced this problem in a different scenario, P9->P9 migration, using a pseries guest with few devices but lots of RAM (128 GB+). When starting the postcopy migration, QEMU sends a single blob with the entire device state of the guest to the destination. This blob has a maximum allowed size of 16 MiB. If the blob turns out to be larger than that (which is the case here: 20771356 bytes is roughly 20 MB), QEMU aborts the send and migration fails on both sides.

This happens in the migration code with all architectures, it is not a P9 exclusive behaviour. Thus, as Laurent mentioned in comment #3, you should be able to reproduce it in a P8->P8 scenario too.

Comment 6 xianwang 2018-01-31 07:55:46 UTC
(In reply to Laurent Vivier from comment #3)
> This problem should not be related to P9. Could you test the migration
> between two P8 hosts (with the same command line)?

Yes, I have tried the postcopy migration (same as in the bug report) between two P8 hosts with the same command line; the result is the same as in this bug. The versions are as follows:

3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Comment 9 David Gibson 2018-02-12 01:13:07 UTC
Blocker justification:

1) Migration, including post-copy, is a primary feature we intend to support for POWER KVM guests.

2) This bug could trip on essentially any post-copy migration; there's no simple way for a user to work around the problem.

3) The fix is extremely simple: changing a single limit value within QEMU.

Comment 12 Miroslav Rezanina 2018-02-20 13:43:32 UTC
Fix included in qemu-kvm-rhev-2.10.0-21.el7

Comment 14 Qunfang Zhang 2018-02-22 07:05:32 UTC
This bug could be reproduced with qemu-kvm-rhev-2.10.0-20.el7. I tried both P8 <-> P8 and P8 -> P9 with the same steps and command line as in comment 0.

After starting postcopy migration, I hit these errors:

src host:

(qemu) qemu-kvm: block/io.c:1557: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)

dst host:
(qemu) qemu-kvm: load of migration failed: Input/output error

Verified this bug with qemu-kvm-rhev-2.10.0-21.el7; the issue no longer exists. Migration finishes successfully with the original command line and steps.

I'll set the status to VERIFIED; if more testing is required, please let me know.

Besides, since multiple migration bugs have been fixed recently, we'll conduct migration function testing on both Power8 and Power9 for regression purposes.

Comment 16 errata-xmlrpc 2018-04-11 00:58:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

