Bug 1540003
| Summary: | Postcopy migration failed with "Unreasonably large packaged state" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
| Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
| Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.5 | CC: | bugproxy, danielhb, dgibson, hannsj_uhl, jsuchane, knoel, lmiksik, lvivier, michen, mtessun, qzhang, virt-maint, xianwang |
| Target Milestone: | rc | Keywords: | Patch |
| Target Release: | 7.5 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Fixed In Version: | qemu-kvm-rhev-2.10.0-21.el7 | Doc Type: | If docs needed, set a value |
| Last Closed: | 2018-04-11 00:58:52 UTC | Type: | Bug |
| Bug Blocks: | 1399177, 1476742, 1539427 | | |
Description (xianwang, 2018-01-30 03:35:25 UTC)
The real problem is "Unreasonably large packaged state"; the second error message is another bug triggered by the first problem. This problem is addressed by "migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32": http://patchwork.ozlabs.org/patch/8670

This problem should not be related to P9. Could you test the migration between two P8 hosts (with the same command line)?

I've faced this problem in a different scenario: P9 -> P9 migration, using a pseries guest with few devices but lots of RAM (128 GB+). When starting the postcopy migration, QEMU sends a single blob with the entire device state of the guest to the destination. This blob has a maximum size of 16 megabytes. If the blob turns out to be larger than that (which is the case here: 20771356 bytes is about 20 MB), QEMU aborts the send and the migration fails on both sides (see the code sketch at the end of this report). This happens in the migration code on all architectures; it is not a P9-exclusive behaviour. Thus, as Laurent mentioned in comment #3, you should be able to reproduce it in a P8 -> P8 scenario too.

(In reply to Laurent Vivier from comment #3)
> This problem should not be related to P9. Could you test the migration
> between two P8 hosts (with the same command line)?

Yes, I have tried the postcopy migration (same as in the bug report) between two P8 hosts with the same command line, and the result is the same as in this bug. The versions are as follows:
3.10.0-837.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Blocker justification:
1) Migration, including post-copy, is a primary feature we intend to support for POWER KVM guests.
2) This bug could trip on essentially any post-copy migration; there is no simple way for a user to work around the problem.
3) The fix is extremely simple: changing a single limit value within QEMU.

Fix included in qemu-kvm-rhev-2.10.0-21.el7.

This bug could be reproduced with qemu-kvm-rhev-2.10.0-20.el7. I tried both P8 <-> P8 and P8 -> P9 with the same steps and command line as comment 0. After starting the postcopy migration, I hit these errors:
src host:
(qemu) qemu-kvm: block/io.c:1557: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)
dst host:
(qemu) qemu-kvm: load of migration failed: Input/output error

Verified this bug with qemu-kvm-rhev-2.10.0-21.el7; the issue no longer exists. Migration finishes successfully with the original command line and steps. I'll set the status to VERIFIED; if more testing is required, please let me know. In addition, since multiple migration bugs were fixed recently, we'll conduct migration function testing on both Power8 and Power9 for regression purposes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:1104
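For reference, below is a minimal sketch of where this limit lives in QEMU's migration code, based on the upstream migration/savevm.c around version 2.10. It is an illustration, not the exact RHEL sources; the surrounding details may differ slightly. Only the MAX_VM_CMD_PACKAGED_SIZE macro, its old 16 MB value, and the "Unreasonably large packaged state" error are confirmed by this report and the referenced patch.

```c
/*
 * Sketch of the size check in migration/savevm.c (upstream QEMU ~2.10).
 * At postcopy start, the source packages the complete device state into
 * a single blob and sends it as one "packaged" command; a blob larger
 * than MAX_VM_CMD_PACKAGED_SIZE makes the migration fail.
 */

/* Before the fix: 1ul << 24 = 16777216 bytes (16 MB). The patch named
 * in this bug raises the limit to 1ul << 32. */
#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)

int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
{
    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
        /* The 20771356-byte device state in this report trips this path. */
        error_report("%s: Unreasonably large packaged state: %zu",
                     __func__, len);
        return -1;
    }
    /* ... send the MIG_CMD_PACKAGED header followed by the blob ... */
    return 0;
}
```

With the limit raised to 1ul << 32, a pseries guest whose packaged device state exceeds 16 MB (for example one with very large RAM) no longer aborts at this check, which is consistent with the successful verification on qemu-kvm-rhev-2.10.0-21.el7.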