Bug 1743639
| Field | Value |
|---|---|
| Summary | Backward migration from qemu4.1 to qemu-kvm-rhev-2.12.0 failed with message "qemu-kvm: error while loading state for instance 0x0 of device 'spapr'" |
| Product | Red Hat Enterprise Linux Advanced Virtualization |
| Reporter | xianwang <xianwang> |
| Component | qemu-kvm |
| Assignee | Laurent Vivier <lvivier> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | Gu Nini <ngu> |
| Severity | urgent |
| Priority | urgent |
| Version | 8.1 |
| CC | dgibson, hhuang, juzhang, knoel, lvivier, mdeng, micai, ngu, qzhang, virt-maint, xuma, yihyu, zhenyzha |
| Target Milestone | rc |
| Keywords | Regression, TestBlocker |
| Hardware | ppc64le |
| OS | Linux |
| Last Closed | 2019-08-28 09:48:56 UTC |
| Type | Bug |
| Bug Depends On | 1744170 |
Description
xianwang
2019-08-20 11:04:55 UTC
This is a ppc-only bug. It is a regression in qemu4.1: with a qemu4.0 build instead, migration works, e.g. with the following builds:

P9<->P9:
Host A p9 (alt7.6): 4.14.0-115.8.2.el7a.ppc64le, qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le
Host B p9 (rhel8.1.0 fast train): 4.18.0-134.el8.ppc64le, qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le

In fact, this issue was already fixed for qemu4.0 in the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1709726

The issue is also hit when migrating from qemu4.1 to qemu4.0 and from qemu4.1 to qemu3.1, as follows:

I:
Host A p8 (qemu4.1): 4.18.0-130.el8.ppc64le, qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
Host B p9 (qemu4.0): 4.18.0-134.el8.ppc64le, qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le
qemu cli: /usr/libexec/qemu-kvm -nodefaults -monitor stdio -machine pseries-rhel8.1.0,max-cpu-compat=power8
Result: same as in the bug report.

II:
Host A p8 (qemu4.1): 4.18.0-130.el8.ppc64le, qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
Host B p9 (qemu3.1): 4.18.0-134.el8.ppc64le, qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed.ppc64le
qemu cli: /usr/libexec/qemu-kvm -nodefaults -monitor stdio -machine pseries-rhel7.6.0,max-cpu-compat=power8

Laurent, assigning to you; it may or may not be related to the migration bugs you're already looking at.

This issue is also hit when migrating from POWER8 with RHEL 8.1.0 (qemu4.1) to POWER8 with RHEL 7.6.z (qemu-kvm-rhev-2.12):
source p8: 4.18.0-130.el8.ppc64le, qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le, SLOF-20190114-2.gita5b428e.module+el8.1.0+3554+1a3a94a6.noarch
destination p8: 3.10.0-957.35.1.el7.ppc64le, qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le, SLOF-20171214-2.gitfa98132.el7.noarch

Thank you, xianwang.
I think the problem is fixed in the new machine type (BZ 1744170) by:

    commit 3725ef1a944bbe1173b55fdabe76fb17876f1d9e
    Author: Greg Kurz <groug>
    Date:   Wed May 22 15:43:46 2019 +0200

        spapr: Don't migrate the hpt_maxpagesize cap to older machine types

        Commit 0b8c89be7f7b added the hpt_maxpagesize capability to the
        migration stream. This is okay for new machine types but it breaks
        backward migration to older QEMUs, which don't expect the extra
        subsection.

        Add a compatibility boolean flag to the sPAPR machine class and use
        it to skip migration of the capability for machine types 4.0 and
        older. This fixes migration to an older QEMU. Note that the
        destination will emit a warning:

        qemu-system-ppc64: warning: cap-hpt-max-page-size lower level (16) in incoming stream than on destination (24)

        This is expected and harmless though. It is okay to migrate from a
        lower HPT maximum page size (64k) to a greater one (16M).

        Fixes: 0b8c89be7f7b "spapr: Add forgotten capability to migration stream"
        Based-on: <20190522074016.10521-3-clg>
        Signed-off-by: Greg Kurz <groug>
        Message-Id: <155853262675.1158324.17301777846476373459.stgit>
        Signed-off-by: David Gibson <david.id.au>

As BZ 1744170 has been moved to POST and commit 3725ef1a944b "spapr: Don't migrate the hpt_maxpagesize cap to older machine types" is part of the new machine type, I am moving this BZ to POST as well. This will allow the package to be retested once the patch is merged.
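The commit message above describes the fix in prose. As an illustration only, the Python sketch below models it under simplifying assumptions: the names `MachineType`, `skip_hpt_maxpagesize_cap`, `cap_hpt_maxpagesize` and `other_spapr_state` are hypothetical and are not QEMU identifiers, and treating a higher incoming level as an error is an assumption not stated in this bug. It shows why an older destination rejects a stream that carries a subsection it does not know, how a per-machine-type compat flag avoids sending that subsection, and how a capability level maps to a page size (level 16 is 2^16 bytes = 64 KiB, level 24 is 2^24 bytes = 16 MiB). That mapping is why the "lower level (16) ... than on destination (24)" warning is harmless.

```python
from __future__ import annotations

from dataclasses import dataclass

# Simplified, hypothetical model of the behaviour described in commit 3725ef1a944b.
# None of these names are QEMU identifiers; QEMU's real logic is written in C.


@dataclass(frozen=True)
class MachineType:
    name: str
    # Hypothetical compat flag, meant to be set for machine types 4.0 and older.
    skip_hpt_maxpagesize_cap: bool


def outgoing_subsections(machine: MachineType) -> list[str]:
    """Source side: subsections written for the 'spapr' device in this model."""
    subsections = ["other_spapr_state"]  # placeholder for all other spapr state
    if not machine.skip_hpt_maxpagesize_cap:
        subsections.append("cap_hpt_maxpagesize")
    return subsections


def load_on_destination(subsections: list[str], known: set[str]) -> None:
    """Destination side: any subsection the destination does not know aborts the load."""
    for name in subsections:
        if name not in known:
            raise RuntimeError(f"error while loading state: unknown subsection {name!r}")


def max_page_size(level: int) -> int:
    """A cap-hpt-max-page-size level is the log2 of the page size in bytes."""
    return 1 << level


def check_incoming_level(incoming: int, local: int) -> str:
    """Compare the incoming cap-hpt-max-page-size level with the destination's."""
    if incoming < local:
        return "warning"  # e.g. 16 (64 KiB) into 24 (16 MiB): harmless, load continues
    if incoming > local:
        return "error"  # assumption: a larger incoming page size is rejected
    return "ok"


if __name__ == "__main__":
    older_qemu_knows = {"other_spapr_state"}  # a QEMU that predates the cap subsection
    before_fix = MachineType("pseries-4.0", skip_hpt_maxpagesize_cap=False)
    after_fix = MachineType("pseries-4.0", skip_hpt_maxpagesize_cap=True)
    for label, machine in (("before fix", before_fix), ("after fix", after_fix)):
        try:
            load_on_destination(outgoing_subsections(machine), older_qemu_knows)
            print(label, "->", machine.name, "loads on the older QEMU")
        except RuntimeError as err:
            print(label, "->", err)
```

Running it shows the pre-fix stream for the old machine type being rejected by the older destination while the post-fix stream loads. The flag is keyed on the machine type rather than the QEMU version because the machine type is what both sides agree on during migration; the real QEMU check may involve additional conditions not modelled here.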
####Reproduced the bug on the following hosts, with the same steps as in the bug description:
Host A: P9 (alt7.6z)
Host kernel: 4.14.0-115.8.2.el7a.ppc64le
Qemu: qemu-kvm-ma-2.12.0-18.el7_6.4.ppc64le
SLOF: SLOF-20171214-2.gitfa98132.el7.noarch
Host B: P9 (8.1.0-av)
Host kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

    [root@ibm-p9wr-18 home]# /usr/libexec/qemu-kvm -M pseries-rhel7.6.0 -nodefaults -monitor stdio -incoming tcp:0:5801
    QEMU 2.12.0 monitor - type 'help' for more information
    (qemu) VNC server running on ::1:5900
    (qemu)
    (qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr'
    qemu-kvm: load of migration failed: No such file or directory

####Verified the fix on the same hosts, with a different qemu version on Host B:
Host B: P9 (8.1.0-av)
Host kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

####BTW, I also tried to verify the fix with a P8 (8.1.0-av) host as Host B running qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le; the bug could not be reproduced there either.
Host B1: P8 (8.1.0-av)
Host kernel: 4.18.0-136.el8.ppc64le
Guest kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

####Conclusion: Based on the above test results, the bug is fixed in qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le.

Thanks for the feedback; closing it since the issue is fixed. Please correct me if something is wrong.