Description of problem:
Backward migration from rhel8.1.0 (qemu 4.1) to alt7.6 (qemu-kvm-rhev-2.12.0) fails with "(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr'". It happens on both p9<->p9 and p8<->p9.

Version-Release number of selected component (if applicable):
Host A p9(alt7.6):
4.14.0-115.8.2.el7a.ppc64le
qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le
SLOF-20171214-2.gitfa98132.el7.noarch

Host B p8(rhel8.1.0 fast train):
4.18.0-130.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190114-2.gita5b428e.module+el8.1.0+3554+1a3a94a6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest on host A:
    /usr/libexec/qemu-kvm -M pseries-rhel7.6.0 -nodefaults -monitor stdio
    (qemu) info qtree
    bus: main-system-bus
      type System
      dev: spapr-pci-host-bridge, id ""
        index = 0 (0x0)
        mem_win_size = 2147483648 (0x80000000)
        mem64_win_size = 1099511627776 (0x10000000000)
        io_win_size = 65536 (0x10000)
        dynamic-reconfiguration = true
        dma_win_addr = 0 (0x0)
        dma_win_size = 1073741824 (0x40000000)
        dma64_win_addr = 576460752303423488 (0x800000000000000)
        ddw = true
        pgsz = 69632 (0x11000)
        numa_node = 4294967295 (0xffffffff)
        pre-2.8-migration = false
        pcie-extended-configuration-space = true
        bus: pci.0
          type PCI
      dev: spapr-vio-bridge, id ""
        bus: spapr-vio
          type spapr-vio-bus
          dev: spapr-nvram, id "nvram@71000000"
            reg = 1895825408 (0x71000000)
            drive = ""
            irq = 4098 (0x1002)

2. Boot the incoming guest on host B:
    /usr/libexec/qemu-kvm -M pseries-rhel7.6.0 -nodefaults -monitor stdio -incoming tcp:0:5801
    (qemu) info qtree
    bus: main-system-bus
      type System
      dev: spapr-pci-host-bridge, id ""
        index = 0 (0x0)
        mem_win_size = 2147483648 (0x80000000)
        mem64_win_size = 1099511627776 (0x10000000000)
        io_win_size = 65536 (0x10000)
        dynamic-reconfiguration = true
        dma_win_addr = 0 (0x0)
        dma_win_size = 1073741824 (0x40000000)
        dma64_win_addr = 576460752303423488 (0x800000000000000)
        ddw = true
        pgsz = 69632 (0x11000)
        numa_node = 4294967295 (0xffffffff)
        pre-2.8-migration = false
        pcie-extended-configuration-space = true
        gpa = 70368744177664 (0x400000000000)
        atsd = 140737488355328 (0x800000000000)
        bus: pci.0
          type PCI
      dev: spapr-vio-bridge, id ""
        bus: spapr-vio
          type spapr-vio-bus
          dev: spapr-nvram, id "nvram@71000000"
            reg = 1895825408 (0x71000000)
            drive = ""

3. Do forward migration from host A to host B:
    (qemu) migrate -d tcp:10.16.200.238:5801
   Migration completes and the VM is running on host B.

4. Boot the incoming guest on host A:
    /usr/libexec/qemu-kvm -M pseries-rhel7.6.0 -nodefaults -monitor stdio -incoming tcp:0:5801

5. Do backward migration from host B to host A:
    (qemu) migrate -d tcp:10.19.128.145:5801

Actual results:
Migration completes on host B, but qemu crashes on host A.
On host A:
    (qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr'
    qemu-kvm: load of migration failed: No such file or directory

Expected results:
Forward and backward migration both work well.

Additional info:
This issue also happens with the following builds.
P9<->P9:
Host A p9(alt7.6):
4.14.0-115.8.2.el7a.ppc64le
qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le

Host B p8(rhel8.1.0 fast train):
4.18.0-134.el8.ppc64le
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.ppc64le
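The two `info qtree` dumps in the steps above are long, and the differences between them are easy to miss by eye. The following is a minimal sketch, not part of the original report, that diffs the `name = value` properties of two dumps. The function names are hypothetical, the sample dumps are abbreviated from the ones above, and devices are keyed by type name only, so two devices of the same type would collide. A difference found this way is only a hint and does not by itself identify the cause of a migration failure.

```python
def parse_qtree_props(text):
    """Collect `name = value` pairs from `info qtree` text, keyed by (device, name)."""
    props = {}
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("dev:"):
            # e.g. 'dev: spapr-pci-host-bridge, id ""' -> 'spapr-pci-host-bridge'
            device = line[len("dev:"):].split(",")[0].strip()
        elif " = " in line and device is not None:
            name, value = line.split(" = ", 1)
            props[(device, name.strip())] = value.strip()
    return props


def qtree_diff(src_text, dst_text):
    """Return (only in source, only in destination, present in both but different)."""
    src = parse_qtree_props(src_text)
    dst = parse_qtree_props(dst_text)
    only_src = sorted(k for k in src if k not in dst)
    only_dst = sorted(k for k in dst if k not in src)
    changed = sorted(k for k in src if k in dst and src[k] != dst[k])
    return only_src, only_dst, changed


# Abbreviated copies of the host A and host B dumps from the report.
HOST_A = '''
bus: main-system-bus
  type System
  dev: spapr-pci-host-bridge, id ""
    index = 0 (0x0)
    pcie-extended-configuration-space = true
  dev: spapr-vio-bridge, id ""
    bus: spapr-vio
      dev: spapr-nvram, id "nvram@71000000"
        reg = 1895825408 (0x71000000)
        drive = ""
        irq = 4098 (0x1002)
'''

HOST_B = '''
bus: main-system-bus
  type System
  dev: spapr-pci-host-bridge, id ""
    index = 0 (0x0)
    pcie-extended-configuration-space = true
    gpa = 70368744177664 (0x400000000000)
    atsd = 140737488355328 (0x800000000000)
  dev: spapr-vio-bridge, id ""
    bus: spapr-vio
      dev: spapr-nvram, id "nvram@71000000"
        reg = 1895825408 (0x71000000)
        drive = ""
'''

if __name__ == "__main__":
    only_a, only_b, changed = qtree_diff(HOST_A, HOST_B)
    print("only on host A:", only_a)
    print("only on host B:", only_b)
    print("different values:", changed)
```

On the abbreviated dumps this reports `irq` on `spapr-nvram` as present only on host A, and `gpa` and `atsd` on `spapr-pci-host-bridge` as present only on host B.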
This is a ppc-only bug.
This is a regression in qemu4.1. If the destination build is qemu4.0, it works well, i.e., it works well with the builds below.

P9<->P9:
Host A p9(alt7.6):
4.14.0-115.8.2.el7a.ppc64le
qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le

Host B p9(rhel8.1.0 fast train):
4.18.0-134.el8.ppc64le
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le

In fact, this issue was fixed for qemu4.0 in the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1709726
This issue is also hit when migrating from qemu4.1-->qemu4.0 and from qemu4.1-->qemu3.1, as follows.

I:
Host A p8 (qemu4.1):
4.18.0-130.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le

Host B p9 (qemu4.0):
4.18.0-134.el8.ppc64le
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le

qemu cli:
    /usr/libexec/qemu-kvm -nodefaults -monitor stdio -machine pseries-rhel8.1.0,max-cpu-compat=power8

Result: same as in the bug report.

II:
Host A p8 (qemu4.1):
4.18.0-130.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le

Host B p9 (qemu3.1):
4.18.0-134.el8.ppc64le
qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed.ppc64le

qemu cli:
    /usr/libexec/qemu-kvm -nodefaults -monitor stdio -machine pseries-rhel7.6.0,max-cpu-compat=power8
Laurent, assigning this to you. It may or may not be related to the migration bugs you're already looking at.
This issue is also hit when migrating from power8 with rhel8.1.0 (qemu4.1) to power8 with rhel7.6.z (qemu-kvm-rhev-2.12).

Source p8:
4.18.0-130.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190114-2.gita5b428e.module+el8.1.0+3554+1a3a94a6.noarch

Destination p8:
3.10.0-957.35.1.el7.ppc64le
qemu-kvm-rhev-2.12.0-18.el7_6.7.ppc64le
SLOF-20171214-2.gitfa98132.el7.noarch
Thank you xianwang.

In the new machine type (BZ 1744170) I think the problem is fixed by:

commit 3725ef1a944bbe1173b55fdabe76fb17876f1d9e
Author: Greg Kurz <groug>
Date:   Wed May 22 15:43:46 2019 +0200

    spapr: Don't migrate the hpt_maxpagesize cap to older machine types

    Commit 0b8c89be7f7b added the hpt_maxpagesize capability to the migration
    stream. This is okay for new machine types but it breaks backward migration
    to older QEMUs, which don't expect the extra subsection.

    Add a compatibility boolean flag to the sPAPR machine class and use it to
    skip migration of the capability for machine types 4.0 and older. This
    fixes migration to an older QEMU. Note that the destination will emit a
    warning:

    qemu-system-ppc64: warning: cap-hpt-max-page-size lower level (16) in incoming stream than on destination (24)

    This is expected and harmless though. It is okay to migrate from a lower
    HPT maximum page size (64k) to a greater one (16M).

    Fixes: 0b8c89be7f7b "spapr: Add forgotten capability to migration stream"
    Based-on: <20190522074016.10521-3-clg>
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <155853262675.1158324.17301777846476373459.stgit>
    Signed-off-by: David Gibson <david.id.au>
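To illustrate the mechanism the commit message describes, here is a toy model in Python. It is a sketch under stated assumptions, not QEMU's implementation: the stream is reduced to a list of named subsections, and `MachineType`, `skip_hpt_maxpagesize`, `save_caps`, `load_caps` and `other_cap` are hypothetical names invented for this example. Only the name `hpt_maxpagesize` comes from the commit message.

```python
from dataclasses import dataclass


class MigrationLoadError(Exception):
    """Raised when the destination cannot interpret the incoming stream."""


@dataclass
class MachineType:
    name: str
    # Hypothetical compat flag: True means "do not send hpt_maxpagesize".
    skip_hpt_maxpagesize: bool


def save_caps(machine, caps):
    """Source side: emit one (name, value) subsection per capability."""
    stream = []
    for name, value in caps.items():
        if name == "hpt_maxpagesize" and machine.skip_hpt_maxpagesize:
            continue  # older machine type: leave the subsection out
        stream.append((name, value))
    return stream


def load_caps(stream, known):
    """Destination side: reject any subsection this build does not know."""
    loaded = {}
    for name, value in stream:
        if name not in known:
            raise MigrationLoadError(f"unexpected subsection: {name}")
        loaded[name] = value
    return loaded


# Toy data: "other_cap" is a placeholder capability, not a real QEMU name.
CAPS = {"other_cap": 1, "hpt_maxpagesize": 16}
OLD_QEMU_KNOWN = {"other_cap"}
NEW_QEMU_KNOWN = {"other_cap", "hpt_maxpagesize"}

if __name__ == "__main__":
    before_fix = MachineType("older-machine-type", skip_hpt_maxpagesize=False)
    after_fix = MachineType("older-machine-type", skip_hpt_maxpagesize=True)

    try:
        load_caps(save_caps(before_fix, CAPS), OLD_QEMU_KNOWN)
    except MigrationLoadError as err:
        print("before fix:", err)

    print("after fix:", load_caps(save_caps(after_fix, CAPS), OLD_QEMU_KNOWN))
```

In this model the unflagged source always sends the extra subsection, so an older destination that does not know it fails the load. With the flag set for the older machine type, the subsection is left out and the older destination loads the stream. The levels in the quoted warning are consistent with page-size shifts: `1 << 16` is 64 KiB and `1 << 24` is 16 MiB, matching the 64k and 16M sizes named in the commit message.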
As BZ 1744170 has been moved to POST and commit 3725ef1a944b "spapr: Don't migrate the hpt_maxpagesize cap to older machine types" is part of the new machine type, I am also moving this BZ to POST. This will allow the package to be retested once the patch is merged.
####Reproduced the bug on the following hosts, with the same steps as in the bug description:

Host A: P9(alt7.6z)
Host kernel: 4.14.0-115.8.2.el7a.ppc64le
Qemu: qemu-kvm-ma-2.12.0-18.el7_6.4.ppc64le
SLOF: SLOF-20171214-2.gitfa98132.el7.noarch

Host B: P9(8.1.0-av)
Host kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

    [root@ibm-p9wr-18 home]# /usr/libexec/qemu-kvm -M pseries-rhel7.6.0 -nodefaults -monitor stdio -incoming tcp:0:5801
    QEMU 2.12.0 monitor - type 'help' for more information
    (qemu) VNC server running on ::1:5900
    (qemu)
    (qemu) qemu-kvm: error while loading state for instance 0x0 of device 'spapr'
    qemu-kvm: load of migration failed: No such file or directory

####Verified the bug on the same hosts but with a different qemu version on host B:

Host B: P9(8.1.0-av)
Host kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

####BTW, I also tried to verify the bug with a P8(8.1.0-av) host as host B on qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le; the bug could not be reproduced there either.

Host B1: P8(8.1.0-av)
Host kernel: 4.18.0-136.el8.ppc64le
Guest kernel: 4.18.0-137.el8.ppc64le
Qemu: qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
SLOF: SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

####Conclusion:
Based on the above test results, the bug is fixed in qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le.
Thanks for the feedback. Closing this since the issue is fixed. Please correct me if something is wrong.