Description of problem:
Migration P8 (qemu 4.1) -> P9 (qemu 4.1): after migration, on the source host the migration status is "completed" and the VM is "paused (postmigrate)", but qemu crashes on the destination with the error message "qemu-kvm: error while loading state for instance 0x1 of device 'cpu'".

Version-Release number of selected component (if applicable):
source host P8 (qemu 4.1):
4.18.0-136.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190114-2.gita5b428e.module+el8.1.0+3554+1a3a94a6.noarch

destination host P9 (qemu 4.1):
4.18.0-136.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

Guest: 4.18.0-136.el8.ppc64le

How reproducible:
90%

Steps to Reproduce:
1. Boot a guest on the source (P8) host with the qemu command line:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -nodefaults \
    -machine pseries-rhel8.1.0,max-cpu-compat=power8 \
    -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -chardev socket,id=console0,path=/tmp/console0,server,nowait \
    -device spapr-vty,chardev=console0,reg=0x30000000 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x5 \
    -device pci-bridge,chassis_nr=1,id=bridge1,bus=pci.0,addr=0x6 \
    -device pci-bridge,chassis_nr=2,id=bridge2,bus=pci.0,addr=0x8 \
    -device virtio-scsi-pci,id=scsi1,bus=bridge1,addr=0x7 \
    -drive file=/home/xianwang/rhel810-ppc64le-virtio-scsi.qcow2.bak,format=qcow2,if=none,cache=none,id=drive_scsi1,werror=stop,rerror=stop \
    -device scsi-hd,drive=drive_scsi1,id=scsi-disk1,bus=scsi1.0,channel=0,scsi-id=0x6,lun=0x3,bootindex=0 \
    -device virtio-scsi-pci,id=scsi_add,bus=pci.0,addr=0x9 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0xa \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 2048,slots=4,maxmem=32G \
    -smp 4 \
    -vga std \
    -vnc :11 \
    -cpu host \
    -device usb-kbd \
    -device usb-mouse \
    -qmp tcp:0:8881,server,nowait \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew \
    -monitor stdio \
    -boot order=cdn,once=n,menu=on,strict=off \
    -enable-kvm \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc \
    -device i6300esb,id=wdt0 \
    -watchdog-action pause

2. Start qemu in incoming mode on the destination host (P9).
3. Start the migration on the source host:
(qemu) migrate -d tcp:10.19.128.149:5801

Actual results:
The migration completes and the VM is paused on the source host, but qemu crashes on the destination.

source host:
(qemu) info migrate
Migration status: completed
(qemu) info status
VM status: paused (postmigrate)

destination host:
(qemu) 2019-08-21T10:49:48.210071Z qemu-kvm: error while loading state for instance 0x1 of device 'cpu'
2019-08-21T10:49:48.211072Z qemu-kvm: load of migration failed: Operation not permitted

Expected results:
Migration completes and the VM works well on the destination.

Additional info:
I. This is a ppc-only bug.

II. This is a qemu 4.1 regression; it does not exist on qemu 4.0, i.e. it works well with qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le.

III. I have tried "ppc64_cpu --smt=off", but the test result is the same as with "smt=on", so the issue is not related to this setting.
# ppc64_cpu --smt
SMT is off

IV. I have tried adding "ic-mode=xics" to the machine type ("-machine pseries-rhel8.1.0,max-cpu-compat=power8,ic-mode=xics") on both the source and destination ends, but I still hit this issue.

V. If I boot this guest directly on P9 with the same qemu command line as on the source end in this bug report, the guest boots successfully and works well.
Even if I use the 8.0.0 or 7.6.0 machine type ("-machine pseries-rhel8.0.0,max-cpu-compat=power8" or "-machine pseries-rhel7.6.0,max-cpu-compat=power8"), I can still hit this issue.
I'm able to reproduce the problem with the updated RPM and with upstream qemu, so the problem is not fixed.
Bisected to:

commit 25c9780d38d4494f8610371d883865cf40b35dd6
Author: David Gibson <david.id.au>
Date:   Tue Aug 13 15:59:18 2019 +1000

    spapr: Reset CAS & IRQ subsystem after devices

    This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
    caused by the new "dual" interrupt controller model. Specifically,
    qemu can crash when used with KVM if a 'system_reset' is requested
    while there's active I/O in the guest.

    The problem is that in spapr_machine_reset() we:

    1. Reset the CAS vector state
       spapr_ovec_cleanup(spapr->ov5_cas);

    2. Reset all devices
       qemu_devices_reset()

    3. Reset the irq subsystem
       spapr_irq_reset();

    However (1) implicitly changes the interrupt delivery mode, because
    whether we're using XICS or XIVE depends on the CAS state. We don't
    properly initialize the new irq mode until (3) though - in particular
    setting up the KVM devices.

    During (2), we can temporarily drop the BQL allowing some irqs to be
    delivered which will go to an irq system that's not properly set up.

    Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
    reset will put us back in XICS mode. kvm_kernel_irqchip() still
    returns true, because XIVE was using KVM, however XICS doesn't have
    its KVM components initialized and kernel_xics_fd == -1. When the irq
    is delivered it goes via ics_kvm_set_irq() which assert()s that
    kernel_xics_fd != -1.

    This change addresses the problem by delaying the CAS reset until
    after the devices reset. The device reset should quiesce all the
    devices so we won't get irqs delivered while we mess around with the
    IRQ. The CAS reset and irq re-initialize should also now be under the
    same BQL critical section so nothing else should be able to interrupt
    it either.

    We also move the spapr_irq_msi_reset() used in one of the legacy irq
    modes, since it logically makes sense at the same point as the
    spapr_irq_reset() (it's essentially an equivalent operation for older
    machine types). Since we don't need to switch between different
    interrupt controllers for those old machine types, it shouldn't
    actually be broken in those cases though.

    Cc: Cédric Le Goater <clg>
    Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
    Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS"
    Signed-off-by: David Gibson <david.id.au>
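The ordering problem described in the commit message can be modeled abstractly. The sketch below uses illustrative names only (IrqBackend, machine_reset_buggy, etc. are not QEMU code): an irq delivered between the CAS reset and the irq-subsystem reset hits a backend whose KVM component is not set up, mirroring the kernel_xics_fd == -1 assertion, while the fixed ordering quiesces devices first and resets CAS and irq together.

```python
# Toy model of the spapr reset-ordering bug (illustrative names only,
# not actual QEMU code).

class IrqBackend:
    """Stand-in for XICS under KVM: usable only once its KVM component
    has been initialized (i.e. kernel_xics_fd != -1 in the real code)."""
    def __init__(self):
        self.kernel_fd = -1          # KVM device not set up yet

    def reset(self):
        self.kernel_fd = 42          # KVM device initialized

    def set_irq(self, n):
        # Mirrors the assert()s in ics_kvm_set_irq()
        assert self.kernel_fd != -1, "irq delivered before irq reset"
        return f"irq {n} delivered"

def machine_reset_buggy(irq, pending_irq):
    # 1. CAS reset silently switches back to XICS (backend not set up)
    # 2. device reset may drop the BQL and deliver a pending irq
    irq.set_irq(pending_irq)         # -> AssertionError
    # 3. irq subsystem reset (too late)
    irq.reset()

def machine_reset_fixed(irq, pending_irq):
    # Devices are reset (and quiesced) first; CAS + irq reset happen
    # together under the BQL, so the irq only arrives afterwards.
    irq.reset()
    return irq.set_irq(pending_irq)

try:
    machine_reset_buggy(IrqBackend(), 3)
    crashed = False
except AssertionError:
    crashed = True

ok = machine_reset_fixed(IrqBackend(), 3)
print(crashed, ok)
```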
It seems the side effect of the patch in comment 5 is to add a supplementary field, compat_pvr, for each CPU in the migration stream:

{
    "name": "cpu",
    "instance_id": 0,
    "vmsd_name": "cpu",
    "version": 5,
    ...
    "subsections": [
        ...
        {
            "vmsd_name": "cpu/compat",
            "version": 1,
            "fields": [
                {
                    "name": "compat_pvr",
                    "type": "uint32",
                    "size": 4
                }
            ]
        }
    ]
},
...
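A vmstate description like the one quoted above can be checked mechanically for the new subsection. A minimal sketch (the JSON excerpt is trimmed from the dump above; whether your dump has exactly this shape depends on the tool that produced it):

```python
import json

# Minimal excerpt of the 'cpu' vmstate description quoted above.
desc = json.loads("""
{
  "name": "cpu", "instance_id": 0, "vmsd_name": "cpu", "version": 5,
  "subsections": [
    { "vmsd_name": "cpu/compat", "version": 1,
      "fields": [ { "name": "compat_pvr", "type": "uint32", "size": 4 } ] }
  ]
}
""")

def has_compat_subsection(cpu_desc):
    """Return True if this 'cpu' vmstate carries the cpu/compat
    subsection (i.e. a compat_pvr field) in the migration stream."""
    return any(sub.get("vmsd_name") == "cpu/compat"
               for sub in cpu_desc.get("subsections", []))

print(has_compat_subsection(desc))
```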
What seems to happen is that compat_pvr is not propagated correctly to all CPUs. Originally, spapr_machine_reset() called ppc_set_compat() to set the value max_compat_pvr for the first CPU, and this was propagated to all CPUs by spapr_cpu_reset(). Now, as spapr_cpu_reset() is called before that, the value is not propagated to all CPUs.

A simple fix seems to be:

--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1752,7 +1752,7 @@ static void spapr_machine_reset(MachineState *machine)
         spapr_ovec_cleanup(spapr->ov5_cas);
         spapr->ov5_cas = spapr_ovec_new();

-        ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
+        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
     }

     /*

I've released the P9 machine, so I can't test it for the moment.
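The propagation issue can be sketched with a toy model (hypothetical names and an arbitrary example PVR value; the two functions mimic what the one-line fix changes — setting compat_pvr on every vCPU rather than only the first):

```python
# Toy model of compat_pvr propagation across vCPUs (illustrative only,
# not QEMU code; the PVR value is an arbitrary example).

class Cpu:
    def __init__(self):
        self.compat_pvr = 0          # unset until the machine reset

cpus = [Cpu() for _ in range(4)]
MAX_COMPAT_PVR = 0x004D0200          # example value, stand-in for max_compat_pvr

def ppc_set_compat(cpu, pvr):
    cpu.compat_pvr = pvr             # buggy path: only one vCPU updated

def ppc_set_compat_all(cpus, pvr):
    for cpu in cpus:                 # fixed path: propagate to every vCPU
        ppc_set_compat(cpu, pvr)

# Buggy ordering: only the first vCPU gets the value, so the incoming
# stream mismatches on the second vCPU -- consistent with the reported
# "error while loading state for instance 0x1 of device 'cpu'".
ppc_set_compat(cpus[0], MAX_COMPAT_PVR)
mismatched = [i for i, c in enumerate(cpus) if c.compat_pvr != MAX_COMPAT_PVR]
print(mismatched)

# With the fix, every vCPU carries the same compat_pvr.
ppc_set_compat_all(cpus, MAX_COMPAT_PVR)
print(all(c.compat_pvr == MAX_COMPAT_PVR for c in cpus))
```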
The patch from comment 7 is in David's next pull request queue:

https://github.com/dgibson/qemu/commits/ppc-for-4.2
https://github.com/dgibson/qemu/commit/5eb7e73394317b225f8b941eff65dc6f9045bcde

I will backport it as soon as it is merged.
@Martin: Looks like the system didn't grant pm_ack automatically. Would you mind?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723