Bug 1744107 - Migration from P8(qemu4.1) to P9(qemu4.1), after migration, qemu crash on destination with error message "qemu-kvm: error while loading state for instance 0x1 of device 'cpu'"
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Laurent Vivier
QA Contact: Gu Nini
Depends On: 1744170
Reported: 2019-08-21 11:07 UTC by xianwang
Modified: 2019-11-06 07:19 UTC
CC List: 12 users

Fixed In Version: qemu-kvm-4.1.0-7.module+el8.1.0+4177+896cb282
Last Closed: 2019-11-06 07:19:01 UTC
Type: Bug


Links:
Red Hat Product Errata RHBA-2019:3723 (Last Updated: 2019-11-06 07:19:24 UTC)

Description xianwang 2019-08-21 11:07:06 UTC
Description of problem:
After migrating from P8 (qemu 4.1) to P9 (qemu 4.1), the migration status on the source host is "completed" and the VM is "paused (postmigrate)", but qemu crashes on the destination with the error message "qemu-kvm: error while loading state for instance 0x1 of device 'cpu'".

Version-Release number of selected component (if applicable):
source host P8(qemu4.1):
4.18.0-136.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190114-2.gita5b428e.module+el8.1.0+3554+1a3a94a6.noarch

destination host P9(qemu4.1):
4.18.0-136.el8.ppc64le
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le
SLOF-20190703-1.gitba1ab360.module+el8.1.0+3730+7d905127.noarch

Guest:
4.18.0-136.el8.ppc64le

How reproducible:
90%

Steps to Reproduce:
1. Boot a guest on the source (P8) host with the following qemu CLI:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel8.1.0,max-cpu-compat=power8 \
    -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -chardev socket,id=console0,path=/tmp/console0,server,nowait \
    -device spapr-vty,chardev=console0,reg=0x30000000 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x5 \
    -device pci-bridge,chassis_nr=1,id=bridge1,bus=pci.0,addr=0x6 \
    -device pci-bridge,chassis_nr=2,id=bridge2,bus=pci.0,addr=0x8 \
    -device virtio-scsi-pci,id=scsi1,bus=bridge1,addr=0x7 \
    -drive file=/home/xianwang/rhel810-ppc64le-virtio-scsi.qcow2.bak,format=qcow2,if=none,cache=none,id=drive_scsi1,werror=stop,rerror=stop \
    -device scsi-hd,drive=drive_scsi1,id=scsi-disk1,bus=scsi1.0,channel=0,scsi-id=0x6,lun=0x3,bootindex=0 \
    -device virtio-scsi-pci,id=scsi_add,bus=pci.0,addr=0x9 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0xa \
    -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 2048,slots=4,maxmem=32G \
    -smp 4 \
    -vga std \
    -vnc :11 \
    -cpu host \
    -device usb-kbd \
    -device usb-mouse \
    -qmp tcp:0:8881,server,nowait \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -monitor stdio \
    -boot order=cdn,once=n,menu=on,strict=off \
    -enable-kvm \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc \
    -device i6300esb,id=wdt0 \
    -watchdog-action pause \
2. Boot the same guest in incoming mode on the destination (P9) host, e.g. the same CLI plus -incoming tcp:0:5801 (matching the port used in step 3).
3. Start migration on the source host:
(qemu) migrate -d tcp:10.19.128.149:5801

Actual results:
Migration completes and the VM is paused on the source host, but qemu crashes on the destination.
source:
(qemu) info migrate
Migration status: completed
(qemu) info status 
VM status: paused (postmigrate)

destination host:
(qemu) 2019-08-21T10:49:48.210071Z qemu-kvm: error while loading state for instance 0x1 of device 'cpu'
2019-08-21T10:49:48.211072Z qemu-kvm: load of migration failed: Operation not permitted


Expected results:
Migration completes and the VM works well on the destination.

Additional info:

Comment 1 xianwang 2019-08-21 11:22:42 UTC
I.
This is a ppc-only bug.

II.
This bug is a qemu 4.1 regression; it does not exist on qemu 4.0, i.e. it works well on
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.ppc64le

III.
I have tried "ppc64_cpu --smt=off", but the test result is the same as with smt=on, so the issue is not related to this configuration.
# ppc64_cpu --smt
SMT is off 

IV.
I have tried adding a parameter to the machine type, "-machine pseries-rhel8.1.0,max-cpu-compat=power8,ic-mode=xics", on both the source and destination ends, but I could still hit this issue.

V.
If I boot this guest directly on P9 with the same qemu CLI as on the source end in this bug report, the guest boots successfully and works well.

Comment 2 xianwang 2019-08-22 01:55:30 UTC
Even if I use the 8.0.0 or 7.6.0 machine type, i.e. "-machine pseries-rhel8.0.0,max-cpu-compat=power8" or "-machine pseries-rhel7.6.0,max-cpu-compat=power8", I can still hit this issue.

Comment 4 Laurent Vivier 2019-08-22 17:07:09 UTC
I'm able to reproduce the problem with the updated RPM and with upstream qemu, so the problem is not fixed.

Comment 5 Laurent Vivier 2019-08-22 17:57:59 UTC
Bisected to:

commit 25c9780d38d4494f8610371d883865cf40b35dd6
Author: David Gibson <david.id.au>
Date:   Tue Aug 13 15:59:18 2019 +1000

    spapr: Reset CAS & IRQ subsystem after devices
    
    This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
    caused by the new "dual" interrupt controller model.  Specifically,
    qemu can crash when used with KVM if a 'system_reset' is requested
    while there's active I/O in the guest.
    
    The problem is that in spapr_machine_reset() we:
    
    1. Reset the CAS vector state
            spapr_ovec_cleanup(spapr->ov5_cas);
    
    2. Reset all devices
            qemu_devices_reset()
    
    3. Reset the irq subsystem
            spapr_irq_reset();
    
    However (1) implicitly changes the interrupt delivery mode, because
    whether we're using XICS or XIVE depends on the CAS state.  We don't
    properly initialize the new irq mode until (3) though - in particular
    setting up the KVM devices.
    
    During (2), we can temporarily drop the BQL allowing some irqs to be
    delivered which will go to an irq system that's not properly set up.
    
    Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
    reset will put us back in XICS mode.  kvm_kernel_irqchip() still
    returns true, because XIVE was using KVM, however XICS doesn't have
    its KVM components initialized and kernel_xics_fd == -1.  When the irq
    is delivered it goes via ics_kvm_set_irq() which assert()s that
    kernel_xics_fd != -1.
    
    This change addresses the problem by delaying the CAS reset until
    after the devices reset.  The device reset should quiesce all the
    devices so we won't get irqs delivered while we mess around with the
    IRQ.  The CAS reset and irq re-initialize should also now be under the
    same BQL critical section so nothing else should be able to interrupt
    it either.
    
    We also move the spapr_irq_msi_reset() used in one of the legacy irq
    modes, since it logically makes sense at the same point as the
    spapr_irq_reset() (it's essentially an equivalent operation for older
    machine types).  Since we don't need to switch between different
    interrupt controllers for those old machine types it shouldn't
    actually be broken in those cases though.
    
    Cc: Cédric Le Goater <clg>
    
    Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
    Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
                     XIVE and XICS"
    Signed-off-by: David Gibson <david.id.au>
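
For readers following the ordering discussion above, here is a condensed sketch of spapr_machine_reset() after this commit (hw/ppc/spapr.c). Everything except the reordered calls is elided, and the exact signatures are from my reading of the qemu 4.1 source rather than a verbatim quote:

static void spapr_machine_reset(MachineState *machine)
{
    SpaprMachineState *spapr = SPAPR_MACHINE(machine);
    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);

    /* (2) now runs first: all devices are quiesced before any change
     * of irq mode, so no irq can reach a half-initialized controller */
    qemu_devices_reset();

    /* (1) moved after the device reset: dropping the CAS-negotiated
     * options implicitly switches between XICS and XIVE delivery.
     * The compat PVR of the first CPU is also set here. */
    spapr_ovec_cleanup(spapr->ov5_cas);
    spapr->ov5_cas = spapr_ovec_new();
    ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);

    /* (3) re-initialize the IRQ backend; (1) and (3) now sit in the
     * same BQL critical section */
    spapr_irq_reset(spapr, &error_fatal);
}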

Comment 6 Laurent Vivier 2019-08-23 12:21:39 UTC
It seems a side effect of the patch in comment 5 is to add a supplementary field, compat_pvr, for each CPU in the migration stream:

        {
            "name": "cpu",
            "instance_id": 0,
            "vmsd_name": "cpu",
            "version": 5,
...
            "subsections": [
...
                {
                    "vmsd_name": "cpu/compat",
                    "version": 1,
                    "fields": [
                        {
                            "name": "compat_pvr",
                            "type": "uint32",
                            "size": 4
                        }
                    ]
                }
            ]
        },
...
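
For reference, a subsection like this maps to a VMStateDescription whose .needed callback decides whether the subsection is emitted at all, which is why streams from older setups simply lack the field. A minimal sketch of what the cpu/compat declaration would look like (names taken from the dump above; the exact upstream declaration in target/ppc/machine.c may differ):

static bool compat_needed(void *opaque)
{
    PowerPCCPU *cpu = opaque;

    /* only emit the subsection when a compat mode is actually set */
    return cpu->compat_pvr != 0;
}

static const VMStateDescription vmstate_compat = {
    .name = "cpu/compat",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = compat_needed,
    .fields = (VMStateField[]) {
        VMSTATE_UINT32(compat_pvr, PowerPCCPU),
        VMSTATE_END_OF_LIST()
    }
};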

Comment 7 Laurent Vivier 2019-08-23 13:42:40 UTC
What seems to happen is that compat_pvr is not propagated correctly to all CPUs.

Originally, spapr_machine_reset() called ppc_set_compat() to set the value max_compat_pvr for the first CPU, and this was propagated to all CPUs by spapr_cpu_reset().
Now, as spapr_cpu_reset() is called before that, the value is not propagated to all CPUs.

A simple fix seems to be:

--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1752,7 +1752,7 @@ static void spapr_machine_reset(MachineState *machine)
         spapr_ovec_cleanup(spapr->ov5_cas);
         spapr->ov5_cas = spapr_ovec_new();
 
-        ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
+        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
     }
 
     /*

I've released the P9 machine, so I can't test it for the moment.
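
For context, ppc_set_compat_all() applies the same compat PVR to every vCPU instead of only the first one. Roughly (an assumed sketch, not the verbatim code from target/ppc/compat.c):

void ppc_set_compat_all(uint32_t compat_pvr, Error **errp)
{
    CPUState *cs;

    /* walk every vCPU and set the same compat PVR, instead of relying
     * on spapr_cpu_reset() to propagate it from the first CPU */
    CPU_FOREACH(cs) {
        PowerPCCPU *cpu = POWERPC_CPU(cs);
        Error *local_err = NULL;

        ppc_set_compat(cpu, compat_pvr, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }
    }
}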

Comment 14 Laurent Vivier 2019-08-28 10:23:43 UTC
The patch from comment 7 is in David's next pull-request queue:

https://github.com/dgibson/qemu/commits/ppc-for-4.2

https://github.com/dgibson/qemu/commit/5eb7e73394317b225f8b941eff65dc6f9045bcde

I will backport it as soon as it is merged.

Comment 16 Danilo de Paula 2019-09-04 14:12:32 UTC
@Martin: Looks like the system didn't grant pm_ack automatically. Would you mind?

Comment 22 errata-xmlrpc 2019-11-06 07:19:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

