Bug 2189423
| Summary: | Failed to migrate VM from rhel 9.3 to rhel 9.2 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Min Deng <mdeng> |
| Component: | qemu-kvm | Assignee: | Leonardo Bras <leobras> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Min Deng <mdeng> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | coli, fjin, jinzhao, juzhang, leobras, lijin, meili, nilal, peterx, virt-maint |
| Version: | 9.3 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-8.0.0-5.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-11-07 08:27:12 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Hmm, when I was looking at the upstream backward migration issue over 8.0->7.2, which broke too, I found the recent PCI AER change might be the culprit. I've raised this issue here:

https://lore.kernel.org/qemu-devel/ZEhzaWpNM+NvZCUw@x1n

It's very possible it's the same issue here for downstream.

The bug also reproduced between rhel 9.2 and rhel 9.0

RHEL 9.0 host
kernel-5.14.0-303.el9.x86_64
qemu-kvm-6.2.0-11.el9_0.7.x86_64

RHEL 9.2
kernel-5.14.0-302.el9.x86_64
qemu-kvm-8.0.0-1.el9.x86_64

(In reply to Min Deng from comment #5)
> The bug also reproduced between rhel 9.2 and rhel 9.0
The bug also reproduced between rhel 9.3 and rhel 9.0
> RHEL 9.0 host
> kernel-5.14.0-303.el9.x86_64
> qemu-kvm-6.2.0-11.el9_0.7.x86_64
> RHEL 9.2
RHEL 9.3
> kernel-5.14.0-302.el9.x86_64
> qemu-kvm-8.0.0-1.el9.x86_64

Should be from rhel9.3 to rhel 9.0. Thanks.

(In reply to Min Deng from comment #6)
> (In reply to Min Deng from comment #5)
> > The bug also reproduced between rhel 9.2 and rhel 9.0
> The bug also reproduced between rhel 9.3 and rhel 9.0
> > RHEL 9.0 host
> > kernel-5.14.0-303.el9.x86_64
> > qemu-kvm-6.2.0-11.el9_0.7.x86_64
> > RHEL 9.2
> RHEL 9.3
> > kernel-5.14.0-302.el9.x86_64
> > qemu-kvm-8.0.0-1.el9.x86_64
>
> Should be from rhel9.3 to rhel 9.0. Thanks.

Oh, that makes sense: something introduced in 9.2->9.3 has broken migration.
Thanks for the testing!
I will start debugging this soon.

Upstream patch sent:
https://patchwork.kernel.org/project/qemu-devel/list/?series=744531

Hi Leonardo
Could you please help to set DTM/ITM for this bug?
Thanks
Min

Hi All,
The issue has been reproduced between rhel8.6 to rhel9.3

RHEL 8.6 host
4.18.0-372.58.1.el8_6.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8.x86_64
edk2-ovmf-20220126gitbb1bba3d77-2.el8_6.1.noarch

RHEL 9.3 host
5.14.0-316.el9.x86_64
qemu-kvm-8.0.0-4.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-4.el9.noarch

Test results
qemu-kvm: warning: Machine type 'pc-q35-rhel8.5/4/3/2/....0' is deprecated: machine types for previous major releases are deprecated
QEMU 8.0.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp:[::]:4000
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument

Notes,
Except for rhel 8.6.0, the rest of the tests failed with other machine types. I have to say, the issue blocks almost all tests between RHEL8.x and RHEL9.x from the QE side.
Thanks

The MR for this bz:
https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/283

QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
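For anyone triaging the "Bad config data" lines in this report: the `i=` field is a byte offset into the device's PCI configuration space. A minimal Python sketch (not QEMU code; `config_region` is a hypothetical helper name introduced only for illustration) that tells whether an offset falls in the 64-byte standard header, the remainder of the 256-byte conventional space, or the PCIe extended space:

```python
def config_region(offset: int) -> str:
    """Classify a byte offset within a PCI Express function's 4 KiB config space."""
    if not 0 <= offset < 0x1000:
        raise ValueError(f"offset {offset:#x} is outside the 4 KiB config space")
    if offset < 0x40:
        # First 64 bytes: the standard PCI configuration header.
        return "standard header"
    if offset < 0x100:
        # Rest of the 256-byte conventional space, where PCI capabilities live.
        return "conventional config space (capability area)"
    # 0x100..0xfff: PCIe extended capabilities.
    return "PCIe extended config space"


if __name__ == "__main__":
    # Offsets taken from the two error messages quoted in this bug.
    for off in (0x6E, 0x10A):
        print(f"{off:#x}: {config_region(off)}")
```

The offset 0x6e from the RHEL 8.6 to 9.3 log lands in the conventional capability area, while 0x10a from the original description lands in the extended space, which is where PCIe extended capabilities such as AER are placed. That is consistent with the PCI AER suspicion raised in this bug, but it does not by itself prove which register is involved.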
(In reply to Min Deng from comment #10)
> Hi All,
> The issue has been reproduced between rhel8.6 to rhel9.3
> RHEL 8.6 host
> 4.18.0-372.58.1.el8_6.x86_64
> qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8.x86_64
> edk2-ovmf-20220126gitbb1bba3d77-2.el8_6.1.noarch
> RHEL 9.3 host
> 5.14.0-316.el9.x86_64
> qemu-kvm-8.0.0-4.el9.x86_64
> edk2-ovmf-20230301gitf80f052277c8-4.el9.noarch
>
> Test results
> qemu-kvm: warning: Machine type 'pc-q35-rhel8.5/4/3/2/....0' is deprecated:
> machine types for previous major releases are deprecated
> QEMU 8.0.0 monitor - type 'help' for more information
> (qemu) migrate_incoming tcp:[::]:4000
> (qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0
> device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device
> '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
>
> Notes,
> Except for rhel 8.6.0, the rest tests failed with other machine types. I
> have to say,the issue blocks almost all tests betwween RHEL8.x and RHEL9.x
> from QE side. Thanks

Hi Min,

Thanks for highlighting the importance.
Apologies for the delay; I recently came back from my PTOs.
It looks like Leonardo already took care of the fix (thanks), so we should be good.
However, if something is still missing, please let us know.

QE tried the same steps as in comment 0 on the following builds

SRC: RHEL 9.2
kernel-5.14.0-284.18.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.1.x86_64

RHEL 9.3
kernel-5.14.0-325.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64

The original issue has been fixed, thank you!

New bug
Bug 2215819 - Stable guest abi test failed while guest is with machine type lower than rhel 8.6.0 (not including 8.

Per Leonardo, again, verified the bug on the following build (qemu-kvm-8.0.0-10.el9.x86_64); the original issue is gone.
SRC: RHEL 9.3
kernel-5.14.0-348.el9.x86_64
qemu-kvm-8.0.0-10.el9.x86_64

DST: RHEL 9.2
5.14.0-284.26.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.3.x86_64

Steps, please refer to Description

Actual results
Migration passed

Expected results
Migration pass

Thank you !

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368
Description of problem:
Failed to migrate VM from rhel 9.3 to rhel 9.2

Version-Release number of selected component (if applicable):

RHEL 9.3
[root@dell-per750-39 home]# uname -r
5.14.0-300.el9.x86_64
[root@dell-per750-39 home]# rpm -qa|grep qemu-kvm
qemu-kvm-common-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-gpu-8.0.0-1.el9.x86_64
qemu-kvm-ui-opengl-8.0.0-1.el9.x86_64
qemu-kvm-ui-egl-headless-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-gpu-pci-8.0.0-1.el9.x86_64
qemu-kvm-audio-pa-8.0.0-1.el9.x86_64
qemu-kvm-block-rbd-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-vga-8.0.0-1.el9.x86_64
qemu-kvm-device-usb-host-8.0.0-1.el9.x86_64
qemu-kvm-device-usb-redirect-8.0.0-1.el9.x86_64
qemu-kvm-tools-8.0.0-1.el9.x86_64
qemu-kvm-docs-8.0.0-1.el9.x86_64
qemu-kvm-core-8.0.0-1.el9.x86_64
qemu-kvm-8.0.0-1.el9.x86_64

RHEL 9.2 host
[root@lenovo-sr630-05 rhel920]# uname -r
5.14.0-268.el9.x86_64
[root@lenovo-sr630-05 rhel920]# rpm -qa|grep qemu-kvm
qemu-kvm-common-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-opengl-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-egl-headless-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-pci-7.2.0-14.el9_2.x86_64
qemu-kvm-audio-pa-7.2.0-14.el9_2.x86_64
qemu-kvm-block-rbd-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-vga-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-host-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-redirect-7.2.0-14.el9_2.x86_64
qemu-kvm-tools-7.2.0-14.el9_2.x86_64
qemu-kvm-docs-7.2.0-14.el9_2.x86_64
qemu-kvm-core-7.2.0-14.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64

How reproducible:
5/5

Steps to Reproduce:
1. Boot up VM on RHEL 9.3 host
/usr/libexec/qemu-kvm \
    -name guest=rhel,debug-threads=on \
    -machine pc-q35-rhel9.2.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,hpet=off,acpi=on \
    -accel kvm \
    -cpu Nehalem \
    -global driver=cfi.pflash01,property=secure,value=on \
    -m 1536 \
    -object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1610612736}' \
    -overcommit mem-lock=off \
    -smp 2,sockets=2,cores=1,threads=1 \
    -uuid 7881821b-07fd-4dbf-9d5a-12df440ff75e \
    -no-user-config \
    -nodefaults \
    -rtc base=utc,driftfix=slew \
    -global kvm-pit.lost_tick_policy=delay \
    -no-shutdown \
    -global ICH9-LPC.disable_s3=1 \
    -global ICH9-LPC.disable_s4=1 \
    -boot strict=on \
    -device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
    -device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
    -device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
    -device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
    -device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
    -device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
    -device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
    -device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
    -device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
    -device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
    -device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
    -device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
    -device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
    -device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
    -device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
    -chardev pty,id=charserial0 \
    -device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
    -device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
    -audiodev '{"id":"audio1","driver":"none"}' \
    -vnc 127.0.0.1:0,audiodev=audio1 \
    -device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
    -global ICH9-LPC.noreboot=off \
    -watchdog-action reset \
    -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
    -object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
    -device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
    -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
    -msg timestamp=on \
    -monitor stdio
2. migrate it to RHEL 9.2 host
3.

Actual results:
[root@lenovo-sr630-05 rhel920]# sh dst_short.sh
char device redirected to /dev/pts/5 (label charserial0)
QEMU 7.2.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp:[::]:4000
(qemu) 2023-04-25T07:21:59.877733Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
2023-04-25T07:21:59.877803Z qemu-kvm: Failed to load PCIDevice:config
2023-04-25T07:21:59.877829Z qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
2023-04-25T07:21:59.877849Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.0/pcie-root-port'
2023-04-25T07:21:59.878250Z qemu-kvm: load of migration failed: Invalid argument

Expected results:
Should work well.

Additional info:
The issue can be triggered by libvirt so please help to check it, thanks.
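The "Bad config data" line in the actual results is printed when the destination compares each incoming PCI config byte with its own. As a rough illustration only (a Python paraphrase written for this report, not QEMU code; the function names are hypothetical, and the mask formula is my reading of `get_pci_config_device` in QEMU's hw/pci/pci.c, so treat it as an assumption), a byte is rejected when it differs from the destination's value in bits that are checked (cmask) but are neither guest-writable (wmask) nor write-1-to-clear (w1cmask):

```python
import re

# Field layout copied from the "Bad config data" lines in this report.
# All values are printed in hex; "w1cmask:" may or may not be followed by a space.
LINE_RE = re.compile(
    r"Bad config data: i=0x(?P<i>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask: ?(?P<w1cmask>[0-9a-f]+)"
)


def parse_bad_config(line):
    """Return the hex fields of a 'Bad config data' line as ints, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {name: int(value, 16) for name, value in m.groupdict().items()}


def offending_bits(fields):
    """Bits that differ between source and destination and must match.

    Assumed check: (read ^ device) & cmask & ~wmask & ~w1cmask, on one byte.
    """
    diff = fields["read"] ^ fields["device"]
    return diff & fields["cmask"] & ~fields["wmask"] & ~fields["w1cmask"] & 0xFF


if __name__ == "__main__":
    log = ("2023-04-25T07:21:59.877733Z qemu-kvm: get_pci_config_device: "
           "Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0")
    fields = parse_bad_config(log)
    print(f"offset {fields['i']:#x}, mismatching bits {offending_bits(fields):#04x}")
```

For the line in this description, the source sent 0x40 at offset 0x10a while the 9.2 destination expected 0, and none of the masks excuse that bit, so the load fails with "Invalid argument".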