Bug 2189423

Summary: Failed to migrate VM from rhel 9.3 to rhel 9.2
Product: Red Hat Enterprise Linux 9
Reporter: Min Deng <mdeng>
Component: qemu-kvm
Assignee: Leonardo Bras <leobras>
qemu-kvm sub component: Live Migration
QA Contact: Min Deng <mdeng>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: coli, fjin, jinzhao, juzhang, leobras, lijin, meili, nilal, peterx, virt-maint
Version: 9.3
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-8.0.0-5.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-07 08:27:12 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Min Deng 2023-04-25 07:39:23 UTC
Description of problem:
Migrating a VM from a RHEL 9.3 host to a RHEL 9.2 host fails.

Version-Release number of selected component (if applicable):
RHEL 9.3 host
[root@dell-per750-39 home]# uname -r
5.14.0-300.el9.x86_64
[root@dell-per750-39 home]# rpm -qa|grep qemu-kvm
qemu-kvm-common-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-gpu-8.0.0-1.el9.x86_64
qemu-kvm-ui-opengl-8.0.0-1.el9.x86_64
qemu-kvm-ui-egl-headless-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-gpu-pci-8.0.0-1.el9.x86_64
qemu-kvm-audio-pa-8.0.0-1.el9.x86_64
qemu-kvm-block-rbd-8.0.0-1.el9.x86_64
qemu-kvm-device-display-virtio-vga-8.0.0-1.el9.x86_64
qemu-kvm-device-usb-host-8.0.0-1.el9.x86_64
qemu-kvm-device-usb-redirect-8.0.0-1.el9.x86_64
qemu-kvm-tools-8.0.0-1.el9.x86_64
qemu-kvm-docs-8.0.0-1.el9.x86_64
qemu-kvm-core-8.0.0-1.el9.x86_64
qemu-kvm-8.0.0-1.el9.x86_64

RHEL 9.2 host
[root@lenovo-sr630-05 rhel920]# uname -r
5.14.0-268.el9.x86_64
[root@lenovo-sr630-05 rhel920]# rpm -qa|grep qemu-kvm
qemu-kvm-common-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-opengl-7.2.0-14.el9_2.x86_64
qemu-kvm-ui-egl-headless-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-gpu-pci-7.2.0-14.el9_2.x86_64
qemu-kvm-audio-pa-7.2.0-14.el9_2.x86_64
qemu-kvm-block-rbd-7.2.0-14.el9_2.x86_64
qemu-kvm-device-display-virtio-vga-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-host-7.2.0-14.el9_2.x86_64
qemu-kvm-device-usb-redirect-7.2.0-14.el9_2.x86_64
qemu-kvm-tools-7.2.0-14.el9_2.x86_64
qemu-kvm-docs-7.2.0-14.el9_2.x86_64
qemu-kvm-core-7.2.0-14.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64

How reproducible:
5/5

Steps to Reproduce:
1. Boot up the VM on the RHEL 9.3 host:
/usr/libexec/qemu-kvm -name guest=rhel,debug-threads=on -machine pc-q35-rhel9.2.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,hpet=off,acpi=on -accel kvm -cpu Nehalem -global driver=cfi.pflash01,property=secure,value=on -m 1536 -object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1610612736}' -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7881821b-07fd-4dbf-9d5a-12df440ff75e -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' -device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' -device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' -device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' -device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' -device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' -device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' -device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' -device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' -device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' -device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' -device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' -device 
'{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' -device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' -device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' -chardev pty,id=charserial0 -device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' -device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' -audiodev '{"id":"audio1","driver":"none"}' -vnc 127.0.0.1:0,audiodev=audio1 -device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' -global ICH9-LPC.noreboot=off -watchdog-action reset -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' -object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' -device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on -monitor stdio

2. Migrate it to the RHEL 9.2 host.

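Step 2 was driven from the HMP monitor in this report (`migrate_incoming tcp:[::]:4000` on the destination). Below is a minimal sketch of what the equivalent QMP payloads could look like, assuming the destination was started with `-incoming defer`. The destination host name `dst-host.example.com` and the helper `qmp_command` are hypothetical and only serve the illustration. The sketch builds the JSON messages; it does not connect to a monitor.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a JSON string (hypothetical helper)."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


# Destination (RHEL 9.2 host): start listening, same URI as in the report.
incoming = qmp_command("migrate-incoming", uri="tcp:[::]:4000")

# Source (RHEL 9.3 host): start the migration.
# The host name is an assumption; the report does not give one.
outgoing = qmp_command("migrate", uri="tcp:dst-host.example.com:4000")

# Either side: poll the migration status afterwards.
status = qmp_command("query-migrate")

for message in (incoming, outgoing, status):
    print(message)
```

A real QMP session would also need to send `qmp_capabilities` once after connecting, before any of these commands are accepted.
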
Actual results:
[root@lenovo-sr630-05 rhel920]# sh dst_short.sh 
char device redirected to /dev/pts/5 (label charserial0)
QEMU 7.2.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp:[::]:4000
(qemu) 2023-04-25T07:21:59.877733Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10a read: 40 device: 0 cmask: ff wmask: 0 w1cmask:0
2023-04-25T07:21:59.877803Z qemu-kvm: Failed to load PCIDevice:config
2023-04-25T07:21:59.877829Z qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
2023-04-25T07:21:59.877849Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.0/pcie-root-port'
2023-04-25T07:21:59.878250Z qemu-kvm: load of migration failed: Invalid argument
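
The `Bad config data` line carries enough information to re-check the mismatch by hand. The following is a minimal Python sketch, assuming the destination compares each incoming config byte with its own value and rejects any difference in bits that are covered by `cmask` but are neither guest-writable (`wmask`) nor write-1-to-clear (`w1cmask`). That reading of the check is an assumption about QEMU's PCI config load, not something stated in this report, and the helper names `parse_bad_config` and `mismatch_bits` are hypothetical.

```python
import re

# Field layout copied from the log line in this report; all values are hex.
BAD_CONFIG_RE = re.compile(
    r"Bad config data: i=0x(?P<offset>[0-9a-f]+) read: (?P<read>[0-9a-f]+) "
    r"device: (?P<device>[0-9a-f]+) cmask: (?P<cmask>[0-9a-f]+) "
    r"wmask: (?P<wmask>[0-9a-f]+) w1cmask:\s*(?P<w1cmask>[0-9a-f]+)"
)


def parse_bad_config(line):
    """Return the hex fields of a 'Bad config data' line as integers."""
    m = BAD_CONFIG_RE.search(line)
    if m is None:
        raise ValueError("not a 'Bad config data' line")
    return {key: int(value, 16) for key, value in m.groupdict().items()}


def mismatch_bits(f):
    """Bits that differ and are checked but not writable (assumed rule)."""
    diff = f["read"] ^ f["device"]
    return diff & f["cmask"] & ~f["wmask"] & ~f["w1cmask"] & 0xFF


line = ("qemu-kvm: get_pci_config_device: Bad config data: i=0x10a read: 40 "
        "device: 0 cmask: ff wmask: 0 w1cmask:0")
fields = parse_bad_config(line)
bits = mismatch_bits(fields)
print(f"offset=0x{fields['offset']:x} mismatching bits=0x{bits:02x}")
```

For the line above, bit 0x40 is set in the incoming stream but clear on the RHEL 9.2 destination. Offset 0x10a lies in PCIe extended configuration space (offsets 0x100 and above), which is where PCIe extended capabilities such as AER are located.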

Expected results:
Migration should succeed.

Additional info:
The issue can be triggered through libvirt, so please help check it. Thanks.

Comment 4 Peter Xu 2023-04-26 00:46:47 UTC
Hmm, when I was looking at the upstream 8.0->7.2 backward migration, which is broken too, I found that the recent PCI AER change might be the culprit.

I've raised this issue here:

https://lore.kernel.org/qemu-devel/ZEhzaWpNM+NvZCUw@x1n

It's very possible that downstream is hitting the same issue.

Comment 5 Min Deng 2023-04-27 05:34:12 UTC
The bug also reproduced between rhel 9.2 and rhel 9.0
RHEL 9.0 host
kernel-5.14.0-303.el9.x86_64
qemu-kvm-6.2.0-11.el9_0.7.x86_64
RHEL 9.2
kernel-5.14.0-302.el9.x86_64
qemu-kvm-8.0.0-1.el9.x86_64

Comment 6 Min Deng 2023-04-27 05:35:57 UTC
(In reply to Min Deng from comment #5)
> The bug also reproduced between rhel 9.2 and rhel 9.0
The bug also reproduced between rhel 9.3 and rhel 9.0
> RHEL 9.0 host
> kernel-5.14.0-303.el9.x86_64
> qemu-kvm-6.2.0-11.el9_0.7.x86_64
> RHEL 9.2
  RHEL 9.3
> kernel-5.14.0-302.el9.x86_64
> qemu-kvm-8.0.0-1.el9.x86_64

Should be from rhel9.3 to rhel 9.0. Thanks.

Comment 7 Leonardo Bras 2023-04-28 04:11:02 UTC
(In reply to Min Deng from comment #6)
> (In reply to Min Deng from comment #5)
> > The bug also reproduced between rhel 9.2 and rhel 9.0
> The bug also reproduced between rhel 9.3 and rhel 9.0
> > RHEL 9.0 host
> > kernel-5.14.0-303.el9.x86_64
> > qemu-kvm-6.2.0-11.el9_0.7.x86_64
> > RHEL 9.2
>   RHEL 9.3
> > kernel-5.14.0-302.el9.x86_64
> > qemu-kvm-8.0.0-1.el9.x86_64
> 
> Should be from rhel9.3 to rhel 9.0. Thanks.

Oh, that makes sense: something introduced in 9.2->9.3 has broken migration.

Thanks for the testing!
I will start debugging this soon.

Comment 8 Leonardo Bras 2023-05-03 18:11:32 UTC
Upstream patch sent:

https://patchwork.kernel.org/project/qemu-devel/list/?series=744531

Comment 9 Min Deng 2023-05-09 06:04:31 UTC
Hi Leonardo,
Could you please help set the DTM/ITM for this bug?
Thanks
Min

Comment 10 Min Deng 2023-05-29 02:34:22 UTC
Hi All,
The issue has also been reproduced when migrating from RHEL 8.6 to RHEL 9.3.
RHEL 8.6 host
4.18.0-372.58.1.el8_6.x86_64
qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8.x86_64
edk2-ovmf-20220126gitbb1bba3d77-2.el8_6.1.noarch
RHEL 9.3 host
5.14.0-316.el9.x86_64
qemu-kvm-8.0.0-4.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-4.el9.noarch

Test results
qemu-kvm: warning: Machine type 'pc-q35-rhel8.5/4/3/2/....0' is deprecated: machine types for previous major releases are deprecated
QEMU 8.0.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp:[::]:4000
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
qemu-kvm: load of migration failed: Invalid argument

Notes,
Except for the rhel8.6.0 machine type, the tests with all other machine types failed. I have to say, the issue blocks almost all tests between RHEL 8.x and RHEL 9.x on the QE side. Thanks
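
The two failures logged in this report hit different config-space offsets, and the extra bit sits on opposite sides. A small sketch that makes the comparison explicit, assuming the usual PCI/PCIe config-space layout (standard header 0x00-0x3f, device-dependent area holding the capabilities 0x40-0xff, extended config space 0x100-0xfff). The function names `config_region` and `extra_bits` are hypothetical.

```python
def config_region(offset):
    """Name the config-space region an offset falls into."""
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset outside 4 KiB PCIe config space")
    if offset < 0x40:
        return "standard header"
    if offset < 0x100:
        return "capability area"
    return "extended config space"


def extra_bits(read, device):
    """Return (bits only in the incoming stream, bits only on the destination)."""
    return read & ~device & 0xFF, device & ~read & 0xFF


# (label, offset, read, device) taken from the two log excerpts in this report.
failures = [
    ("9.3 -> 9.2", 0x10A, 0x40, 0x00),
    ("8.6 -> 9.3", 0x06E, 0x00, 0x40),
]

for label, offset, read, device in failures:
    stream_only, device_only = extra_bits(read, device)
    print(f"{label}: 0x{offset:03x} ({config_region(offset)}), "
          f"stream-only=0x{stream_only:02x}, destination-only=0x{device_only:02x}")
```

In the first case the newer source sends a bit the older destination does not have; in the second case the newer destination has a bit the older source did not send.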

Comment 11 Leonardo Bras 2023-06-06 20:35:16 UTC
The MR for this bz:

https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/283

Comment 13 Yanan Fu 2023-06-15 03:28:15 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 16 Nitesh Narayan Lal 2023-06-15 09:40:47 UTC
(In reply to Min Deng from comment #10)
> Hi All,
> The issue has also been reproduced when migrating from RHEL 8.6 to RHEL 9.3.
> RHEL 8.6 host
> 4.18.0-372.58.1.el8_6.x86_64
> qemu-kvm-6.2.0-11.module+el8.6.0+18167+43cf40f3.8.x86_64
> edk2-ovmf-20220126gitbb1bba3d77-2.el8_6.1.noarch
> RHEL 9.3 host
> 5.14.0-316.el9.x86_64
> qemu-kvm-8.0.0-4.el9.x86_64
> edk2-ovmf-20230301gitf80f052277c8-4.el9.noarch
> 
> Test results
> qemu-kvm: warning: Machine type 'pc-q35-rhel8.5/4/3/2/....0' is deprecated:
> machine types for previous major releases are deprecated
> QEMU 8.0.0 monitor - type 'help' for more information
> (qemu) migrate_incoming tcp:[::]:4000
> (qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0
> device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device
> '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
> 
> Notes,
> Except for the rhel8.6.0 machine type, the tests with all other machine
> types failed. I have to say, the issue blocks almost all tests between
> RHEL 8.x and RHEL 9.x on the QE side. Thanks

Hi Min, Thanks for highlighting the importance.
Apologies for the delay; I recently came back from PTO.
It looks like Leonardo already took care of the fix (thanks), so we should be good.
However, if something is still missing, please let us know.

Comment 17 Min Deng 2023-06-18 14:15:54 UTC
QE tried the same steps as in comment 0 on the following builds:
SRC:
RHEL 9.2
kernel-5.14.0-284.18.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.1.x86_64
RHEL 9.3
kernel-5.14.0-325.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64

The original issue has been fixed, thank you!
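
Whether a source host already carries the fix can be read off its `qemu-kvm` package string. A minimal sketch, assuming a simplified comparison of the numeric version and the leading integer of the release against the `Fixed In Version` field of this bug (`qemu-kvm-8.0.0-5.el9`). It is not a full RPM version comparison, it only handles the main `qemu-kvm` package name, and it is only meant for 8.0.0-based el9 builds. The names `parse_nvr` and `has_fix` are hypothetical.

```python
import re

# Taken from the "Fixed In Version" field of this bug.
FIXED_IN = "qemu-kvm-8.0.0-5.el9"

# Matches e.g. "qemu-kvm-8.0.0-5.el9" or "qemu-kvm-8.0.0-10.el9.x86_64".
NVR_RE = re.compile(r"^qemu-kvm-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)\.")


def parse_nvr(nvr):
    """Split a qemu-kvm package string into ((major, minor, patch), release)."""
    m = NVR_RE.match(nvr)
    if m is None:
        raise ValueError(f"unrecognized package string: {nvr}")
    version = tuple(int(part) for part in m.group("version").split("."))
    return version, int(m.group("release"))


def has_fix(nvr, fixed_in=FIXED_IN):
    """True if the build is at or above the fixed build (simplified check)."""
    return parse_nvr(nvr) >= parse_nvr(fixed_in)


# Source-side builds that appear in this report.
for build in ("qemu-kvm-8.0.0-1.el9.x86_64",
              "qemu-kvm-8.0.0-4.el9.x86_64",
              "qemu-kvm-8.0.0-5.el9.x86_64",
              "qemu-kvm-8.0.0-10.el9.x86_64"):
    print(build, "fixed" if has_fix(build) else "affected")
```
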

Comment 19 Min Deng 2023-06-19 02:23:51 UTC
New bug 
Bug 2215819 - Stable guest abi test failed while guest is with machine type lower than rhel 8.6.0 (not including 8.

Comment 23 Min Deng 2023-08-02 04:45:32 UTC
Per Leonardo's request, I verified the bug again on the following build (qemu-kvm-8.0.0-10.el9.x86_64); the original issue is gone.
SRC:RHEL 9.3
kernel-5.14.0-348.el9.x86_64
qemu-kvm-8.0.0-10.el9.x86_64

DST:RHEL 9.2
5.14.0-284.26.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.3.x86_64
Steps: please refer to the Description.

Actual results:
Migration passed.
Expected results:
Migration passes.

Thank you !

Comment 25 errata-xmlrpc 2023-11-07 08:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368