Bug 2223691

Summary: [machine type 9.2]Failed to migrate VM from RHEL 9.3 to RHEL 9.2
Product: Red Hat Enterprise Linux 9 Reporter: Min Deng <mdeng>
Component: qemu-kvm Assignee: Leonardo Bras <leobras>
qemu-kvm sub component: Machine Types QA Contact: Min Deng <mdeng>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: alexander.lougovski, coli, fjin, imammedo, jferlan, jinzhao, juzhang, leobras, lijin, mrezanin, mst, nilal, peterx, virt-maint, ymankad
Version: 9.3 Keywords: CustomerScenariosInitiative, TestBlocker, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-8.0.0-10.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:28:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Min Deng 2023-07-18 14:08:19 UTC
Description of problem:
[machine type 9.2]Failed to migrate VM from RHEL 9.3 to RHEL 9.2 

Version-Release number of selected component (if applicable):
SRC:
RHEL 9.3
qemu-kvm-8.0.0-7.el9.x86_64
kernel-5.14.0-338.el9.x86_64
seabios-bin-1.16.1-1.el9.noarch

DST:
RHEL 9.2
kernel-5.14.0-284.18.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.3.x86_64
seabios-bin-1.16.1-1.el9.noarch

How reproducible:
5/5
Steps to Reproduce:
1. In src:
# /usr/libexec/qemu-kvm  -name guest=rhel9 -machine pc-q35-rhel9.2.0
-monitor stdio -vnc :1 -cpu Broadwell-noTSX-IBRS,enforce
QEMU 8.0.0 monitor - type 'help' for more information
(qemu) 
2. In dst:
# /usr/libexec/qemu-kvm  -name guest=rhel9 -machine pc-q35-rhel9.2.0
-monitor stdio -vnc :1 -cpu Broadwell-noTSX-IBRS,enforce -incoming defer 
QEMU 7.2.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp::4444
(qemu) 
3. Migrate the VM from RHEL 9.3.0 to RHEL 9.2.0:
(qemu) migrate -d tcp:10.73.212.86:4444
Actual results:
 (qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x10a read: 40
 device: 0 cmask: ff wmask: 0 w1cmask:0
 qemu-kvm: Failed to load PCIDevice:config
 qemu-kvm: Failed to load e1000e:parent_obj
 qemu-kvm: error while loading state for instance 0x0 of device
 '0000:00:02.0/e1000e'
 qemu-kvm: load of migration failed: Invalid argument
Expected results:
The VM migrates successfully.
Additional info:
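The failing byte in the error above can be read as follows. This is a minimal sketch, not a QEMU tool: `decode_bad_config` is a hypothetical helper, and it assumes (the report does not state this) that the device's AER extended capability starts at config offset 0x100 and that the Uncorrectable Error Mask register sits 0x08 bytes into it.

```python
import re

# Assumptions, not taken from the report: the AER extended capability of the
# device starts at config offset 0x100, and the Uncorrectable Error Mask
# register is located 0x08 bytes into that capability.
AER_CAP_OFFSET = 0x100
UNCOR_MASK_REG = 0x08

# Error line from "Actual results", joined onto one logical line.
LOG_LINE = (
    "qemu-kvm: get_pci_config_device: Bad config data: i=0x10a read: 40 "
    "device: 0 cmask: ff wmask: 0 w1cmask:0"
)


def decode_bad_config(line):
    """Report which config byte and which bits differ between src and dst."""
    match = re.search(
        r"i=0x([0-9a-f]+) read: ([0-9a-f]+) device: ([0-9a-f]+)", line
    )
    if match is None:
        raise ValueError("not a 'Bad config data' line")
    # All three values are printed in hexadecimal.
    index, incoming, local = (int(group, 16) for group in match.groups())

    reg_base = AER_CAP_OFFSET + UNCOR_MASK_REG
    byte_in_reg = index - reg_base
    in_mask_reg = 0 <= byte_in_reg < 4  # the register is 32 bits wide

    differing = incoming ^ local
    bits = []
    if in_mask_reg:
        # Convert bit positions within the byte to positions in the register.
        bits = [byte_in_reg * 8 + bit
                for bit in range(8) if (differing >> bit) & 1]
    return {
        "index": index,
        "in_uncor_mask_register": in_mask_reg,
        "differing_bits": bits,
    }


result = decode_bad_config(LOG_LINE)
print(result)
```

Under those assumptions, byte 0x10a is the third byte of the mask register, and the value 0x40 sent by the source versus 0 on the destination corresponds to bit 22 of that register being set only on the source side.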

Comment 1 Peter Xu 2023-07-18 14:35:58 UTC
Seems to be another PCI breakage besides bz2215819.  Needinfo Michael / Igor for this.

Comment 2 Michael S. Tsirkin 2023-07-18 20:16:01 UTC
maybe yes, maybe no. could you test with the fix for bz2215819 please?

Comment 3 Peter Xu 2023-07-18 21:26:24 UTC
(In reply to Michael S. Tsirkin from comment #2)
> maybe yes, maybe no. could you test with the fix for bz2215819 please?

Michael, please see https://bugzilla.redhat.com/show_bug.cgi?id=2215819#c35 - note that the config index is different.

Min, would you please double-check what Michael suggested (that is, whether Leo's fix there also fixes this problem)?  Thanks!

Comment 4 Michael S. Tsirkin 2023-07-18 21:31:14 UTC
Yes, I saw that. It's a different device, so it could still be the same capability at a different index.

Comment 6 Min Deng 2023-07-21 03:21:16 UTC
This blocks testing between RHEL 9.3 and RHEL 9.2, so it should be a blocker from QE's perspective.
Also CC fjin.
Thank you.

Comment 12 Leonardo Bras 2023-07-25 19:00:29 UTC
I found the bug:

Patch 5ed3dabe57d was applied to 9.3, and hw_compat_7_2 got { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
but I did not add this line to hw_compat_rhel_9_2, so the RHEL 9.2 machine type still sets the undesired bit.

I will send a one-liner downstream to fix this.
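As a rough illustration of why the missing entry matters, here is a simplified Python model of how a machine type's compat list overrides a device property default. It is a sketch only, not QEMU source: the list and function names are hypothetical stand-ins, "pci-device" stands for TYPE_PCI_DEVICE, and the default of "on" for the property in QEMU 8.0 is an assumption inferred from the description above.

```python
# Simplified model of machine-type compat properties; NOT QEMU source code.
# Assumption: in QEMU 8.0 the property defaults to "on" for PCI devices.
DEVICE_DEFAULTS = {("pci-device", "x-pcie-err-unc-mask"): "on"}

# Upstream 7.2 compat list already carries the override (per this comment).
HW_COMPAT_7_2 = [("pci-device", "x-pcie-err-unc-mask", "off")]

# Hypothetical stand-ins for the downstream RHEL 9.2 list, before and after
# the one-liner; unrelated entries are omitted.
HW_COMPAT_RHEL_9_2_BEFORE = []
HW_COMPAT_RHEL_9_2_AFTER = [("pci-device", "x-pcie-err-unc-mask", "off")]


def effective_value(compat_props, driver, prop):
    """Return the property value a machine type ends up using."""
    value = DEVICE_DEFAULTS[(driver, prop)]
    for compat_driver, compat_prop, compat_value in compat_props:
        if (compat_driver, compat_prop) == (driver, prop):
            value = compat_value  # the compat entry overrides the default
    return value


before = effective_value(HW_COMPAT_RHEL_9_2_BEFORE,
                         "pci-device", "x-pcie-err-unc-mask")
after = effective_value(HW_COMPAT_RHEL_9_2_AFTER,
                        "pci-device", "x-pcie-err-unc-mask")
print("before fix:", before)
print("after fix:", after)
```

In this model the RHEL 9.2 machine type on a 9.3 host keeps the new default until the entry is added, which is consistent with the 9.2 destination rejecting the incoming config space.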

Comment 17 Yanan Fu 2023-08-02 02:26:41 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 18 Min Deng 2023-08-02 04:33:46 UTC
Verified the bug on the following builds:
SRC:RHEL 9.3
kernel-5.14.0-348.el9.x86_64
qemu-kvm-8.0.0-10.el9.x86_64
/usr/libexec/qemu-kvm  -name guest=rhel9 -machine pc-q35-rhel9.2.0 -monitor stdio -vnc :1 -cpu Broadwell-noTSX-IBRS,enforce
QEMU 8.0.0 monitor - type 'help' for more information
(qemu) migrate -d tcp:10.73.212.86:4444
DST:RHEL 9.2
kernel-5.14.0-284.26.1.el9_2.x86_64
qemu-kvm-7.2.0-14.el9_2.3.x86_64
/usr/libexec/qemu-kvm  -name guest=rhel9 -machine pc-q35-rhel9.2.0 -monitor stdio -vnc :1 -cpu Broadwell-noTSX-IBRS,enforce -incoming defer
QEMU 7.2.0 monitor - type 'help' for more information
(qemu) migrate_incoming tcp::4444
Actual results:
Migration passed.
Expected results:
Migration passes.

Comment 23 Min Deng 2023-08-10 05:10:58 UTC
Based on comment 18 and comment 19, moving this bug to VERIFIED.
Thank you!

Comment 26 errata-xmlrpc 2023-11-07 08:28:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368