Bug 2171860

Summary:	[libvirt] migration: larger->E3: vm failed with "failed to set MSR 0x202 to 0x380000000000"
Product:	Red Hat Enterprise Linux 9	Reporter:	yalzhang <yalzhang>
Component:	libvirt	Assignee:	Jiri Denemark <jdenemar>
libvirt sub component:	Live Migration	QA Contact:	Luyao Huang <lhuang>
Status:	CLOSED ERRATA	Docs Contact:	Jiri Herrmann <jherrman>
Severity:	high
Priority:	high	CC:	ailan, alex.williamson, berrange, chayang, fjin, gveitmic, jdenemar, jherrman, jinzhao, juzhang, kchamart, kraxel, lmen, mdeng, mprivozn, nanliu, nilal, vgoyal, virt-maint, xiaohli, xuwei, yafu, yanghliu, ymankad, zhguo
Version:	9.2	Keywords:	AutomationTriaged, Triaged
Target Milestone:	rc
Target Release:	9.3
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	libvirt-9.5.0-0rc1.1.el9	Doc Type:	Known Issue
Doc Text:	.Live migrating VMs to hosts with smaller physical address space sometimes fails Currently, attempting to migrate a running virtual machine (VM) in some cases fails with the following error: ---- failed to set MSR 0x202 to 0x380000000000 ---- This problem occurs most frequently when the destination VM host uses a CPU with a smaller physical address space than the source host, and has mainly been observed on clusters that contain Xeon E3 systems.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-11-07 08:30:47 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:	9.5.0
Embargoed:
Bug Depends On:	2176215
Bug Blocks:	2174749

Description yalzhang@redhat.com 2023-02-20 15:47:11 UTC

Description of problem:
vm migration failed with "failed to set MSR 0x202 to 0x380000000000"

Version-Release number of selected component (if applicable):
source and target host:
libvirt-9.0.0-6.el9.x86_64
qemu-kvm-7.2.0-9.el9.x86_64
kernel-5.14.0-268.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare source host with cpu model as "Cascadelake-Server-noTSX", target host with cpu model as "Skylake-Client-noTSX-IBRS";
mount nfs on both source and target host to target directory as /var/lib/libvirt/migrate/
On source host run:  
# virsh domcapabilities  > /var/lib/libvirt/migrate/cpu.xml
On target host run:
# virsh domcapabilities  >> /var/lib/libvirt/migrate/cpu.xml

On the source host, generate the baseline cpu by:
# virsh hypervisor-cpu-baseline /var/lib/libvirt/migrate/cpu.xml --migratable
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
</cpu>

2. start a vm on source host with the cpu configuration above, and try to migrate the vm to target host:
# virsh migrate rhel --live --verbose qemu+ssh://${target_host}/system --p2p --persistent --undefinesource
Migration: [100 %]error: operation failed: job 'migration out' unexpectedly failed

check the libvirtd log on target host:
2023-02-18 10:15:47.792+0000: 7216: error : qemuProcessReportLogError:1971 : internal error: qemu unexpectedly closed the monitor: 2023-02-18T10:15:47.735537Z qemu-kvm: warning: TSC frequency mismatch between VM (2194843 kHz) and host (2903990 kHz), and TSC scaling unavailable
2023-02-18T10:15:47.735651Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

Actual results:
VM migration failed with baseline cpu

Expected results:
VM migration should succeed

Additional info:

Comment 1 Dr. David Alan Gilbert 2023-02-20 17:18:40 UTC

MSR 0x202 is IA32_MTRR_PHYSBASE1

In your steps to reproduce you say:
   '1. prepare source host with cpu model as "Cascadelake-Server-noTSX", target host with cpu model as "Skylake-Client-noTSX-IBRS";'

I don't understand what you mean there;  what is your physical source and destination host cpu?
Is this a nested setup?
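
For readers unfamiliar with the numbering: on x86 the variable-range MTRRs are laid out as base/mask pairs starting at MSR 0x200, which is how 0x202 comes out as IA32_MTRR_PHYSBASE1. A minimal Python sketch of that mapping follows. The helper name mtrr_msr_name and the default of 8 register pairs are assumptions made for illustration only; the real count is reported by the CPU and is not stated in this bug.

```python
def mtrr_msr_name(msr: int, var_count: int = 8) -> str:
    """Name a variable-range MTRR MSR (hypothetical helper, not a QEMU/KVM API)."""
    first = 0x200  # IA32_MTRR_PHYSBASE0; pairs follow as base, mask, base, mask, ...
    if not first <= msr < first + 2 * var_count:
        raise ValueError(f"MSR {msr:#x} is not a variable-range MTRR")
    index, is_mask = divmod(msr - first, 2)
    kind = "PHYSMASK" if is_mask else "PHYSBASE"
    return f"IA32_MTRR_{kind}{index}"


print(mtrr_msr_name(0x202))
```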

Comment 2 yalzhang@redhat.com 2023-02-21 01:22:39 UTC

(In reply to Dr. David Alan Gilbert from comment #1)
> MSR 0x202 is IA32_MTRR_PHYSBASE1
> 
> In your steps to reproduce you say:
>    '1. prepare source host with cpu model as "Cascadelake-Server-noTSX",
> target host with cpu model as "Skylake-Client-noTSX-IBRS";'
> 
> I don't understand what you mean there;  what is your physical source and
> destination host cpu?
> Is this a nested setup?

No, it's not a nested setup. 
Source host:
# lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
    BIOS Model name:     Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4400.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf
                         lush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm c
                         onstant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc c
                         puid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 s
                         dbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadl
                         ine_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault e
                         pb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_en
                         hanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi
                         1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx sma
                         p clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetb
                         v1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat
                          pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    24 MiB (24 instances)
  L3:                    33 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Mitigation; Enhanced IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW se
                         quence
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled

Target host:
# lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
    BIOS Model name:     Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
    CPU family:          6
    Model:               94
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            3
    CPU max MHz:         3900.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5799.77
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf
                         lush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm c
                         onstant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc c
                         puid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 s
                         dbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_
                         timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb i
                         nvpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpi
                         d ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed ad
                         x smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat p
                         ln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabi
                         lities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed:              Mitigation; IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Mitigation; TSX disabled

Guest cpu configuration:
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off

I have the environment to reproduce the issue. Could you please take a look if it's needed and convenient, or offer some suggestions? I have no idea how to debug it further.

Comment 3 Li Xiaohui 2023-02-21 05:53:17 UTC

Using yalan's hosts, I did a plain migration from Cascade Lake (Silver 4214) to Skylake (E3-1260L v5). The destination qemu crashed when migration completed, with the same error as in the Description.

I also tried migrating from Skylake to Cascade Lake; that migration succeeds and qemu works well.

I have now borrowed one Icelake and one Haswell machine and will try again, to see whether the bug happens only on specific machines or whenever migrating from a newer to an older CPU.

Comment 4 Li Xiaohui 2023-02-21 06:02:21 UTC

Qemu command line used when reproducing the bug directly through qemu:
/usr/libexec/qemu-kvm \
-name guest=rhel,debug-threads=on \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off \
-global driver=cfi.pflash01,property=secure,value=on \
-m 2048 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 2a779334-eb31-4771-b690-d9eb276967f3 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,path=/tmp/hello1,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/migrate/yalzhang/RHEL-9.2.0-20230216.15-x86_64-ovmf.qcow2.1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev '{"type":"tap","vhost":true,"id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:f3:28:cb","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,path=/tmp/hello2,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on \
-msg timestamp=on \
-monitor stdio

Note: the bug still reproduces after deleting the cpu flags and booting the VM with only '-cpu Skylake-Client-IBRS'.

Comment 5 Dr. David Alan Gilbert 2023-02-21 09:49:39 UTC

(In reply to yalzhang from comment #2)
> (In reply to Dr. David Alan Gilbert from comment #1)
> > MSR 0x202 is IA32_MTRR_PHYSBASE1
> > 
> > In your steps to reproduce you say:
> >    '1. prepare source host with cpu model as "Cascadelake-Server-noTSX",
> > target host with cpu model as "Skylake-Client-noTSX-IBRS";'
> > 
> > I don't understand what you mean there;  what is your physical source and
> > > destination host cpu?
> > Is this a nested setup?
> 
> No, it's not a nested setup. 

Thank you for the extra detail.

> Source host:
> # lscpu 
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  48
>   On-line CPU(s) list:   0-47
> Vendor ID:               GenuineIntel
>   BIOS Vendor ID:        Intel
>   Model name:            Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz

> Target host:
> # lscpu 
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         39 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  8
>   On-line CPU(s) list:   0-7
> Vendor ID:               GenuineIntel
>   BIOS Vendor ID:        Intel(R) Corporation
>   Model name:            Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz

> Guest cpu configuration:
> -cpu
> Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,
> clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,
> xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-
> vmentry=on,pschange-mc-no=on,hle=off,rtm=off
> 
> I have the environment to reproduce the issue. Could you please take a look
> if it's needed and convenient, or offer some suggestions? I have no idea
> how to debug it further.


I have a suspicion the problem is really to do with migrating towards E3 versions, which have
a smaller 'Address size'; the 0x380000000000 won't fit in the 39 bits of an E3 but would
fit in the 46 bits of its bigger model.

 * please retest this using a destination host with a non-E3 version of Skylake?
 * What guest are you running?
 * Is this a regression?
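
The size argument above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name fits is made up, and the 46 and 39 bit widths are the "Address sizes" values from the lscpu output in comment 2.

```python
def fits(value: int, phys_bits: int) -> bool:
    """True if value can be represented in phys_bits bits (hypothetical helper)."""
    return value.bit_length() <= phys_bits


value = 0x380000000000  # the MSR 0x202 value from the error message

# Physical address widths as reported by lscpu in comment 2.
hosts = (("source, Xeon Silver 4214", 46), ("target, Xeon E3-1260L v5", 39))

print(f"{value:#x} needs {value.bit_length()} bits")
for name, bits in hosts:
    print(f"{name}: {bits} bits physical, fits={fits(value, bits)}")
```

The value needs all 46 bits available on the source, so it cannot be represented on a host that has only 39.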

Comment 6 yalzhang@redhat.com 2023-02-22 06:02:36 UTC

(In reply to Dr. David Alan Gilbert from comment #5)
> 
> I have a suspicion the problem is really to do with migrating towards E3
> versions, which have
> a smaller 'Address size'; the 0x380000000000 won't fit in the 39 bits of an
> E3 but would
> fit in the 46 bits of its bigger model.
> 
>  * please retest this using a destination host with a non-E3 version of
> Skylake?
Migrating to a target host with a Bronze 3106 CPU (not E3, Skylake-Server-IBRS) succeeds.
Migrating to a target host with a Silver 4110 CPU (not E3, Skylake-Server-IBRS) succeeds. Tested with the latest rhel 9.2 version.

>  * What guest are you running?
Guest xml and qemu cmd will be attached;

>  * Is this a regression?
Yes, it can *not* be reproduced on latest rhel 9.1 with the same hosts:
libvirt-8.5.0-7.4.el9_1.x86_64
qemu-kvm-7.0.0-13.el9_1.2.x86_64
And when I upgrade the hosts to rhel 9.2 by “# yum update -y” and reboot, the issue *can* be reproduced.

Comment 9 Li Xiaohui 2023-02-22 07:48:22 UTC

Thanks yalan.

I also tried to migrate the VM from src Icelake (Xeon(R) Silver 4310) to dst Haswell (Xeon(R) E5-2609 v3); migration succeeded and qemu works well.

So, according to all the comments, this issue seems to happen only on specific hardware (as Dave guessed: migrating towards E3 versions).

Comment 10 Dr. David Alan Gilbert 2023-02-22 09:32:50 UTC

OK thanks! So this *might* count as not-a-bug since the problem is caused by migrating between two different CPU types

Please try the following:

   a) Try a bios based VM rather than ovmf
       (I'm thinking maybe a change in OVMF changed the way it's using mtrr?)

   b) If (a) works, then please try ovmf again but using 9.1's OVMF package
      with the rest being 9.2

   c) Going back to the 9.2 ovmf; please try changing the -cpu section to
      -cpu Skylake-Client-IBRS,phys-bits=39,.......

Thanks,

Dave

Comment 11 Dr. David Alan Gilbert 2023-02-22 10:18:31 UTC

Note: apparently you can set phys-bits=39 in the libvirt CPU definition with:

<maxphysaddr mode='emulate' bits='39'/>

Comment 12 yalzhang@redhat.com 2023-02-22 11:12:08 UTC

(In reply to Dr. David Alan Gilbert from comment #10)
> OK thanks! So this *might* count as not-a-bug since the problem is caused by
> migrating between two different CPU types
> 
> Please try the following:
> 
>    a) Try a bios based VM rather than ovmf
>        (I'm thinking maybe a change in OVMF changed the way it's using mtrr?)
Migration succeeds for a bios based vm with the same cpu settings.

>    b) If (a) works, then please try ovmf again but using 9.1's OVMF package
>       with the rest being 9.2
Current 9.2 ovmf: edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch
After removing it and installing 9.1's OVMF, edk2-ovmf-20220526git16779ede2d36-3.el9.noarch, migration succeeds.

> 
>    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
>       -cpu Skylake-Client-IBRS,phys-bits=39,.......
After updating the 9.1 ovmf back to the 9.2 ovmf, edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch,
and updating the guest xml to include phys-bits=39, migration fails with the same error message:

-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off,mpx=off,phys-bits=39 \
......
2023-02-22T11:05:31.999883Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
2023-02-22 11:05:32.263+0000: shutting down, reason=crashed

Comment 13 Dr. David Alan Gilbert 2023-02-22 12:22:30 UTC

(In reply to yalzhang from comment #12)
> (In reply to Dr. David Alan Gilbert from comment #10)
> > OK thanks! So this *might* count as not-a-bug since the problem is caused by
> > migrating between two different CPU types
> > 
> > Please try the following:
> > 
> >    a) Try a bios based VM rather than ovmf
> >        (I'm thinking maybe a change in OVMF changed the way it's using mtrr?)
> Migration succeeds for a bios based vm with the same cpu settings.

Great.

> >    b) If (a) works, then please try ovmf again but using 9.1's OVMF package
> >       with the rest being 9.2
> Current 9.2 ovmf: edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch
> After removing it and installing 9.1's OVMF,
> edk2-ovmf-20220526git16779ede2d36-3.el9.noarch, migration succeeds.

OK, so (a) & (b) tell us that OVMF is doing something different in 9.2; I don't think it's
actually a bug, since the MTRR value it has written is correct for the source host it was started on.
Let's ask Gerd what changed.
Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs and physical address size?
 
> > 
> >    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> >       -cpu Skylake-Client-IBRS,phys-bits=39,.......
> After updating the 9.1 ovmf back to the 9.2 ovmf,
> edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch,
> and updating the guest xml to include phys-bits=39, migration fails with
> the same error message:

Oh, interesting - I'd hoped that would have fixed it.

> 
> -cpu
> Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,
> clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,
> xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-
> vmentry=on,pschange-mc-no=on,hle=off,rtm=off,mpx=off,phys-bits=39 \
> ......
> 2023-02-22T11:05:31.999883Z qemu-kvm: error: failed to set MSR 0x202 to
> 0x380000000000
> qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *):
> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> 2023-02-22 11:05:32.263+0000: shutting down, reason=crashed

Comment 15 Gerd Hoffmann 2023-02-23 06:18:06 UTC

> Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> and physical address size?

Yes.  OVMF starts using the full physical address space 😎
Check /proc/iomem to see the difference.

> > >    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > >       -cpu Skylake-Client-IBRS,phys-bits=39,.......

I think due to host-phys-bits=on being the default downstream
you need host-phys-bits-limit=39 to get the desired effect.
And, yes, that should fix it.

Migrating with the physical address space being different on the source and target
machines was never a valid operation, and the host-phys-bits-limit=39 config
switch was added to address exactly that: it allows setting a cluster-wide limit
for live migration compatibility.
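
As a rough illustration of picking such a cluster-wide value, the Python sketch below parses the "Address sizes" lines that lscpu printed for the two hosts in comment 2 and takes the smallest physical width. Treating the limit as the minimum across hosts is an assumption drawn from the wording above, not a description of anything libvirt or QEMU computes; the helper name phys_bits is hypothetical.

```python
import re

# Matches the physical width in an lscpu "Address sizes" line
# (the same pattern also works on /proc/cpuinfo "address sizes" lines).
ADDRESS_RE = re.compile(r"(\d+) bits physical")


def phys_bits(address_sizes_line: str) -> int:
    """Extract the physical address width (hypothetical helper)."""
    match = ADDRESS_RE.search(address_sizes_line)
    if match is None:
        raise ValueError(f"no physical address size in: {address_sizes_line!r}")
    return int(match.group(1))


# Lines copied from the lscpu output in comment 2.
hosts = {
    "source (Xeon Silver 4214)": "Address sizes:         46 bits physical, 48 bits virtual",
    "target (Xeon E3-1260L v5)": "Address sizes:         39 bits physical, 48 bits virtual",
}

# Assumption: the cluster-wide limit is the smallest width of any host.
cluster_limit = min(phys_bits(line) for line in hosts.values())
print(f"host-phys-bits-limit={cluster_limit}")
```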

Comment 16 Guo, Zhiyi 2023-02-23 07:35:44 UTC

(In reply to Gerd Hoffmann from comment #15)
> > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > and physical address size?
> 
> Yes.  OVMF starts using the full physical address space 😎
> Check /proc/iomem to see the difference.

This also impacts the scenario of live migrating a VM from an Icelake server with 5 level paging enabled to a 4 level paging host, such as an Icelake server with 5 level paging disabled or a Cascadelake server.

5 level paging enabled Icelake server:
...
clflush size    : 64
cache_alignment : 64
address sizes   : 52 bits physical, 57 bits virtual
power management:
...

> 
> > > >    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > >       -cpu Skylake-Client-IBRS,phys-bits=39,.......
> 
> I think due to host-phys-bits=on being the default downstream
> you need host-phys-bits-limit=39 to get the desired effect.
> And, yes, that should fix it.
> 
> Migrating with the physical address space being different on the source and target
> machines was never a valid operation, 
Until OVMF started using the full physical address space, I think users performing a simple live migration might not have been aware of this difference.

> and the host-phys-bits-limit=39 config
> switch was added to address exactly that: Allow setting a cluster-wide limit
> for live migration compatibility.
I think we introduced a big change that breaks the original behavior of VM live migration between hosts with different physical address spaces, and I'm not sure how layered products can handle this gracefully.

I'm CCing Daniel and Germano to see if they have any comments about this.

Comment 17 yalzhang@redhat.com 2023-02-23 07:56:59 UTC

Migration with the baseline cpu of the hosts below also fails:
Source: Intel(R) Xeon(R) Silver 4310 CPU (Broadwell-noTSX-IBRS) 
Target: Intel(R) Xeon(R) Silver 4210 CPU (Cascadelake-Server-noTSX)

With the maxphysaddr setting, it still fails:
# virsh dumpxml 750 --xpath //cpu
<cpu mode="custom" match="exact" check="full">
  <model fallback="forbid">Cooperlake</model>
  <vendor>Intel</vendor>
  <maxphysaddr mode="emulate" bits="39"/>
  <feature policy="require" name="ss"/>
  <feature policy="require" name="vmx"/>
  <feature policy="require" name="pdcm"/>
  <feature policy="require" name="hypervisor"/>
  <feature policy="require" name="tsc_adjust"/>
  <feature policy="require" name="umip"/>
  <feature policy="require" name="md-clear"/>
  <feature policy="require" name="xsaves"/>
  <feature policy="require" name="ibpb"/>
  <feature policy="require" name="ibrs"/>
  <feature policy="require" name="amd-stibp"/>
  <feature policy="require" name="amd-ssbd"/>
  <feature policy="require" name="tsx-ctrl"/>
  <feature policy="disable" name="hle"/>
  <feature policy="disable" name="rtm"/>
  <feature policy="disable" name="avx512-bf16"/>
  <feature policy="disable" name="taa-no"/>
</cpu>

# virsh migrate 750 --live --verbose qemu+ssh://${target_host}/system 
root@${target_host}'s password: 
Migration: [100 %]error: operation failed: job 'migration out' unexpectedly failed

The qemu log on target host shows:
2023-02-23T07:54:37.683367Z qemu-kvm: warning: Host physical bits (46) does not match phys-bits property (39)
2023-02-23T07:54:46.372227Z qemu-kvm: error: failed to set MSR 0x202 to 0x700000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
2023-02-23 07:54:46.653+0000: shutting down, reason=crashed
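
To make the rejected value easier to read, here is a small Python sketch that splits an IA32_MTRR_PHYSBASEn value into its fields. The layout used (memory type in bits 7:0, base address in bits MAXPHYADDR-1:12, everything from MAXPHYADDR upward reserved) is the architectural one as I understand it from the Intel SDM. That the failure is caused by these reserved bits is an assumption that matches the log, not something the log states. The helper name decode_physbase is hypothetical.

```python
def decode_physbase(value: int, maxphyaddr: int) -> dict:
    """Split an MTRR PHYSBASE value into fields (hypothetical helper)."""
    base_mask = ((1 << maxphyaddr) - 1) & ~0xFFF  # bits maxphyaddr-1:12
    return {
        "type": value & 0xFF,                # bits 7:0, 0 means uncacheable (UC)
        "base": value & base_mask,           # base address bits the host can hold
        "reserved_high": value >> maxphyaddr,  # non-zero: bits beyond MAXPHYADDR set
    }


value = 0x700000000000  # from the qemu log above
fields = decode_physbase(value, 46)  # 46 = "Host physical bits" in the same log

print(f"value needs {value.bit_length()} bits, host has 46")
print({name: hex(field) for name, field in fields.items()})
```

A non-zero reserved_high means the value sets an address bit that the target host does not implement.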

Comment 18 Dr. David Alan Gilbert 2023-02-23 10:45:28 UTC

(In reply to Guo, Zhiyi from comment #16)
> (In reply to Gerd Hoffmann from comment #15)
> > > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > > and physical address size?
> > 
> > Yes.  OVMF starts using the full physical address space 😎
> > Check /proc/iomem to see the difference.
> 
> This also impacts the scenario of live migrating a VM from an Icelake
> server with 5 level paging enabled to a 4 level paging host, such as an
> Icelake server with 5 level paging disabled or a Cascadelake server.

I was always expecting 5 level->4 level to cause problems though.

> 5 level paging enabled Icelake server:
> ...
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 52 bits physical, 57 bits virtual
> power management:
> ...
> 
> > 
> > > > >    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > > >       -cpu Skylake-Client-IBRS,phys-bits=39,.......
> > 
> > I think due to host-phys-bits=on being the default downstream
> > you need host-phys-bits-limit=39 to get the desired effect.
> > And, yes, that should fix it.
> > 
> > Migrating with the physical address space being different on the source and target
> > machines was never a valid operation, 
> Until OVMF started using the full physical address space, I think users
> performing a simple live migration might not have been aware of this
> difference.
> 
> > and the host-phys-bits-limit=39 config
> > switch was added to address exactly that: Allow setting a cluster-wide limit
> > for live migration compatibility.
> I think we introduced a big change that breaks the original behavior of
> VM live migration between hosts with different physical address spaces,
> and I'm not sure how layered products can handle this gracefully.
> 
> I'm CCing Daniel and Germano to see if they have any comments about this.

Comment 19 Dr. David Alan Gilbert 2023-02-23 11:06:48 UTC

Can you please try the host-phys-bits-limit=39 that Gerd suggests in comment 15?

Comment 20 Dr. David Alan Gilbert 2023-02-23 11:08:17 UTC

(In reply to Gerd Hoffmann from comment #15)
> > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > and physical address size?
> 
> Yes.  OVMF starts using the full physical address space 😎
> Check /proc/iomem to see the difference.
> 
> > > >    c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > >       -cpu Skylake-Client-IBRS,phys-bits=39,.......
> 
> I think due to host-phys-bits=on being the default downstream
> you need host-phys-bits-limit=39 to get the desired effect.
> And, yes, that should fix it.
> 
> Migrating with physical address spaces being different on source and target
> machine was never a valid operation, and the host-phys-bits-limit=39 config
> switch was added to address exactly that: Allow setting a cluster-wide limit
> for live migration compatibility.

So what's the difference between setting:
  host-phys-bits=off,phys-bits=39
and
  host-phys-bits-limit=39

?

Comment 21 Li Xiaohui 2023-02-23 13:09:26 UTC

(In reply to Dr. David Alan Gilbert from comment #19)
> Can you please try the host-phys-bits-limit=39 that Gerd suggests in comment
> 15?

Migration succeeds from src machine Xeon Silver 4210 to dst machine (Xeon E3-1260L v5) with host-phys-bits-limit=39


The only issue is some clocksource warnings in the guest:

[   60.691672] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[   60.692943] clocksource:                       'kvm-clock' wd_nsec: 495968893 wd_now: 1204a6d08f wd_last: 11e716ee12 mask: ffffffffffffffff
[   60.694554] clocksource:                       'tsc' cs_nsec: 656215205 cs_now: 27c04b4cd6 cs_last: 276a723d44 mask: ffffffffffffffff
[   60.696052] clocksource:                       'kvm-clock' (not 'tsc') is current clocksource.
[   60.697142] tsc: Marking TSC unstable due to clocksource watchdog

Comment 22 Li Xiaohui 2023-02-23 13:12:43 UTC

The tests in comment 21 were run on RHEL 9.2 (kernel-5.14.0-268.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64 && edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch)

Comment 23 Dr. David Alan Gilbert 2023-02-23 14:23:26 UTC

OK, thanks for testing; that's great to know that fixes it.  I don't think there's a libvirt way to do that yet,
I've filed:
https://gitlab.com/libvirt/libvirt/-/issues/450#note_1289883756

to ask for it.

Comment 24 Gerd Hoffmann 2023-02-23 15:13:50 UTC

> So what's the difference between setting:
>   host-phys-bits=off,phys-bits=39
> and
>   host-phys-bits-limit=39

host-phys-bits=off,phys-bits=x lets you set any value you like, including values larger than what the host supports (with tcg that actually works, with kvm it doesn't, of course).

host-phys-bits=on,host-phys-bits-limit=x allows only values the host can actually handle and throws an error otherwise.

Comment 25 Gerd Hoffmann 2023-02-23 15:24:33 UTC

> host-phys-bits=on,host-phys-bits-limit=x allows only values the host can
> actually handle and throws an error otherwise.

Correction: Doesn't throw errors, but uses min(limit,supported).
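
A minimal sketch of the two modes from comment 24 together with the correction above, written as a plain Python model rather than QEMU code. The function name and its parameters are hypothetical; only the min(limit,supported) rule and the "any value you like" behaviour are taken from this thread.

```python
from typing import Optional


def effective_phys_bits(host_bits: int,
                        host_phys_bits: bool,
                        phys_bits: Optional[int] = None,
                        limit: Optional[int] = None) -> Optional[int]:
    """Model the guest physical address width for the two settings discussed."""
    if not host_phys_bits:
        # host-phys-bits=off,phys-bits=x: the requested value is used as-is,
        # even if it is larger than what the host supports.
        # None means "whatever the default is", which is not modelled here.
        return phys_bits
    if limit is None:
        # host-phys-bits=on without a limit: the guest sees the host width.
        return host_bits
    # host-phys-bits=on,host-phys-bits-limit=x: min(limit, supported).
    return min(limit, host_bits)
```

With a cluster-wide limit of 39, a 46-bit host and a 39-bit host both end up with a 39-bit guest, which is why the setting helps for migration.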

Comment 26 Germano Veit Michel 2023-02-23 23:55:38 UTC

(In reply to Guo, Zhiyi from comment #16)
> I think we introduced a big change that breaks the original behavior when
> performing VM live migration on hosts with different physical address spaces
> and I'm not sure how this is handled gracefully by layered products.

I'd say we introduced a change that exposed something that was already pretty broken before :)

> I'm CCing Daniel and Germano to see if they have any comments about this

IIUC, we need to do the following:
1. [Bug/RFE] That virsh hypervisor-cpu-baseline *should* tell the customer about the address space difference and generate a definition that allows migrating back and forth.
2. [Docs] On RHEL 9.2+ KVM, we need to document this for customers doing live migrations, and explain that phys-bits needs to be set when using hosts with different address space sizes. This may depend on item 1 above.
3. [Internal] Let layered products know that they need to generate specific XML depending on the hosts that are in the cluster (i.e. lowest common denominator).
4. [KCS] For RHEL KVM, document the error and explain what's going on and how to fix it.

Comment 27 Guo, Zhiyi 2023-02-27 10:30:25 UTC

Hi Nitesh,

   Per comment 23 & 26, can you change the component of this bug to libvirt? Thanks!


Zhiyi

Comment 29 Li Xiaohui 2023-02-28 11:50:11 UTC

Hi dgilbert,

As discussed above, host-phys-bits is on by default.
And we should use host-phys-bits-limit=x when migrating between hosts that have different address sizes, right?

Per my comment 3, it's not clear to me why migration succeeds when migrating from the Skylake machine to the Cascade Lake machine.

(In reply to Li Xiaohui from comment #3)
> Using yalan's hosts, do plain migration from Cascade Lake (Silver 4214) to
> Skylake (E3-1260L v5), dst qemu would crash when migration completed, same
> error as Description.
> 
> Also tried to migrate from Skylake to Cascade Lake, but migration succeeds,
> and qemu works well.
> 
> 
> 
> Now loan one Icelake and one Haswell machine, will try again to see if bug
> only happens on special machines or happens when migrating from new to old
> cpu machines.

Comment 30 Dr. David Alan Gilbert 2023-02-28 13:25:53 UTC

(In reply to Li Xiaohui from comment #29)
> Hi dgilbert,
> 
> As discussed above, host-phys-bits is on by default.
> And we shall use host-phys-bits-limit=x when migrating between hosts that
> have different address sizes, right?

Right, that should work (but there's no mechanism via libvirt yet)

> Per my Comment 3, I'm not clear why migration succeeds when migrating from
> Skylake to Cascade Lake machine?
> 
> (In reply to Li Xiaohui from comment #3)
> > Using yalan's hosts, do plain migration from Cascade Lake (Silver 4214) to
> > Skylake (E3-1260L v5), dst qemu would crash when migration completed, same
> > error as Description.
> > 
> > Also tried to migrate from Skylake to Cascade Lake, but migration succeeds,
> > and qemu works well.
> > 
> > 
> > 
> > Now loan one Icelake and one Haswell machine, will try again to see if bug
> > only happens on special machines or happens when migrating from new to old
> > cpu machines.

Comment 31 Nitesh Narayan Lal 2023-02-28 15:04:41 UTC

Moving this to libvirt.
We discussed this BZ in today's live migration bi-weekly sync, and the consensus was to fix it in 9.3 and defer the 9.2 z-stream for now. That is mainly because it is unlikely that a customer will migrate between a larger Xeon and E3 and hence they may not run into this. If we do get a bug on 9.2, as that's what CNV will consume, we can request a z-stream based on that.
@Germano, do you have any objections?

Comment 34 Gerd Hoffmann 2023-03-01 06:24:51 UTC

> Unfortunately yes. Many customers have heterogeneous clusters with different
> CPU models, we see it all the time and you may remember recent escalations
> came from clusters having different CPU models.

That sounds like it might be a good idea to back out the edk2 change until
libvirt is ready to handle it.  It's not much effort to do, effectively
a one-line change to force traditional (9.1-style) behavior.

If you want this please open two bugs against edk2:
 * one to turn it off, for 9.2-ga, and
 * one to turn it back on, for 9.3 (or 9.2.z),
   with a dependency on the libvirt update.

Comment 39 Ján Tomko 2023-03-01 14:55:21 UTC

I'm confused about the scope of this bug here.
I got it assigned because I started writing the patches to expose
the host-phys-bits-limit option in libvirt:

https://gitlab.com/libvirt/libvirt/-/issues/450
https://listman.redhat.com/archives/libvir-list/2023-March/238231.html

I'm not familiar enough with hypervisor-cpu-baseline to say whether libvirt
even has the information needed to calculate the right limit.

Comment 46 Ján Tomko 2023-03-07 16:46:21 UTC

I filed bug 2176215 for the host-phys-bits knob.

Comment 51 Gerd Hoffmann 2023-04-19 07:22:33 UTC

The two bugs depending on this, directly and indirectly, have high priority (bug 2174749, bug 2055123).
So raising priority for this bug too.

Comment 52 Gerd Hoffmann 2023-04-25 13:00:17 UTC

Ping.  No ITM set yet ...

What is the plan for this?
I'd like to see this fixed early in the 9.3 devel cycle so
we have enough time to handle the dependent bugs and to test this.

Comment 53 Gerd Hoffmann 2023-05-02 13:47:15 UTC

Ping, ping.  Still no ITM set yet ...

Comment 54 Jiri Denemark 2023-05-04 08:13:16 UTC

Well, exposing host-phys-bits-limit option through libvirt is already upstream
in libvirt-9.3.0 and will be included in the coming rebase of libvirt for
RHEL-9.3.0 (see bug 2176215).

The following XML

    <cpu ...>
        <maxphysaddr mode='passthrough' limit='39'/>
    </cpu>

translates to host-phys-bits=on,host-phys-bits-limit=39

And the following XML can be used to set host-phys-bits=off:

    <cpu ...>
        <maxphysaddr mode="emulate"/>
    </cpu>
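
A minimal sketch of the XML-to-option mapping described above, assuming only the two forms shown in this comment. The helper name is hypothetical and this is not libvirt's implementation; other attributes that libvirt may accept on <maxphysaddr> are deliberately not handled.

```python
import xml.etree.ElementTree as ET


def maxphysaddr_to_qemu_props(cpu_xml: str) -> str:
    """Return the -cpu properties implied by a <cpu> element's <maxphysaddr>."""
    cpu = ET.fromstring(cpu_xml)
    elem = cpu.find("maxphysaddr")
    if elem is None:
        # No <maxphysaddr>: nothing is added to the -cpu option.
        return ""
    mode = elem.get("mode")
    if mode == "passthrough":
        props = ["host-phys-bits=on"]
        limit = elem.get("limit")
        if limit is not None:
            props.append(f"host-phys-bits-limit={int(limit)}")
    elif mode == "emulate":
        props = ["host-phys-bits=off"]
    else:
        raise ValueError(f"unsupported maxphysaddr mode: {mode!r}")
    return ",".join(props)


print(maxphysaddr_to_qemu_props(
    "<cpu><maxphysaddr mode='passthrough' limit='39'/></cpu>"))
```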

So I believe that BZ should be enough for the two bugs from comment #51. Maybe
they should actually depend on bug 2176215 rather than this one? What do you
think Gerd?

So the question is what this BZ is supposed to be about.

The hypervisor-cpu-baseline command (or the corresponding API) provides a
single CPU model and features for a given set of host CPU models, but it
focuses on the common set of CPU features because the input data from
domcapabilities XML does not provide anything else. We could perhaps add the
host-phys-bits info in some way there (I think it should be doable, but I
haven't checked for sure) and output the lowest value in the returned baseline
CPU definition, if this is what is requested by this BZ.

Or should we just document how to get the correct value from host capabilities
XML where it should already be reported?

Comment 55 Gerd Hoffmann 2023-05-04 09:49:24 UTC

> So the question is what this BZ is supposed to be about.
> 
> The hypervisor-cpu-baseline command (or the corresponding API) provides a
> single CPU model and features for a given set of host CPU models, but it
> focuses on the common set of CPU features because the input data from
> domcapabilities XML does not provide anything else. We could perhaps add the
> host-phys-bits info in some way there (I think it should be doable, but I
> haven't checked for sure) and output the lowest value in the returned
> baseline
> CPU definition, if this is what is requested by this BZ.

Yes, I think that would be needed.  As far as I know hypervisor-cpu-baseline
is what management tools (RHV, OpenShift, ...) are using to get a migratable
cpu configuration.  If we include phys-bits there this should fix the
migration problem without needing changes higher up in the management
stack.

Comment 56 Jiri Denemark 2023-05-04 12:04:25 UTC

I think RHV uses a predefined static set of CPU models which an administrator
can choose from to set a cluster level CPU model and Openstack uses host-model
by default. Unless something changed recently of course. So management tools
will need some changes too.

Comment 57 Germano Veit Michel 2023-05-04 21:30:17 UTC

(In reply to Jiri Denemark from comment #56)
> I think RHV uses a predefined static set of CPU models which an administrator
> can choose from to set a cluster level CPU model and Openstack uses
> host-model
> by default. Unless something changed recently of course. So management tools
> will need some changes too.

RHV is stuck on 8.6, so I think it is safe?

But we should notify CNV and OSP about this.

Comment 61 Jiri Denemark 2023-06-16 08:17:37 UTC

Patches sent upstream for review:

https://listman.redhat.com/archives/libvir-list/2023-June/240313.html

Comment 62 Jiri Denemark 2023-06-16 10:48:15 UTC

Pushed upstream as

commit be1b7d5b18e69a7000b93dad92d05105709afc43
Refs: v9.4.0-64-gbe1b7d5b18
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 9 17:17:36 2023 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jun 16 12:44:54 2023 +0200

    qemu: Report physical address size in domain capabilities

    We already report the hosts physical address size in host capabilities,
    but computing a baseline CPU definition is done from domain
    capabilities.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit ce6d1dca6d9720e6dcb4e74a84550c2326a7c494
Refs: v9.4.0-65-gce6d1dca6d
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 9 18:12:53 2023 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jun 16 12:44:54 2023 +0200

    qemu: Include maximum physical address size in baseline CPU

    The current implementation of virConnectBaselineHypervisorCPU in QEMU
    driver can provide a CPU definition that will not work on all hosts in
    case they have different maximum physical address size. So when we get
    the info from domain capabilities, we need to choose the smallest
    physical address size for the computed baseline CPU definition.

    https://bugzilla.redhat.com/show_bug.cgi?id=2171860

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

Comment 65 Luyao Huang 2023-08-09 09:35:29 UTC

Reproduce this bug without setting <maxphysaddr mode='passthrough' limit='39'/> on libvirt-9.5.0-5.el9.x86_64:

1. prepare a running guest that has no maxphysaddr limit set and whose physical address size is larger than the target host's physical address size

GUEST:
# lscpu
...
  Address sizes:         46 bits physical, 48 bits virtual

SRC HOST:
# lscpu
...
  Address sizes:         46 bits physical, 57 bits virtual

TGT HOST:
# lscpu
...
  Address sizes:         39 bits physical, 48 bits virtual


2. migrate guest to target host
# virsh migrate vm1 qemu+ssh://tgthostname/system --live --p2p
error: operation failed: job 'migration out' unexpectedly failed

3. check guest log in target host:

/var/log/libvirt/qemu/vm1.log:
2023-08-09T07:10:32.081065Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3292: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
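
A short worked check of why the value in the error message does not fit the target host. As far as I know MSR 0x202 is IA32_MTRR_PHYSBASE1 in Intel's numbering, and the sketch assumes the write is rejected because it sets address bits above the target's physical address width. The helper names are hypothetical.

```python
def required_phys_bits(msr_value: int) -> int:
    """Number of address bits needed to represent the highest set bit."""
    return msr_value.bit_length()


def fits_host(msr_value: int, host_phys_bits: int) -> bool:
    """True if the value sets no bits at or above the host's address width."""
    return required_phys_bits(msr_value) <= host_phys_bits


value = 0x380000000000  # from "failed to set MSR 0x202 to 0x380000000000"
print(required_phys_bits(value))   # 46
print(fits_host(value, 46))        # True  (source host)
print(fits_host(value, 39))        # False (target host)
```

The value needs 46 bits, which matches the 46-bit guest and source host above and exceeds the 39 bits available on the target.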



Test this bug with <maxphysaddr mode='passthrough' limit='39'/> set on libvirt-9.5.0-5.el9.x86_64:

S1: Test that the domcapabilities cpu mode element includes the correct maxphysaddr value
1. 
# virsh domcapabilities 
...
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>SapphireRapids</model>
      <vendor>Intel</vendor>
      <maxphysaddr mode='passthrough' limit='46'/>
...

2.
# lscpu
  Address sizes:         46 bits physical, 57 bits virtual


S2: Test that hypervisor-cpu-baseline generates the correct maxphysaddr element
1. domcaps file from 2 hosts with 2 different physical address sizes

# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>
 
# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='39'/>

2. domcaps file from 3 hosts with 3 different physical address sizes
# cat domcaps3.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>
      <maxphysaddr mode='passthrough' limit='43'/>

# virsh hypervisor-cpu-baseline domcaps3.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='39'/>

3. domcaps file from 2 hosts that contains only 1 physical address size

# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>

# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='46'/>

4. domcaps file from 2 hosts with no maxphysaddr element
# cat domcaps.xml |grep maxphysaddr

# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr
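
The S2 results above are consistent with picking the smallest limit found in the input. A minimal sketch of that rule, not libvirt's code: the function name is hypothetical, and it assumes the file is several domcapabilities documents concatenated without XML declarations, so wrapping them in a dummy root element makes them parseable.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def baseline_maxphysaddr_limit(concatenated_domcaps: str) -> Optional[int]:
    """Return the smallest maxphysaddr limit in the input, or None if absent."""
    # Several root elements in one file are not well-formed XML on their own,
    # so wrap them in a single hypothetical <hosts> element first.
    root = ET.fromstring(f"<hosts>{concatenated_domcaps}</hosts>")
    limits = [int(elem.get("limit"))
              for elem in root.iter("maxphysaddr")
              if elem.get("limit") is not None]
    return min(limits) if limits else None
```

For the three-host case this gives 39 for limits 46, 39 and 43, and None when no maxphysaddr element is present, matching cases 2 and 4.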


S3: Test migration between 2 hosts which have different physical address sizes
1. collect domcapabilities output from both source and target host:

# vim domcaps.xml
# virsh domcapabilities >> domcaps.xml
# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>

2. use hypervisor-cpu-baseline to get the cpu element xml for migration:

# virsh hypervisor-cpu-baseline domcaps.xml 
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <maxphysaddr mode='passthrough' limit='39'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='flush-l1d'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>

3. modify guest inactive xml and update cpu element:
# virsh edit vm1
...
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>Skylake-Client-IBRS</model>
    <vendor>Intel</vendor>
    <maxphysaddr mode='passthrough' limit='39'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='clflushopt'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='flush-l1d'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaves'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='ibrs'/>
    <feature policy='require' name='amd-stibp'/>
    <feature policy='require' name='amd-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='disable' name='hle'/>
    <feature policy='disable' name='rtm'/>
    <feature policy='disable' name='mpx'/>
  </cpu>
...

4. start guest
# virsh start vm1
Domain 'vm1' started

5. log in to the guest and check physical address size
IN GUEST:

# lscpu 
...
  Address sizes:         39 bits physical, 48 bits virtual

6. migrate guest to target host:

# virsh migrate vm1 qemu+ssh://tgthostname/system --live --p2p --verbose
Migration: [100.00 %]

7. log in to the guest and check that the guest os and fs work as expected

8. migrate back to source host
# virsh migrate vm1 qemu+ssh://srchostname/system --live --p2p --verbose
Migration: [100.00 %]

9. log in to the guest and check that the guest os and fs work as expected
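
The physical address size check from step 5 can also be scripted before migrating. A minimal sketch, assuming the "Address sizes" line format shown in the lscpu output in this comment; the function names are hypothetical and only the physical size is compared, not the virtual one.

```python
import re
from typing import Iterable, Tuple

_ADDR_RE = re.compile(r"(\d+) bits physical, (\d+) bits virtual")


def parse_address_sizes(line: str) -> Tuple[int, int]:
    """Extract (physical, virtual) bit widths from an 'Address sizes' line."""
    match = _ADDR_RE.search(line)
    if match is None:
        raise ValueError(f"no address sizes found in: {line!r}")
    return int(match.group(1)), int(match.group(2))


def guest_fits_all_hosts(guest_line: str, host_lines: Iterable[str]) -> bool:
    """True if the guest's physical width does not exceed any host's."""
    guest_phys, _ = parse_address_sizes(guest_line)
    return all(guest_phys <= parse_address_sizes(host)[0]
               for host in host_lines)
```

With the values from this comment, a 39-bit guest fits both the 46-bit source and the 39-bit target, while the 46-bit guest from the reproducer does not fit the target.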

Comment 68 errata-xmlrpc 2023-11-07 08:30:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409

Comment 69 Red Hat Bugzilla 2024-03-07 04:25:23 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days