Bug 2171860
Summary: | [libvirt] migration: larger->E3: vm failed with "failed to set MSR 0x202 to 0x380000000000" | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | yalzhang <yalzhang> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
libvirt sub component: | Live Migration | QA Contact: | Luyao Huang <lhuang> |
Status: | CLOSED ERRATA | Docs Contact: | Jiri Herrmann <jherrman> |
Severity: | high | ||
Priority: | high | CC: | ailan, alex.williamson, berrange, chayang, fjin, gveitmic, jdenemar, jherrman, jinzhao, juzhang, kchamart, kraxel, lmen, mdeng, mprivozn, nanliu, nilal, vgoyal, virt-maint, xiaohli, xuwei, yafu, yanghliu, ymankad, zhguo |
Version: | 9.2 | Keywords: | AutomationTriaged, Triaged |
Target Milestone: | rc | ||
Target Release: | 9.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-9.5.0-0rc1.1.el9 | Doc Type: | Known Issue |
Doc Text: |
.Live migrating VMs to hosts with smaller physical address space sometimes fails
Currently, attempting to migrate a running virtual machine (VM) in some cases fails with the following error:
----
failed to set MSR 0x202 to 0x380000000000
----
This problem occurs most frequently when the destination VM host uses a CPU with a smaller physical address space than the source host, and has mainly been observed on clusters that contain Xeon E3 systems.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-11-07 08:30:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | 9.5.0 |
Embargoed: | |||
Bug Depends On: | 2176215 | ||
Bug Blocks: | 2174749 |
Description
yalzhang@redhat.com
2023-02-20 15:47:11 UTC
MSR 0x202 is IA32_MTRR_PHYSBASE1

In your steps to reproduce you say:
'1. prepare source host with cpu model as "Cascadelake-Server-noTSX", target host with cpu model as "Skylake-Client-noTSX-IBRS";'

I don't understand what you mean there; what is your physical source and destination host cpu?
Is this a nested setup?

(In reply to Dr. David Alan Gilbert from comment #1)
> MSR 0x202 is IA32_MTRR_PHYSBASE1
>
> In your steps to reproduce you say:
> '1. prepare source host with cpu model as "Cascadelake-Server-noTSX",
> target host with cpu model as "Skylake-Client-noTSX-IBRS";'
>
> I don't understand what you mean there; what is your physical source and
> destination host cpu?
> Is this a nested setup?

No, it's not a nested setup.

Source host:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel
Model name: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
BIOS Model name: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 7
BogoMIPS: 4400.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization: VT-x
Caches (sum of all):
  L1d: 768 KiB (24 instances)
  L1i: 768 KiB (24 instances)
  L2: 24 MiB (24 instances)
  L3: 33 MiB (2 instances)
NUMA:
  NUMA node(s): 2
  NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
  NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Vulnerabilities:
  Itlb multihit: KVM: Mitigation: Split huge pages
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed: Mitigation; Enhanced IBRS
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds: Not affected
  Tsx async abort: Mitigation; TSX disabled

Target host:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
BIOS Model name: Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
CPU family: 6
Model: 94
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 5799.77
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization: VT-x
Caches (sum of all):
  L1d: 128 KiB (4 instances)
  L1i: 128 KiB (4 instances)
  L2: 1 MiB (4 instances)
  L3: 8 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-7
Vulnerabilities:
  Itlb multihit: KVM: Mitigation: VMX disabled
  L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds: Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown: Mitigation; PTI
  Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
  Retbleed: Mitigation; IBRS
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds: Mitigation; Microcode
  Tsx async abort: Mitigation; TSX disabled

Guest cpu configuration:
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off

I have the env to reproduce the issue, could you please help to have a look if it's needed & convenient? Or some suggestions? since I have no idea about how to debug it further.

Using yalan's hosts, do plain migration from Cascade Lake (Silver 4214) to Skylake (E3-1260L v5), dst qemu would crash when migration completed, same error as Description.

Also tried to migrate from Skylake to Cascade Lake, but migration succeeds, and qemu works well.
Now loan one Icelake and one Haswell machine, will try again to see if bug only happens on special machines or happens when migrating from new to old cpu machines.

Qemu cmds when reproduce bug through qemu:
/usr/libexec/qemu-kvm \
-name guest=rhel,debug-threads=on \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off \
-global driver=cfi.pflash01,property=secure,value=on \
-m 2048 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 2a779334-eb31-4771-b690-d9eb276967f3 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,path=/tmp/hello1,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/migrate/yalzhang/RHEL-9.2.0-20230216.15-x86_64-ovmf.qcow2.1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev '{"type":"tap","vhost":true,"id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:f3:28:cb","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,path=/tmp/hello2,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on \
-msg timestamp=on \
-monitor stdio

Notes: the bug still reproduces when the extra cpu flags are deleted and the VM is booted with only '-cpu Skylake-Client-IBRS'.

(In reply to yalzhang from comment #2)
> (In reply to Dr. David Alan Gilbert from comment #1)
> > MSR 0x202 is IA32_MTRR_PHYSBASE1
> >
> > In your steps to reproduce you say:
> > '1. prepare source host with cpu model as "Cascadelake-Server-noTSX",
> > target host with cpu model as "Skylake-Client-noTSX-IBRS";'
> >
> > I don't understand what you mean there; what is your physical source and
> > destination host cpu?
> > Is this a nested setup?
>
> No, it's not a nested setup.

Thank you for the extra detail.

> Source host:
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 48
> On-line CPU(s) list: 0-47
> Vendor ID: GenuineIntel
> BIOS Vendor ID: Intel
> Model name: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
> Target host:
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 39 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 0-7
> Vendor ID: GenuineIntel
> BIOS Vendor ID: Intel(R) Corporation
> Model name: Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
> Guest cpu configuration:
> -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off
>
> I have the env to reproduce the issue, could you please help to have a
> look if it's needed & convenient? Or some suggestions? since I have no idea
> about how to debug it further.

I have a suspicion the problem is really to do with migrating towards E3 versions, which have a smaller 'Address size'; the 0x380000000000 won't fit in the 39bits of an E3 but would fit in the 46bits of its bigger model.

* please retest this using a destination host with a non-E3 version of Skylake?
* What guest are you running?
* Is this a regression?

(In reply to Dr. David Alan Gilbert from comment #5)
>
> I have a suspicion the problem is really to do with migrating towards E3
> versions, which have
> a smaller 'Address size'; the 0x380000000000 won't fit in the 39bits of an
> E3 but would
> fit in the 46bits of its bigger model.
>
> * please retest this using a destination host with a non-E3 version of
> Skylake?
Migrate to a target host with a Bronze 3106 cpu (not E3, Skylake-Server-IBRS): succeeds;
Migrate to a target host with a Silver 4110 cpu (not E3, Skylake-Server-IBRS): succeeds;
Tested with the latest rhel 9.2 version.

> * What guest are you running?

Guest xml and qemu cmd will be attached;

> * Is this a regression?

Yes, it can *not* be reproduced on latest rhel 9.1 with the same hosts:
libvirt-8.5.0-7.4.el9_1.x86_64
qemu-kvm-7.0.0-13.el9_1.2.x86_64
And when I upgrade the hosts to rhel 9.2 by "# yum update -y" and reboot, the issue *can* be reproduced.

Thanks yalan. I also tried to migrate the VM from src Icelake (Xeon(R) Silver 4310) to dst Haswell (Xeon(R) E5-2609 v3) machine, migration succeeded, qemu works well.
So it seems this issue happens on specific hardware (like Dave's guess -> migrating towards E3 versions) according to all comments.

OK thanks! So this *might* count as not-a-bug since the problem is caused by migrating between two different CPU types.

Please try the following:

a) Try a bios based VM rather than ovmf
(I'm thinking maybe a change in OVMF changed the way it's using mtrr?)

b) If (a) works, then please try ovmf again but using 9.1's OVMF package with the rest being 9.2

c) Going back to the 9.2 ovmf; please try changing the -cpu section to
-cpu Skylake-Client-IBRS,phys-bits=39,.......

Thanks,
Dave

Note apparently you can set the phys-bits=39 in the libvirt CPU definition by:
<maxphysaddr mode='emulate' bits='39'/>

(In reply to Dr. David Alan Gilbert from comment #10)
> OK thanks! So this *might* count as not-a-bug since the problem is caused by
> migrating between two different CPU types
>
> Please try the following:
>
> a) Try a bios based VM rather than ovmf
> (I'm thinking maybe a change in OVMF changed the way it's using mtrr?)

Migration succeeds for a bios based vm with the same cpu settings.

> b) If (a) works, then please try ovmf again but using 9.1's OVMF package
> with the rest being 9.2

Current 9.2 ovmf: edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch
Remove it, and install 9.1's OVMF edk2-ovmf-20220526git16779ede2d36-3.el9.noarch, try migration: succeeds.

>
> c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> -cpu Skylake-Client-IBRS,phys-bits=39,.......

Update the 9.1 ovmf to 9.2 ovmf edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch, and update the guest xml to be with phys-bits=39, try migration: failed with the same error msg:

-cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off,mpx=off,phys-bits=39 \
......
2023-02-22T11:05:31.999883Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
2023-02-22 11:05:32.263+0000: shutting down, reason=crashed

(In reply to yalzhang from comment #12)
> (In reply to Dr. David Alan Gilbert from comment #10)
> > OK thanks! So this *might* count as not-a-bug since the problem is caused by
> > migrating between two different CPU types
> >
> > Please try the following:
> >
> > a) Try a bios based VM rather than ovmf
> > (I'm thinking maybe a change in OVMF changed the way it's using mtrr?)
> Migration succeed for bios based vm with same cpu settings.

Great.

> > b) If (a) works, then please try ovmf again but using 9.1's OVMF package
> > with the rest being 9.2
> Current 9.2 ovmf: edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch
> Remove it, and install 9.1's OVMF
> edk2-ovmf-20220526git16779ede2d36-3.el9.noarch, try migration, succeed.
OK, so (a) & (b) tell us that OVMF is doing something different in 9.2; I don't think it's actually a bug, since the MTRR value it's written is correct for the source host it was started on.
Let's ask Gerd what changed.

Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs and physical address size?

> >
> > c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > -cpu Skylake-Client-IBRS,phys-bits=39,.......
> update the 9.1 ovmf to 9.2 ovmf
> edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch,
> and update the guest xml to be with phys-bits=39, try migration, failed with
> the same error msg:

Oh, interesting - I'd hoped that would have fixed it.

>
> -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off,mpx=off,phys-bits=39 \
> ......
> 2023-02-22T11:05:31.999883Z qemu-kvm: error: failed to set MSR 0x202 to
> 0x380000000000
> qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *):
> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
> 2023-02-22 11:05:32.263+0000: shutting down, reason=crashed

> Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> and physical address size?

Yes. OVMF starts using the full physical address space.
Check /proc/iomem to see the difference.

> > > c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > -cpu Skylake-Client-IBRS,phys-bits=39,.......

I think due to host-phys-bits=on being the default downstream you need host-phys-bits-limit=39 to get the desired effect.
And, yes, that should fix it.
Migrating with physical address spaces being different on source and target machine was never a valid operation, and the host-phys-bits-limit=39 config switch was added to address exactly that: Allow setting a cluster-wide limit for live migration compatibility.

(In reply to Gerd Hoffmann from comment #15)
> > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > and physical address size?
>
> Yes. OVMF starts using the full physical address space.
> Check /proc/iomem to see the difference.

This also impacts a scenario that when trying to live migrate a VM from 5 level paging enabled Icelake server to 4 level paging host like Icelake server with 4 level paging disabled or Cascadelake server, etc.

5 level paging enabled Icelake server:
...
clflush size : 64
cache_alignment : 64
address sizes : 52 bits physical, 57 bits virtual
power management:
...

> > > > c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > > -cpu Skylake-Client-IBRS,phys-bits=39,.......
>
> I think due to host-phys-bits=on being the default downstream
> you need host-phys-bits-limit=39 to get the desired effect.
> And, yes, that should fix it.
>
> Migrating with physical address spaces being different on source and target
> machine was never a valid operation,

Without the full physical address space enabled by OVMF, I think users might not be aware of this difference by performing simple live migration operation?

> and the host-phys-bits-limit=39 config
> switch was added to address exactly that: Allow setting a cluster-wide limit
> for live migration compatibility.

I think we introduced a big change that breaks the original behavior when performing VM live migration on hosts with different physical address spaces and I'm not sure how this is handled gracefully from layered products.
I'm CC Daniel and Germano to see if they have any comments about this

Migrate from host with the baseline cpu of below hosts also fails:
Source: Intel(R) Xeon(R) Silver 4310 CPU (Broadwell-noTSX-IBRS)
Target: Intel(R) Xeon(R) Silver 4210 CPU (Cascadelake-Server-noTSX)

With the maxphysaddr setting, it still fails:
# virsh dumpxml 750 --xpath //cpu
<cpu mode="custom" match="exact" check="full">
  <model fallback="forbid">Cooperlake</model>
  <vendor>Intel</vendor>
  <maxphysaddr mode="emulate" bits="39"/>
  <feature policy="require" name="ss"/>
  <feature policy="require" name="vmx"/>
  <feature policy="require" name="pdcm"/>
  <feature policy="require" name="hypervisor"/>
  <feature policy="require" name="tsc_adjust"/>
  <feature policy="require" name="umip"/>
  <feature policy="require" name="md-clear"/>
  <feature policy="require" name="xsaves"/>
  <feature policy="require" name="ibpb"/>
  <feature policy="require" name="ibrs"/>
  <feature policy="require" name="amd-stibp"/>
  <feature policy="require" name="amd-ssbd"/>
  <feature policy="require" name="tsx-ctrl"/>
  <feature policy="disable" name="hle"/>
  <feature policy="disable" name="rtm"/>
  <feature policy="disable" name="avx512-bf16"/>
  <feature policy="disable" name="taa-no"/>
</cpu>

# virsh migrate 750 --live --verbose qemu+ssh://${target_host}/system
root@${target_host}'s password:
Migration: [100 %]error: operation failed: job 'migration out' unexpectedly failed

The qemu log on target host shows:
2023-02-23T07:54:37.683367Z qemu-kvm: warning: Host physical bits (46) does not match phys-bits property (39)
2023-02-23T07:54:46.372227Z qemu-kvm: error: failed to set MSR 0x202 to 0x700000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3177: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
2023-02-23 07:54:46.653+0000: shutting down, reason=crashed

(In reply to Guo, Zhiyi from comment #16)
> (In reply to Gerd Hoffmann from comment #15)
> > > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > > and physical address size?
> >
> > Yes. OVMF starts using the full physical address space.
> > Check /proc/iomem to see the difference.
>
> This also impacts a scenario that when trying to live migrate a VM from 5
> level paging enabled Icelake server to 4 level paging host like Icelake
> server with 4 level paging disabled or Cascadelake server, etc.

I was always expecting 5 level->4 level to cause problems though.

> 5 level paging enabled Icelake server:
> ...
> clflush size : 64
> cache_alignment : 64
> address sizes : 52 bits physical, 57 bits virtual
> power management:
> ...
>
> > > > > c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > > > -cpu Skylake-Client-IBRS,phys-bits=39,.......
> >
> > I think due to host-phys-bits=on being the default downstream
> > you need host-phys-bits-limit=39 to get the desired effect.
> > And, yes, that should fix it.
> >
> > Migrating with physical address spaces being different on source and target
> > machine was never a valid operation,
> Without the full physical address space enabled by OVMF, I think users might
> not be aware of this difference by performing simple live migration
> operation?
>
> > and the host-phys-bits-limit=39 config
> > switch was added to address exactly that: Allow setting a cluster-wide limit
> > for live migration compatibility.
> I think we introduced a big change that breaks the original behavior when
> performing VM live migration on hosts with different physical address spaces
> and I'm not sure how this is handled gracefully from layered products.
>
> I'm CC Daniel and Germano to see if they have any comments about this

Can you please try the host-phys-bits-limit=39 that Gerd suggests in comment 15?
(In reply to Gerd Hoffmann from comment #15)
> > Gerd: Do you know anything that changed in 9.2's OVMF with respect to MTRRs
> > and physical address size?
>
> Yes. OVMF starts using the full physical address space.
> Check /proc/iomem to see the difference.
>
> > > > c) Going back to the 9.2 ovmf; please try changing the -cpu section to
> > > > -cpu Skylake-Client-IBRS,phys-bits=39,.......
>
> I think due to host-phys-bits=on being the default downstream
> you need host-phys-bits-limit=39 to get the desired effect.
> And, yes, that should fix it.
>
> Migrating with physical address spaces being different on source and target
> machine was never a valid operation, and the host-phys-bits-limit=39 config
> switch was added to address exactly that: Allow setting a cluster-wide limit
> for live migration compatibility.

So what's the difference between setting:
host-phys-bits=off,phys-bits=39
and
host-phys-bits-limit=39
?

(In reply to Dr. David Alan Gilbert from comment #19)
> Can you please try the host-phys-bits-limit=39 that Gerd suggests in comment
> 15?

Migration succeeds from src machine Xeon Silver 4210 to dst machine (Xeon E3-1260L v5) with host-phys-bits-limit=39

Only get some clocksource warnings in guest:
[ 60.691672] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 60.692943] clocksource: 'kvm-clock' wd_nsec: 495968893 wd_now: 1204a6d08f wd_last: 11e716ee12 mask: ffffffffffffffff
[ 60.694554] clocksource: 'tsc' cs_nsec: 656215205 cs_now: 27c04b4cd6 cs_last: 276a723d44 mask: ffffffffffffffff
[ 60.696052] clocksource: 'kvm-clock' (not 'tsc') is current clocksource.
[ 60.697142] tsc: Marking TSC unstable due to clocksource watchdog

The tests in Comment 21 were run on RHEL 9.2 (kernel-5.14.0-268.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64 && edk2-ovmf-20221207gitfff6d81270b5-7.el9.noarch)

OK, thanks for testing; that's great to know that fixes it.
I don't think there's a libvirt way to do that yet, I've filed:
https://gitlab.com/libvirt/libvirt/-/issues/450#note_1289883756
to ask for it.

> So what's the difference between setting:
> host-phys-bits=off,phys-bits=39
> and
> host-phys-bits-limit=39
host-phys-bits=off,phys-bits=x lets you set any value you like, including values larger than what the host supports (with tcg that actually works; with kvm it doesn't, of course).
host-phys-bits=on,host-phys-bits-limit=x allows only values the host can actually handle and throws an error otherwise.
> host-phys-bits=on,host-phys-bits-limit=x allows only values the host can
> actually handle and throws an error otherwise.
Correction: Doesn't throw errors, but uses min(limit,supported).
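
The two configurations compared above can be sketched in a few lines. This is only an illustration of the behavior as described in this thread (min(limit, supported) with host-phys-bits=on, any value with host-phys-bits=off); the function names are made up for the example, and treating a limit of 0 as "no limit" is an assumption, not something stated in this bug.

```python
def effective_phys_bits(host_bits, host_phys_bits=True, limit=0, phys_bits=0):
    """Guest physical address width for the two setups discussed above.

    host_phys_bits=True  -> min(limit, host_bits); limit=0 is assumed to mean "no limit"
    host_phys_bits=False -> phys_bits exactly as given, even if the host cannot back it
    """
    if host_phys_bits:
        return min(limit, host_bits) if limit else host_bits
    return phys_bits


def fits(value, bits):
    """True if `value` can be represented in `bits` physical address bits."""
    return value.bit_length() <= bits


MTRR_BASE = 0x380000000000  # value from the error message in this bug

# Source host has 46 physical address bits, the E3 destination has 39.
unlimited = effective_phys_bits(46)          # follows the source host
limited = effective_phys_bits(46, limit=39)  # capped to the smallest host

print(unlimited, limited)
print(fits(MTRR_BASE, unlimited), fits(MTRR_BASE, 39))
```

The last line shows why the destination rejects the value: it needs 46 bits, which the source host provides and the 39-bit E3 does not.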
(In reply to Guo, Zhiyi from comment #16)
> I think we introduced a big change that breaks the original behavior when
> performing VM live migration on hosts with different physical address spaces
> and I'm not sure how this is handled gracefully from layered products.

I'd say we introduced a change that exposed something that was already pretty broken before :)

> I'm CC Daniel and Germano to see if they have any comments about this

IIUC, we need to do the following:
1. [Bug/RFE] That virsh hypervisor-cpu-baseline *should* tell the customer about the address space difference and generate a definition that allows migrating back and forth.
2. [Docs] On RHEL 9.2+ KVM, we need to document this for customers doing live migrations, explain that phys-bits need to be set when using hosts with different address space sizes. This may depend on item 1 above.
3. [Internal] Let layered products know that they need to generate specific XML depending on the hosts that are in the cluster (i.e. lowest common denominator).
4. [KCS] For RHEL KVM, document the error and explain what's going on and how to fix.

Hi Nitesh,
Per comment 23 & 26, can you change the component of this bug to libvirt? Thanks!
Zhiyi

Hi dgilbert,

As talked above, host-phys-bits is on by default.
And we shall use host-phys-bits-limit=x when migrating between hosts that have different address sizes, right?

Per my Comment 3, I'm not clear why migration succeeds when migrating from Skylake to Cascade Lake machine?

(In reply to Li Xiaohui from comment #3)
> Using yalan's hosts, do plain migration from Cascade Lake (Silver 4214) to
> Skylake (E3-1260L v5), dst qemu would crash when migration completed, same
> error as Description.
>
> Also tried to migrate from Skylake to Cascade Lake, but migration succeeds,
> and qemu works well.
>
>
>
> Now loan one Icelake and one Haswell machine, will try again to see if bug
> only happens on special machines or happens when migrating from new to old
> cpu machines.
(In reply to Li Xiaohui from comment #29)
> Hi dgilbert,
>
> As talked above, host-phys-bits is on by default.
> And we shall use host-phys-bits-limit=x when migrating between hosts that
> have different address sizes, right?

Right, that should work (but there's no mechanism via libvirt yet)

> Per my Comment 3, I'm not clear why migration succeeds when migrating from
> Skylake to Cascade Lake machine?
>
> (In reply to Li Xiaohui from comment #3)
> > Using yalan's hosts, do plain migration from Cascade Lake (Silver 4214) to
> > Skylake (E3-1260L v5), dst qemu would crash when migration completed, same
> > error as Description.
> >
> > Also tried to migrate from Skylake to Cascade Lake, but migration succeeds,
> > and qemu works well.
> >
> >
> >
> > Now loan one Icelake and one Haswell machine, will try again to see if bug
> > only happens on special machines or happens when migrating from new to old
> > cpu machines.

Moving this to libvirt.

We discussed this BZ in today's live migration bi-weekly sync, and the consensus was to fix it in 9.3 and defer the 9.2 z-stream for now. That is mainly because it is unlikely that a customer will migrate between a larger Xeon and E3 and hence they may not run into this. If we do get a bug on 9.2, as that's what CNV will consume, we can request a z-stream based on that.

@Germano, do you have any objections?

> Unfortunately yes. Many customers have heterogeneous clusters with different
> CPU models, we see it all the time and you may remember recent escalations
> came from clusters having different CPU models.
That sounds like it might be a good idea to back out the edk2 change until
libvirt is ready to handle it. It's not much effort to do, effectively
a one-line change to force traditional (9.1-style) behavior.
If you want this please open two bugs against edk2:
* one to turn it off, for 9.2-ga, and
* one to turn it back on, for 9.3 (or 9.2.z),
with a dependency on the libvirt update.
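
As an illustration of why the address-size mismatch in this bug is fatal, the value from the error message (0x380000000000, written to MSR 0x202) can be checked against each host's physical address width from the lscpu output in this report. This is only a simplified sketch: the helper name is hypothetical, and it models "no bits set at or above the physical address width" rather than the actual validation done by KVM or the CPU.

```python
def fits_in_phys_bits(value: int, phys_bits: int) -> bool:
    """Return True if no bit at position phys_bits or higher is set."""
    return value >> phys_bits == 0


# Value taken from the error message in this bug.
MTRR_BASE = 0x380000000000

# Physical address widths reported by lscpu on the two hosts.
SOURCE_PHYS_BITS = 46   # Xeon Silver 4214
TARGET_PHYS_BITS = 39   # Xeon E3

print("bits needed:", MTRR_BASE.bit_length())
print("fits on source:", fits_in_phys_bits(MTRR_BASE, SOURCE_PHYS_BITS))
print("fits on target:", fits_in_phys_bits(MTRR_BASE, TARGET_PHYS_BITS))
```

The value needs 46 bits, so it is representable on the 46-bit source but not on the 39-bit target, which is consistent with the failure only showing up when migrating from the larger host to the smaller one.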
I'm confused about the scope of this bug here. I got it assigned because I started writing the patches to expose the host-phys-bits-limit option in libvirt:
https://gitlab.com/libvirt/libvirt/-/issues/450
https://listman.redhat.com/archives/libvir-list/2023-March/238231.html
I'm not familiar enough with hypervisor-cpu-baseline to say whether libvirt even has the information needed to calculate the right limit.

I filed bug 2176215 for the host-phys-bits knob.

The two bugs depending on this, directly and indirectly, have high priority (bug 2174749, bug 2055123). So raising priority for this bug too.

Ping. No ITM set yet ... What is the plan for this? I'd like to see this being fixed early in the 9.3 devel cycle so we have enough time to handle the depending bugs and to test this.

Ping, ping. Still no ITM set yet ...

Well, exposing host-phys-bits-limit option through libvirt is already upstream in libvirt-9.3.0 and will be included in the coming rebase of libvirt for RHEL-9.3.0 (see bug 2176215). The following XML

    <cpu ...>
      <maxphysaddr mode='passthrough' limit='39'/>
    </cpu>

translates to host-phys-bits=on,host-phys-bits-limit=39

And the following XML can be used to set host-phys-bits=off:

    <cpu ...>
      <maxphysaddr mode="emulate"/>
    </cpu>

So I believe that BZ should be enough for the two bugs from comment #51. Maybe they should actually depend on bug 2176215 rather than this one? What do you think Gerd?

So the question is what this BZ is supposed to be about.

The hypervisor-cpu-baseline command (or the corresponding API) provides a single CPU model and features for a given set of host CPU models, but it focuses on the common set of CPU features because the input data from domcapabilities XML does not provide anything else. We could perhaps add the host-phys-bits info in some way there (I think it should be doable, but I haven't checked for sure) and output the lowest value in the returned baseline CPU definition, if this is what is requested by this BZ.
Or should we just document how to get the correct value from host capabilities XML where it should already be reported?

> So the question is what this BZ is supposed to be about.
>
> The hypervisor-cpu-baseline command (or the corresponding API) provides a
> single CPU model and features for a given set of host CPU models, but it
> focuses on the common set of CPU features because the input data from
> domcapabilities XML does not provide anything else. We could perhaps add the
> host-phys-bits info in a some way there (I think it should be doable, but I
> haven't checked for sure) and output the lowest value in the returned
> baseline
> CPU definition, if this is what is requested by this BZ.
Yes, I think that would be needed. As far as I know hypervisor-cpu-baseline
is what management tools (RHV, OpenShift, ...) are using to get a migratable
cpu configuration. If we include phys-bits there this should fix the
migration problem without needing changes higher up in the management
stack.
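
The idea discussed here, using the lowest physical address size among all hosts as the baseline, can be sketched as follows. The sketch assumes each host's domain capabilities output carries a `<maxphysaddr ... limit='N'/>` element, as shown in the verification steps later in this report, and that the outputs have been concatenated into one text file. The function name is hypothetical, and this is not libvirt's implementation; it only illustrates the "lowest common denominator" rule.

```python
import re
from typing import Optional

# Matches the limit attribute of a <maxphysaddr .../> element.
_LIMIT_RE = re.compile(r"<maxphysaddr\b[^>]*\blimit=['\"](\d+)['\"]")


def lowest_maxphysaddr_limit(domcaps_text: str) -> Optional[int]:
    """Return the smallest maxphysaddr limit found, or None if there is none.

    A regex is used instead of an XML parser because several concatenated
    domcapabilities documents do not form a single well-formed XML file.
    """
    limits = [int(value) for value in _LIMIT_RE.findall(domcaps_text)]
    return min(limits) if limits else None


sample = """
<maxphysaddr mode='passthrough' limit='46'/>
<maxphysaddr mode='passthrough' limit='39'/>
<maxphysaddr mode='passthrough' limit='43'/>
"""
print(lowest_maxphysaddr_limit(sample))
```

A guest whose CPU definition uses the returned value as its `maxphysaddr` limit never sees more address bits than the smallest host in the set provides.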
I think RHV uses a predefined static set of CPU models which an administrator can choose from to set a cluster level CPU model and Openstack uses host-model by default. Unless something changed recently of course. So management tools will need some changes too.

(In reply to Jiri Denemark from comment #56)
> I think RHV uses a predefined static set of CPU models which an administrator
> can choose from to set a cluster level CPU model and Openstack uses
> host-model
> by default. Unless something changed recently of course. So management tools
> will need some changes too.

RHV is stuck on 8.6, so I think it is safe? But we should notify CNV and OSP about this.

Patches sent upstream for review: https://listman.redhat.com/archives/libvir-list/2023-June/240313.html

Pushed upstream as

commit be1b7d5b18e69a7000b93dad92d05105709afc43
Refs: v9.4.0-64-gbe1b7d5b18
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 9 17:17:36 2023 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jun 16 12:44:54 2023 +0200

    qemu: Report physical address size in domain capabilities

    We already report the hosts physical address size in host capabilities,
    but computing a baseline CPU definition is done from domain capabilities.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit ce6d1dca6d9720e6dcb4e74a84550c2326a7c494
Refs: v9.4.0-65-gce6d1dca6d
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 9 18:12:53 2023 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Jun 16 12:44:54 2023 +0200

    qemu: Include maximum physical address size in baseline CPU

    The current implementation of virConnectBaselineHypervisorCPU in QEMU
    driver can provide a CPU definition that will not work on all hosts in
    case they have different maximum physical address size. So when we get
    the info from domain capabilities, we need to choose the smallest
    physical address size for the computed baseline CPU definition.
    https://bugzilla.redhat.com/show_bug.cgi?id=2171860

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

Reproduce this bug without setting <maxphysaddr mode='passthrough' limit='39'/> on libvirt-9.5.0-5.el9.x86_64:

1. prepare a running guest which has no maxphysaddr limit set and whose physical address size > target host physical address size
GUEST:
# lscpu
...
Address sizes: 46 bits physical, 48 bits virtual

SRC HOST:
# lscpu
...
Address sizes: 46 bits physical, 57 bits virtual

TGT HOST:
# lscpu
...
Address sizes: 39 bits physical, 48 bits virtual

2. migrate guest to target host
# virsh migrate vm1 qemu+ssh://tgthostname/system --live --p2p
error: operation failed: job 'migration out' unexpectedly failed

3. check guest log in target host: /var/log/libvirt/qemu/vm1.log:
2023-08-09T07:10:32.081065Z qemu-kvm: error: failed to set MSR 0x202 to 0x380000000000
qemu-kvm: ../target/i386/kvm/kvm.c:3292: int kvm_buf_set_msrs(X86CPU *): Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.

Test this bug with <maxphysaddr mode='passthrough' limit='39'/> set on libvirt-9.5.0-5.el9.x86_64:

S1: Test that the domcapabilities cpu mode element includes the correct maxphysaddr value

1. # virsh domcapabilities
...
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>SapphireRapids</model>
      <vendor>Intel</vendor>
      <maxphysaddr mode='passthrough' limit='46'/>
...

2. # lscpu
Address sizes: 46 bits physical, 57 bits virtual

S2: Test that hypervisor-cpu-baseline generates the correct maxphysaddr element

1. 2 hosts' domcaps file includes 2 different physical address sizes
# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>
# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='39'/>

2. 3 hosts' domcaps file includes 3 different physical address sizes
# cat domcaps3.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>
      <maxphysaddr mode='passthrough' limit='43'/>
# virsh hypervisor-cpu-baseline domcaps3.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='39'/>

3. 2 hosts' domcaps file but only 1 physical address size
# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr
  <maxphysaddr mode='passthrough' limit='46'/>

4. 2 hosts' domcaps file but no maxphysaddr element
# cat domcaps.xml |grep maxphysaddr
# virsh hypervisor-cpu-baseline domcaps.xml |grep maxphysaddr

S3: Test migration between 2 hosts which have different physical address sizes

1. collect domcapabilities output from both source and target host:
# vim domcaps.xml
# virsh domcapabilities >> domcaps.xml
# cat domcaps.xml |grep maxphysaddr
      <maxphysaddr mode='passthrough' limit='46'/>
      <maxphysaddr mode='passthrough' limit='39'/>

2.
use hypervisor-cpu-baseline to get the cpu element xml for migration:
# virsh hypervisor-cpu-baseline domcaps.xml
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <maxphysaddr mode='passthrough' limit='39'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='flush-l1d'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>

3. modify guest inactive xml and update cpu element:
# virsh edit vm1
...
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='forbid'>Skylake-Client-IBRS</model>
  <vendor>Intel</vendor>
  <maxphysaddr mode='passthrough' limit='39'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='flush-l1d'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>
...

4. start guest
# virsh start vm1
Domain 'vm1' started

5. log in to the guest and check physical address size
IN GUEST:
# lscpu
...
Address sizes: 39 bits physical, 48 bits virtual

6. migrate guest to target host:
# virsh migrate vm1 qemu+ssh://tgthostname/system --live --p2p --verbose
Migration: [100.00 %]

7. log in to the guest and check that the guest os and fs work as expected

8. migrate back to source host
# virsh migrate vm1 qemu+ssh://srchostname/system --live --p2p --verbose
Migration: [100.00 %]

9. log in to the guest and check that the guest os and fs work as expected

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days