Output from libguestfs-test-tool:

[root@compose-rawhide01 ~][PROD-IAD2]# LIBGUESTFS_BACKEND=direct libguestfs-test-tool
************************************************************
*                    IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
*
************************************************************
LIBGUESTFS_BACKEND=direct
PATH=/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
XDG_RUNTIME_DIR=/run/user/0
SELinux: Enforcing
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: direct
guestfs_get_backend_settings: []
guestfs_get_cachedir: /var/tmp
guestfs_get_hv: /usr/bin/qemu-kvm
guestfs_get_memsize: 1280
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_smp: 1
guestfs_get_sockdir: /tmp
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.46.0fedora=35,release=1.fc35,libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: unix
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/tmp/libguestfsk756Jb
libguestfs: launch: umask=0022
libguestfs: launch: euid=0
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock /var/tmp/.guestfs-0/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu x86_64
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ -o /var/tmp/.guestfs-0/appliance.d
supermin: version: 5.3.1
supermin: rpm: detected RPM version 4.17
supermin: rpm: detected RPM architecture x86_64
supermin: package handler: fedora/rpm
supermin: acquiring lock on /var/tmp/.guestfs-0/lock
supermin: if-newer: output does not need rebuilding
libguestfs: finished building supermin appliance
libguestfs: begin testing qemu features
libguestfs: checking for previously cached test results of /usr/bin/qemu-kvm, in /var/tmp/.guestfs-0
libguestfs: loading previously cached test results
libguestfs: qemu version: 6.1
libguestfs: qemu mandatory locking: yes
libguestfs: qemu KVM: enabled
libguestfs: finished testing qemu features
/usr/bin/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -no-user-config \
    -nodefaults \
    -display none \
    -machine accel=kvm:tcg,graphics=off \
    -cpu max \
    -m 1280 \
    -no-reboot \
    -rtc driftfix=slew \
    -no-hpet \
    -global kvm-pit.lost_tick_policy=discard \
    -kernel /var/tmp/.guestfs-0/appliance.d/kernel \
    -initrd /var/tmp/.guestfs-0/appliance.d/initrd \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfsk756Jb/scratch1.img,cache=unsafe,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-0/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-pci \
    -serial stdio \
    -chardev socket,path=/tmp/libguestfsl7pJlc/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append "panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=UUID=e42f185a-b4a2-4864-95b9-37bd7186fab3 selinux=0 guestfs_verbose=1 TERM=screen"
qemu-kvm: error: failed to set MSR 0x345 to 0x2000
qemu-kvm: ../target/i386/kvm/kvm.c:2833: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x5587c998cd20: child process died
libguestfs: sending SIGTERM to process 1272
libguestfs: error: /usr/bin/qemu-kvm killed by signal 6 (Aborted), see debug messages above
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: closing guestfs handle 0x5587c998cd20 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsk756Jb
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsl7pJlc

This is the fedora rawhide composer vm. I just upgraded it from f34 where it was working fine to f35.

The host is a rhel8.5 box running kernel-4.18.0-348.el8.x86_64 with 'options kvm_intel nested=1' set.

/proc/cpuinfo has:
...
processor : 95
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
stepping : 7
microcode : 0x5003102
cpu MHz : 3700.000
cache size : 36608 KB
physical id : 1
siblings : 48
core id : 27
cpu cores : 24
apicid : 119
initial apicid : 119
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit
bogomips : 4205.22
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

The guest cpuinfo has:

processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel Xeon Processor (Cascadelake)
stepping : 6
microcode : 0x1
cpu MHz : 2095.076
cache size : 16384 KB
physical id : 15
siblings : 1
core id : 0
cpu cores : 1
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa
bogomips : 4190.15
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Nested KVM is unfortunately known to be very flaky. Adding a few people who work in this area.
Could you please also provide the QEMU command line from L0 and /proc/cpuinfo from L1 (in case the one from https://bugzilla.redhat.com/show_bug.cgi?id=2022075#c0 corresponds to L0)?
Definitely agree we'd want to see the L0 qemu command and the other information. It would also be good to have the exact kernel and qemu versions of each layer.

I believe the scenario is:

L0 : RHEL 8.5 (AV or non-AV?)
L1 : Fedora 35 <-- qemu command from comment 0 runs here
L2 : libguestfs appliance with same Fedora 35 kernel as L1
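Once /proc/cpuinfo is available from two layers, the flag sets can be compared mechanically rather than by eye. A minimal sketch in Python, assuming the cpuinfo text from each layer has been saved into a string; the function names `cpu_flags` and `flag_diff` are hypothetical helpers, not part of any existing tool:

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU flags of the first processor entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the "flags" key exactly so that "vmx flags" is not picked up.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(outer_text, inner_text):
    """Return (flags only in the outer layer, flags only in the inner layer)."""
    outer, inner = cpu_flags(outer_text), cpu_flags(inner_text)
    return sorted(outer - inner), sorted(inner - outer)


if __name__ == "__main__":
    # Shortened, made-up samples purely to show the call shape.
    l0 = "processor : 0\nflags : fpu vmx pdcm arch_perfmon\n"
    l1 = "processor : 0\nflags : fpu vmx pdcm hypervisor\nvmx flags : ept vpid\n"
    only_l0, only_l1 = flag_diff(l0, l1)
    print("only in L0:", only_l0)
    print("only in L1:", only_l1)
```

In practice the two strings would come from files such as saved copies of /proc/cpuinfo taken on L0 and L1.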
This assertion failure is really strange; it implies that the kernel set *more* MSRs than what QEMU requested. When the kernel sets *fewer* than requested, QEMU logs an error but continues otherwise fine. This looks like a misunderstanding on MSRs between KVM and QEMU.
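For reference, the assertion text quoted in comment 0 compares the ioctl return value with the number of MSRs in the buffer. Below is a minimal sketch of just that comparison, assuming (per the KVM API documentation) that KVM_SET_MSRS returns the number of MSRs successfully set, or a negative value on error. The function names and outcome labels are hypothetical; this is an illustrative model, not QEMU's code:

```python
def classify_set_msrs(ret, nmsrs):
    """Classify a KVM_SET_MSRS-style return value against the requested count.

    ret   -- value returned by the ioctl (assumed: count of MSRs set, <0 on error)
    nmsrs -- number of MSR entries that were requested
    """
    if ret < 0:
        return "ioctl-error"
    if ret == nmsrs:
        return "all-set"
    return "fewer-than-requested" if ret < nmsrs else "more-than-requested"


def assertion_holds(ret, nmsrs):
    """Mirror of the condition in the logged assertion: ret == nmsrs."""
    return ret == nmsrs


if __name__ == "__main__":
    # Made-up counts purely to show the call shape.
    for ret, nmsrs in [(10, 10), (7, 10), (12, 10)]:
        print(ret, nmsrs, classify_set_msrs(ret, nmsrs), assertion_holds(ret, nmsrs))
```

Note that the equality check on its own does not distinguish between the "fewer" and "more" cases; any mismatch makes the condition false.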
Thanks for all the quick replies here. :)

In the meantime, last night I downgraded the guest to f34's qemu and that got everything working fine, so that makes it sound to me like qemu is the issue here. ;)

(In reply to Vitaly Kuznetsov from comment #2)
> Could you please also give QEMU command line from L0 and /proc/cpuinfo from

qemu 247996 24.5 33.9 136310620 133897228 ? Sl Nov10 377:28 /usr/libexec/qemu-kvm -name guest=compose-rawhide01.iad2.fedoraproject.org,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-compose-rawhide01.ia/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Cascadelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off -m 131072 -overcommit mem-lock=off -smp 16,maxcpus=80,sockets=80,cores=1,threads=1 -uuid 7864f4a9-499d-4cee-bebd-31c4ca735fad -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0 -device
pcie-root-port,port=0x17,chassis=9,id=pci.9,bus=pcie.0,addr=0x2.0x7 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 -blockdev {"driver":"host_device","filename":"/dev/vg_guests/compose-rawhide01.iad2.fedoraproject.org","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"} -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=49 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:57:74:7c,bus=pci.2,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=50,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5906,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device i6300esb,id=watchdog0,bus=pci.8,addr=0x1 -watchdog-action reset -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -object rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on > L1 (in case the one from > https://bugzilla.redhat.com/show_bug.cgi?id=2022075#c0 corresponds to L0). processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 1 siblings : 1 core id : 0 cpu cores : 1 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 
apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 2 siblings : 1 core id : 0 cpu cores : 1 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear 
arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 3 siblings : 1 core id : 0 cpu cores : 1 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 4 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 4 siblings : 1 core id : 0 cpu cores : 1 
apicid : 4 initial apicid : 4 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 5 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 5 siblings : 1 core id : 0 cpu cores : 1 apicid : 5 initial apicid : 5 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed 
adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 6 siblings : 1 core id : 0 cpu cores : 1 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) 
stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 7 siblings : 1 core id : 0 cpu cores : 1 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 8 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 8 siblings : 1 core id : 0 cpu cores : 1 apicid : 8 initial apicid : 8 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced 
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 9 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 9 siblings : 1 core id : 0 cpu cores : 1 apicid : 9 initial apicid : 9 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power 
management: processor : 10 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 10 siblings : 1 core id : 0 cpu cores : 1 apicid : 10 initial apicid : 10 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 11 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 11 siblings : 1 core id : 0 cpu cores : 1 apicid : 11 initial apicid : 11 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 12 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 12 siblings : 1 core id : 0 cpu cores : 1 apicid : 12 initial apicid : 12 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 
spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 13 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 13 siblings : 1 core id : 0 cpu cores : 1 apicid : 13 initial apicid : 13 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 14 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 14 siblings : 1 core id : 0 cpu cores : 1 apicid : 14 initial apicid : 14 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 15 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel Xeon Processor (Cascadelake) stepping : 6 microcode : 0x1 cpu MHz : 2095.076 cache size : 16384 KB physical id : 15 siblings : 1 core id : 0 cpu cores : 1 apicid : 15 initial apicid : 15 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb 
flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa bogomips : 4190.15 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management:

(In reply to Richard W.M. Jones from comment #3)
> Definitely agree we'd want to see the L0 qemu command and the other
> information.
> Also be good to have exact kernel and qemu versions of each layer.
>
> I believe the scenario is:
>
> L0 : RHEL 8.5 (AV or non-AV?)

Yes, 8.5 (not sure what AV is in this context?)

> L1 : Fedora 35 <-- qemu command from comment 0 runs here

Yes

> L2 : libguestfs appliance with same Fedora 35 kernel as L1

yes. Or more importantly qemu from pungi for making images.

L0:
qemu-kvm-4.2.0-59.module+el8.5.0+12817+cb650d43.x86_64
kernel-4.18.0-348.el8.x86_64

L1:
kernel-5.14.16-301.fc35.x86_64
qemu-kvm-6.1.0-10.fc35 (broken)
qemu-kvm-5.2.0-8.fc34.x86_64 (works)
Thanks for the info!

QEMU's error:

"qemu-kvm: error: failed to set MSR 0x345 to 0x2000"

is likely the culprit. MSR 0x345 is MSR_IA32_PERF_CAPABILITIES, and '0x2000' is 'full width counting'. Support for the feature was added in Linux-5.8 (see the 'full width counting' commit) and QEMU-5.1 (see commit ea39f9b643959). The 'pdcm' flag in /proc/cpuinfo indicates the presence of the feature and, as we can see, both L0 and L1 have it.

Looking at the KVM code, a write to MSR_IA32_PERF_CAPABILITIES is denied when the guest (L2 in our case) CPU doesn't have X86_FEATURE_PDCM exposed. L2's QEMU command line uses

"-cpu max"

Maybe this doesn't expose pdcm? How hard would it be to modify this to '-cpu max,pdcm=on'?
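To make the numbers in this comment concrete, here is a minimal Python sketch. It is not taken from QEMU or the kernel; the constant and function names are illustrative assumptions. It parses the QEMU error line, shows that 0x2000 is bit 13 of the MSR value, and checks /proc/cpuinfo-style text for the 'pdcm' flag while ignoring the separate 'vmx flags' line.

```python
import re

MSR_IA32_PERF_CAPABILITIES = 0x345  # MSR named in the QEMU error message
PERF_CAP_FW_WRITES = 1 << 13        # 0x2000, the "full width counting" bit

# Matches the error text quoted in this thread; other QEMU versions may word it differently.
ERROR_RE = re.compile(r"failed to set MSR (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)")


def parse_msr_error(line):
    """Return (msr, value) as ints from a 'failed to set MSR' line, or None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def cpuinfo_has_flag(cpuinfo_text, flag):
    """True if any 'flags' line (not 'vmx flags') lists the given flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False
```

On a live system the second helper could be called as `cpuinfo_has_flag(open("/proc/cpuinfo").read(), "pdcm")`; doing so in L0 and L1 would reproduce the observation that both layers advertise the flag.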
(In reply to Vitaly Kuznetsov from comment #6)
> Maybe this doesn't expose pdcm? How hard would it be to modify this to
> '-cpu max,pdcm=on' ?

That doesn't make sense as a question. The 'max' CPU model is defined & implemented as exposing *all* the features supported by the accelerator (kvm or tcg). In the KVM case 'max' is identical to 'host'. In the TCG case 'max' is simply everything TCG supports.

So if some feature isn't exposed, it means it is either not implemented in TCG, or not available from the KVM kmod.
(In reply to Vitaly Kuznetsov from comment #6)
> Maybe this doesn't expose pdcm? How hard would it be to modify this to
> '-cpu max,pdcm=on' ?

To answer just this question: you can't test it with libguestfs directly, but one way to test it would be to run the following commands (in L1):

$ libguestfs-test-tool
$ cd /var/tmp/.guestfs-`id -u`/appliance.d
$ qemu-system-x86_64 -no-user-config -nodefaults -display none -no-reboot -machine accel=kvm -cpu max -m 1280 -kernel ./kernel -initrd ./initrd -append 'panic=1 console=ttyS0' -serial stdio

vs

$ qemu-system-x86_64 -no-user-config -nodefaults -display none -no-reboot -machine accel=kvm -cpu max,pdcm=on -m 1280 -kernel ./kernel -initrd ./initrd -append 'panic=1 console=ttyS0' -serial stdio

The first one is expected to fail with the MSR assertion failure. If the second one starts to run the guest kernel, that would indicate that the problem is fixed by adding pdcm=on.
Both fail:

# qemu-system-x86_64 -no-user-config -nodefaults -display none -no-reboot -machine accel=kvm -cpu max -m 1280 -kernel ./kernel -initrd ./initrd -append 'panic=1 console=ttyS0' -serial stdio
qemu-system-x86_64: error: failed to set MSR 0x345 to 0x2000
qemu-system-x86_64: ../target/i386/kvm/kvm.c:2833: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)

# qemu-system-x86_64 -no-user-config -nodefaults -display none -no-reboot -machine accel=kvm -cpu max,pdcm=on -m 1280 -kernel ./kernel -initrd ./initrd -append 'panic=1 console=ttyS0' -serial stdio
qemu-system-x86_64: error: failed to set MSR 0x345 to 0x2000
qemu-system-x86_64: ../target/i386/kvm/kvm.c:2833: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)

FWIW, the backtrace from the core is:

Stack trace of thread 415021:
#0 0x00007fe6bfb5585c __pthread_kill_implementation (libc.so.6 + 0x8f85c)
#1 0x00007fe6bfb086b6 raise (libc.so.6 + 0x426b6)
#2 0x00007fe6bfaf27d3 abort (libc.so.6 + 0x2c7d3)
#3 0x00007fe6bfaf26fb __assert_fail_base.cold (libc.so.6 + 0x2c6fb)
#4 0x00007fe6bfb013a6 __assert_fail (libc.so.6 + 0x3b3a6)
#5 0x000055da10105804 kvm_buf_set_msrs (qemu-system-x86_64 + 0x52b804)
#6 0x000055da10107e84 kvm_arch_init_vcpu (qemu-system-x86_64 + 0x52de84)
#7 0x000055da102688f3 kvm_init_vcpu (qemu-system-x86_64 + 0x68e8f3)
#8 0x000055da1026cb89 kvm_vcpu_thread_fn (qemu-system-x86_64 + 0x692b89)
#9 0x000055da103b2033 qemu_thread_start (qemu-system-x86_64 + 0x7d8033)
#10 0x00007fe6bfb53b17 start_thread (libc.so.6 + 0x8db17)
#11 0x00007fe6bfbd86c0 __clone3 (libc.so.6 + 0x1126c0)

Stack trace of thread 415017:
#0 0x00007fe6bfb5077a __futex_abstimed_wait_common (libc.so.6 + 0x8a77a)
#1 0x00007fe6bfb52ef0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8cef0)
#2 0x000055da103b246d qemu_cond_wait_impl (qemu-system-x86_64 + 0x7d846d)
#3 0x000055da10189417 qemu_init_vcpu (qemu-system-x86_64 + 0x5af417)
#4 0x000055da1014011f x86_cpu_realizefn (qemu-system-x86_64 + 0x56611f)
#5 0x000055da1028bc2d device_set_realized (qemu-system-x86_64 + 0x6b1c2d)
#6 0x000055da1028e79a property_set_bool (qemu-system-x86_64 + 0x6b479a)
#7 0x000055da1029153c object_property_set (qemu-system-x86_64 + 0x6b753c)
#8 0x000055da10294b94 object_property_set_qobject (qemu-system-x86_64 + 0x6bab94)
#9 0x000055da10291b79 object_property_set_bool (qemu-system-x86_64 + 0x6b7b79)
#10 0x000055da10117005 x86_cpu_new (qemu-system-x86_64 + 0x53d005)
#11 0x000055da101170ee x86_cpus_init (qemu-system-x86_64 + 0x53d0ee)
#12 0x000055da1011b53d pc_init1.constprop.0 (qemu-system-x86_64 + 0x54153d)
#13 0x000055da1000b263 machine_run_board_init (qemu-system-x86_64 + 0x431263)
#14 0x000055da101a5a09 qmp_x_exit_preconfig.part.0 (qemu-system-x86_64 + 0x5cba09)
#15 0x000055da101a9737 qemu_init (qemu-system-x86_64 + 0x5cf737)
#16 0x000055da0ff33e7d main (qemu-system-x86_64 + 0x359e7d)
#17 0x00007fe6bfaf3560 __libc_start_call_main (libc.so.6 + 0x2d560)
#18 0x00007fe6bfaf360c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d60c)
#19 0x000055da0ff37bc5 _start (qemu-system-x86_64 + 0x35dbc5)

Stack trace of thread 415018:
#0 0x00007fe6bfb9e3b5 clock_nanosleep.5 (libc.so.6 + 0xd83b5)
#1 0x00007fe6bfba2fb7 __nanosleep (libc.so.6 + 0xdcfb7)
#2 0x00007fe6c002f6a7 g_usleep (libglib-2.0.so.0 + 0x7f6a7)
#3 0x000055da103bbbb2 call_rcu_thread (qemu-system-x86_64 + 0x7e1bb2)
#4 0x000055da103b2033 qemu_thread_start (qemu-system-x86_64 + 0x7d8033)
#5 0x00007fe6bfb53b17 start_thread (libc.so.6 + 0x8db17)
#6 0x00007fe6bfbd86c0 __clone3 (libc.so.6 + 0x1126c0)

Happy to gather additional info or try more things.

Thanks.
The weird thing here is that 'pdcm' shouldn't be in the L1 CPU flags. The QEMU command line for L1 is:

... cpu Cascadelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off ...

and while we see an explicit 'pdcm=on', QEMU has the following code in cpu_x86_cpuid():

    case 1:
    ...
        if (!cpu->enable_pmu) {
            *ecx &= ~CPUID_EXT_PDCM;
        }
    ...

'enable_pmu' is false by default and there's no 'pmu=on' on the command line above.

What's even weirder is that QEMU actually works as expected for me: without "pmu=on", there's no 'pdcm' in the guest's /proc/cpuinfo even with an explicit 'pdcm=on' (our case). I'm certainly missing something important here to be able to reproduce the problem.

What's the exact RHEL 8.5 QEMU version in L0? I'll try to recreate the exact setup then.
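The masking described in this comment can be modelled in a few lines of Python. This is only a sketch of the quoted check, not QEMU code: the function name is hypothetical, and the bit position (CPUID leaf 1, ECX bit 15 for PDCM) is stated here as an assumption taken from the x86 CPUID layout rather than from the thread.

```python
CPUID_EXT_PDCM = 1 << 15  # assumed position of PDCM in CPUID.01H:ECX


def guest_cpuid1_ecx(requested_ecx, enable_pmu):
    """Model the quoted check: PDCM is cleared unless the vPMU is enabled.

    requested_ecx is the ECX value built from the -cpu options (for example
    with pdcm=on); enable_pmu mirrors the 'pmu=on' property.
    """
    if not enable_pmu:
        return requested_ecx & ~CPUID_EXT_PDCM
    return requested_ecx
```

Under this model, 'pdcm=on' without 'pmu=on' leaves the guest without the flag, which matches what the comment author observed on a newer QEMU. The thread goes on to conclude that the QEMU 4.2 build in L0 did not behave this way.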
qemu-kvm-4.2.0-59.module+el8.5.0+12817+cb650d43.x86_64
Oh, I see, it's not from RHEL-AV (advanced virtualization), it's from plain RHEL.

The sad story is: QEMU gained support for the 'pdcm' feature bit a long time ago (e117f7725af84), but it wasn't until QEMU-5.1 that support for MSR_IA32_PERF_CAPABILITIES was added (ea39f9b643). So when qemu-4.2 is used to create L1, the MSR_IA32_PERF_CAPABILITIES MSR is actually unsupported but the feature bit for it is set. The newer QEMU in L1 tries to set MSR_IA32_PERF_CAPABILITIES for L2, but KVM in L1 then wants to access the non-existent MSR_IA32_PERF_CAPABILITIES.

Now the question is what we can do about it. Immediate solutions:

1) Drop 'pdcm=on' from the QEMU command line in L1
2) Add 'pdcm=off' to the QEMU command line in L2
3) Use a newer QEMU (probably from the 'advanced virt' module) in L0.

No. 3 actually makes a lot of sense, as nested virt can be broken in multiple other places with QEMU-4.2; afaiu nobody really tests it.

What I'm still puzzled about is why 'qemu-kvm-5.2.0-8.fc34.x86_64' in L1 works. Are you using the same kernel in L1?
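Option 2 amounts to editing the '-cpu' value of the L2 QEMU invocation. As a hedged illustration, the following Python sketch rewrites an argument list that way. The helper name is hypothetical, it only understands the 'feature=state' syntax (not the legacy '+feature' form), and nobody in this thread reports having tested 'pdcm=off', so whether it avoids the assertion is an open question.

```python
def with_cpu_feature(argv, feature, state):
    """Return a copy of a QEMU argv with feature=state set on the -cpu value.

    Any existing 'feature=...' entry is replaced; the CPU model name and the
    other properties are kept in their original order.
    """
    out = list(argv)
    for i, arg in enumerate(out[:-1]):
        if arg == "-cpu":
            parts = [p for p in out[i + 1].split(",")
                     if not p.startswith(feature + "=")]
            out[i + 1] = ",".join(parts + [f"{feature}={state}"])
            return out
    raise ValueError("no -cpu option in argv")
```

For example, `with_cpu_feature(["qemu-system-x86_64", "-cpu", "max"], "pdcm", "off")` yields a '-cpu' value of 'max,pdcm=off', which could be substituted into the manual test command shown earlier in the thread.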
(In reply to Vitaly Kuznetsov from comment #12)
> Oh, I see, it's not from RHEL-AV (advanced virtualization), it's from plain
> RHEL.

Yeah. In the past we used qemu from some other channel, but I thought it didn't matter in 8 anymore. I guess it does. ;(

> Sad story is: QEMU gained support for 'pdcm' feature bit long time ago
> (e117f7725af84)
> but it wasn't until QEMU-5.1 when support for MSR_IA32_PERF_CAPABILITIES was
> added (ea39f9b643)
> so when qemu-4.2 is used to create L1, MSR_IA32_PERF_CAPABILITIES MSR is
> actually unsupported
> but the feature bit for it is set. Newer QEMU in L1 tries to add
> MSR_IA32_PERF_CAPABILITIES for
> L2 but KVM in L1 wants to access non-existent MSR_IA32_PERF_CAPABILITIES.
>
> Now the question is what can we do about it. Immediate solutions:
>
> 1) Drop 'pdcm=on' from QEMU command line in L1
> 2) Add 'pdcm=off' to QEMU command line in L2
> 3) Use newer QEMU (probably from 'advanced virt' module) in L0.
>
> No.3 actually makes a lot of sense as nested virt can be broken in multiple
> other places with QEMU-4.2,
> afaiu nobody probably tests it.

Yeah, ok, I can look at moving to that.

> What I'm still puzzled about is why 'qemu-kvm-5.2.0-8.fc34.x86_64' in L1
> works. Are you using the same
> kernel in L1?

L1 is f35: 5.14.16-301.fc35.x86_64
I've switched the L0/rhel8.5/hypervisor box to use the virt packages from the advanced virt 8.3 stream and can confirm that the L1 guest passes libguestfs-test fine now. Will see how it does tonight with a rawhide compose.
This message is a reminder that Fedora Linux 35 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '35'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 35 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13. Fedora Linux 35 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.