Bug 1986665

Summary: [Fwcfg64] dump-guest-memory -w command report error "win-dump: failed to read CPU #2 ContextFrame location" on Windows desktop
Product: Red Hat Enterprise Linux 9 Reporter: Peixiu Hou <phou>
Component: qemu-kvm Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: General QA Contact: leidwang <leidwang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: ailan, lijin, lmiksik, mdean, menli, mrezanin, qizhu, virt-maint, vrozenfe, yvugenfi
Version: 9.0 Keywords: Triaged
Target Milestone: rc   
Target Release: 9.0   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: qemu-kvm-7.2.0-1.el9 Doc Type: If docs needed, set a value
Last Closed: 2023-05-09 07:19:27 UTC Type: ---
Bug Depends On: 2135806    
Bug Blocks: 1972056    

Description Peixiu Hou 2021-07-28 03:30:39 UTC
Description of problem:
On a RHEL 8.5.0 host, boot a win8.1-64 VM with -device vmcoreinfo, install the fwcfg64 driver, and run the following command:
(qemu) dump-guest-memory -w 81-mem-205.dmp
Error: win-dump: failed to read CPU #2 ContextFrame location

Version-Release number of selected component (if applicable):
kernel-4.18.0-323.el8.x86_64
qemu-kvm-6.0.0-25.module+el8.5.0+11890+8e7c3f51.x86_64
seabios-bin-1.14.0-1.module+el8.4.0+8855+a9e237a9.noarch
virtio-win-prewhql-205

How reproducible:
100%

Steps to Reproduce:
1. Boot the Win8.1-64 VM with the following qemu command:
/usr/libexec/qemu-kvm \
        -name 205FWC816435JHL -enable-kvm -m 6G -smp 4 \
        -uuid 95040007-ee71-420d-bb48-b699bd247588 -nodefaults \
        -cpu EPYC,hv_stimer,hv_synic,hv_time,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,-xsave \
        -chardev socket,id=charmonitor,path=/tmp/205FWC816435JHL,server,nowait \
        -mon chardev=charmonitor,id=monitor,mode=control \
        -rtc base=localtime,driftfix=slew \
        -boot order=cd,menu=on -device piix3-usb-uhci,id=usb \
        -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=205FWC816435JHL,node-name=my_file \
        -blockdev driver=raw,node-name=my,file=my_file \
        -device ide-hd,drive=my,id=ide0-0-0,bus=ide.0,unit=0 \
        -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/kvm_autotest_root/iso/ISO/Win8u1/en_windows_8.1_enterprise_with_update_x64_dvd_6054382.iso,node-name=my_cd,read-only=on \
        -blockdev driver=raw,node-name=mycd,file=my_cd,read-only=on \
        -device ide-cd,drive=mycd,id=ide0-1-0,bus=ide.1,unit=0 \
        -cdrom /home/kvm_autotest_root/iso/windows/virtio-win-prewhql-0.1-203.iso \
        -device usb-tablet,id=input0 \
        -vnc 0.0.0.0:2 -M q35 \
        -device pcie-root-port,bus=pcie.0,id=root1.0,multifunction=on,port=0x10,chassis=1,addr=0x7 \
        -device pcie-root-port,bus=pcie.0,id=root2.0,port=0x11,chassis=2,addr=0x7.0x1 \
        -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
        -device e1000e,bus=root1.0,netdev=hostnet0,id=net0,mac=00:52:7a:63:be:ae \
        -vga std -device vmcoreinfo \
        -monitor stdio -qmp tcp:0:4445,server,nowait

2. Install the fwcfg64 driver for the vmcoreinfo device in the VM.
3. (qemu) dump-guest-memory -w 81-mem-205.dmp  --> reports the following error:
Error: win-dump: failed to read CPU #2 ContextFrame location
4. Check the size of 81-mem-205.dmp on the host; it is 0k.


Actual results:
The command "dump-guest-memory -w 81-mem-205.dmp" reports an error.

Expected results:
The command "dump-guest-memory -w 81-mem-205.dmp" succeeds, and the memory dump file is saved normally.

Additional info:

1. Tested on a RHEL 9.0.0 host with the same steps; could not reproduce this issue.
Versions used:
kernel-5.13.0-0.rc7.51.el9.x86_64
qemu-kvm-6.0.0-8.el9.x86_64
seabios-bin-1.14.0-5.el9.noarch
virtio-win-prewhql-205

2. Tested with virtio-win-prewhql-205 + win2012-r2 & win2012-64; this issue was not reproduced.
3. Tested with virtio-win-prewhql-203 on a RHEL 8.5.0 host; this issue was also reproduced.

Comment 12 Peixiu Hou 2021-11-11 05:40:07 UTC
Hi Vadim,

I tested this issue on a RHEL 9.0 host, and it can be reproduced.

Versions:
kernel-5.14.0-9.el9.x86_64
qemu-kvm-6.1.0-5.el9.x86_64
seabios-bin-1.14.0-7.el9.noarch
virtio-win-prewhql-214

BTW, will fwcfg be included in the virtio-win rpm package of RHEL8.5.z, or will it be included starting from RHEL8.6.0/RHEL9?

Thanks~
Peixiu

Comment 13 Vadim Rozenfeld 2021-11-11 21:03:01 UTC
(In reply to Peixiu Hou from comment #12)
> Hi Vadim,
> 
> I tested this issue on a RHEL 9.0 host, and it can be reproduced.
> 
> Versions:
> kernel-5.14.0-9.el9.x86_64
> qemu-kvm-6.1.0-5.el9.x86_64
> seabios-bin-1.14.0-7.el9.noarch
> virtio-win-prewhql-214
> 
> BTW, will fwcfg be included in the virtio-win rpm package of RHEL8.5.z, or
> will it be included starting from RHEL8.6.0/RHEL9?
> 
> Thanks~
> Peixiu

Thank you, Peixiu,

AFAIK there is no plan to release this driver in 8.5.z.
It will probably be done in the 8.6/9.0 time frame.

Best regards,
Vadim.

Comment 14 Peixiu Hou 2021-11-12 01:13:15 UTC
(In reply to Vadim Rozenfeld from comment #13)
> (In reply to Peixiu Hou from comment #12)
> > Hi Vadim,
> > 
> > I tested this issue on a RHEL 9.0 host, and it can be reproduced.
> > 
> > Versions:
> > kernel-5.14.0-9.el9.x86_64
> > qemu-kvm-6.1.0-5.el9.x86_64
> > seabios-bin-1.14.0-7.el9.noarch
> > virtio-win-prewhql-214
> > 
> > BTW, will fwcfg be included in the virtio-win rpm package of RHEL8.5.z, or
> > will it be included starting from RHEL8.6.0/RHEL9?
> > 
> > Thanks~
> > Peixiu
> 
> Thank you, Peixiu,
> 
> AFAIK there is no plan to release this driver in 8.5.z.
> It will probably be done in the 8.6/9.0 time frame.
> 

Got it, thank you~

> Best regards,
> Vadim.

Comment 15 menli@redhat.com 2021-12-02 01:53:12 UTC
I tested on win10 (OVMF) and hit the same issue. Details follow:

package info:
qemu-kvm-6.1.0-6.el9.x86_64
kernel-5.14.0-12.el9.x86_64
edk2-ovmf-20210527gite1999b264f1f-7.el9.noarch
seabios-bin-1.14.0-7.el9.noarch
virtio-win-prewhql-214

1. Boot a guest with the following command:

 /usr/libexec/qemu-kvm \
    -name "mouse-vm" \
    -machine q35,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars -nodefaults \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
    -device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=0x14,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3  \
    -device pcie-root-port,port=0x15,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4  \
    -device pcie-root-port,port=0x16,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5  \
    -device pcie-root-port,port=0x17,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
    -device pcie-root-port,port=0x18,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
    -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/123/win10-64-virtio-scsi.qcow2,node-name=drive_sys3 \
    -blockdev driver=qcow2,node-name=drive-virtio-disk0,file=drive_sys3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
    -device scsi-hd,id=image1,drive=drive-virtio-disk0,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0  \
    -device virtio-net-pci,mac=9a:36:83:b6:3d:05,id=idJVpmsF,netdev=id23ZUK6  \
    -netdev tap,id=id23ZUK6,vhost=on \
    -vga std \
    -device vmcoreinfo \
    -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
    -blockdev node-name=file_ovmf_vars,driver=file,filename=/mnt/123/win10-64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
    -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \
    -m 4096 \
    -smp 6  \
    -vnc :10 \
    -boot order=cdn,once=c,menu=on,strict=on \
    -enable-kvm \
    -qmp tcp:0:3333,server,nowait \
    -monitor stdio  \
    -rtc base=localtime,clock=host,driftfix=slew  

2. Send the QMP command: {"execute": "human-monitor-command", "arguments": {"command-line": "dump-guest-memory -w /var/tmp/Memory.dmp"}, "id": "MofT1uZU"}

Actual result:
{"timestamp": {"seconds": 1638355510, "microseconds": 262315}, "event": "STOP"}
{"timestamp": {"seconds": 1638355510, "microseconds": 262810}, "event": "DUMP_COMPLETED", "data": {"result": {"total": 4294770688, "status": "failed", "completed": 0}, "error": "win-dump: failed to read CPU #4 ContextFrame location"}}
{"timestamp": {"seconds": 1638355510, "microseconds": 262856}, "event": "RESUME"}
{"return": "Error: win-dump: failed to read CPU #4 ContextFrame location\r\n", "id": "MofT1uZU"}


Additional info:
1. After changing '-smp 6' to '-smp 4', the issue is not hit.
2. The issue is not hit on win2016 with the same qemu command.
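
For scripted checks, the failure can be picked out of the QMP event stream shown in the "Actual result" above. A minimal Python sketch, assuming the events arrive as one JSON object per line; `dump_failure` is a hypothetical helper name, not part of QEMU or of any QMP library:

```python
import json

def dump_failure(qmp_lines):
    """Return the error text of a failed DUMP_COMPLETED event, or None."""
    for line in qmp_lines:
        msg = json.loads(line)
        if msg.get("event") != "DUMP_COMPLETED":
            continue
        data = msg.get("data", {})
        if data.get("result", {}).get("status") == "failed":
            return data.get("error")
    return None

# The three events copied from the output in this comment.
SAMPLE_EVENTS = [
    '{"timestamp": {"seconds": 1638355510, "microseconds": 262315}, "event": "STOP"}',
    '{"timestamp": {"seconds": 1638355510, "microseconds": 262810}, '
    '"event": "DUMP_COMPLETED", "data": {"result": {"total": 4294770688, '
    '"status": "failed", "completed": 0}, '
    '"error": "win-dump: failed to read CPU #4 ContextFrame location"}}',
    '{"timestamp": {"seconds": 1638355510, "microseconds": 262856}, "event": "RESUME"}',
]

print(dump_failure(SAMPLE_EVENTS))
```

The helper only inspects DUMP_COMPLETED events, so STOP and RESUME are skipped and a successful dump yields None.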

Comment 18 Viktor Prutyanov 2022-02-09 13:49:31 UTC
I've faced a similar issue with Windows 10 (10.0.19619) and QEMU 6.2.50:

(qemu) dump-guest-memory -w 1.dmp
Error: win-dump: failed to read CPU #2 ContextFrame location

So, QEMU found the contexts for CPU #0 and #1 but failed to find the one for CPU #2.

/usr/local/bin/qemu-system-x86_64 \
    -name guest=win10,debug-threads=on \
    -machine pc-q35-6.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram \
    -cpu Icelake-Server,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,avx512ifma=on,sha-ni=on,rdpid=on,fsrm=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,hle=off,rtm=off,clwb=off,intel-pt=off,la57=off,wbnoinvd=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff \
    -m 6144 \
    -object memory-backend-memfd,id=pc.ram,share=yes,size=6442450944 \
    -overcommit mem-lock=off \
    -smp 4 \
    -monitor stdio \
    -rtc base=localtime,driftfix=slew \
    -no-hpet \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
    -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
    -drive file=/home/vp/vms/Win10x64_2004_19619.qcow2,format=qcow2 \
    -usb \
    -usbdevice tablet \
    -spice port=5905,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
    -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x3 \
    -nic user \
    -device vmcoreinfo

QEMU is configured with -smp 4, but Windows discovers only 2 sockets, each with 1 core and 1 thread:

Item      Value
Processor Intel Xeon Processor (Icelake), 3600 Mhz, 1 Core(s), 1 Logical Processor(s)
Processor Intel Xeon Processor (Icelake), 3600 Mhz, 1 Core(s), 1 Logical Processor(s)

Also, Windows reports NumberProcessors = 2 in the crash dump header.
The other 2 CPUs are not used by the system for some reason, so there are no context frames for them.

Comment 19 Yvugenfi@redhat.com 2022-02-10 09:50:43 UTC
Based on comment #18, the issue needs more investigation.

Comment 20 Viktor Prutyanov 2022-10-17 10:48:34 UTC
The number of CPU sockets is very limited in desktop versions of Windows:
https://codeinsecurity.wordpress.com/2022/04/07/cpu-socket-and-core-count-limits-in-windows-10-and-how-to-remove-them/
For example, only 2 sockets are available on Windows 10 Pro.
So, if such a Windows guest is running on a system with 4 sockets, only 2 sockets are actually utilized by the OS (this is what we are observing).
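
The effect of the socket limit can be written as a small calculation. This is only a sketch, assuming the guest simply ignores sockets beyond its limit and that no per-socket core limit is hit; `visible_cpus` and `socket_limit` are illustrative names, and the limit of 2 is the Windows 10 Pro figure quoted above:

```python
def visible_cpus(sockets, cores, threads, socket_limit):
    """CPUs the guest OS actually uses when it caps the number of sockets."""
    return min(sockets, socket_limit) * cores * threads

# 4 sockets x 1 core x 1 thread with a 2-socket limit: only 2 CPUs are used.
print(visible_cpus(sockets=4, cores=1, threads=1, socket_limit=2))  # 2

# 1 socket x 4 cores x 1 thread: the same 4 CPUs all fit under the limit.
print(visible_cpus(sockets=1, cores=4, threads=1, socket_limit=2))  # 4
```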

Besides that, the behavior of the '-smp X' option changed between QEMU 6.1 and 6.2:
6.1: prefer sockets over cores, '-smp X' means X sockets, 1 core
6.2: prefer cores over sockets, '-smp X' means 1 socket, X cores
The same rules apply to machine types pc-q35-6.1 and pc-q35-6.2. In QEMU 7.2.50, the described logic is in hw/core/machine-smp.c.
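
As a rough model of the two rules above (not the code in hw/core/machine-smp.c, and covering only a bare '-smp X' with no sockets/cores/threads given), the expansion could be sketched in Python as follows. `default_topology` and `prefer_sockets` are names chosen for this illustration:

```python
def default_topology(cpus, prefer_sockets):
    """Expand a bare '-smp X' into (sockets, cores, threads).

    prefer_sockets=True models the 6.1 rule (X sockets, 1 core);
    prefer_sockets=False models the 6.2 rule (1 socket, X cores).
    """
    if prefer_sockets:
        return (cpus, 1, 1)
    return (1, cpus, 1)

print(default_topology(4, prefer_sockets=True))   # (4, 1, 1)
print(default_topology(4, prefer_sockets=False))  # (1, 4, 1)
```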

To sum up, desktop Windows on QEMU with machine type <= 6.1 and the number of CPUs given as '-smp X' (without details) may not fit into the socket limit.
In case of such a discrepancy, 'dump-guest-memory -w' fails because it assumes that the number of QEMU CPUs is the same as the number of CPUs utilized by Windows.

So, I'm preparing a patch to limit the number of CPUs processed by 'dump-guest-memory -w' to the number of CPUs reported by the guest Windows.
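
The idea can be illustrated with a short Python sketch. It is not the patch itself, which would be C code in QEMU; `save_contexts`, `fake_read_context` and the parameter names are hypothetical, and the reader function stands in for whatever locates a CPU's context frame in guest memory:

```python
def save_contexts(qemu_cpu_count, guest_number_processors, read_context):
    """Read context frames only for CPUs the guest reports as in use."""
    limit = min(qemu_cpu_count, guest_number_processors)
    return [read_context(index) for index in range(limit)]

def fake_read_context(index):
    # Stand-in reader: the guest has context frames only for CPUs 0 and 1.
    if index >= 2:
        raise LookupError(f"failed to read CPU #{index} ContextFrame location")
    return f"context-{index}"

# QEMU has 4 CPUs, the guest reports NumberProcessors = 2.
print(save_contexts(4, 2, fake_read_context))  # ['context-0', 'context-1']
```

Without the `min()`, the loop would reach CPU #2 and fail in the same way as the reported error.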

Comment 22 Meirav Dean 2022-10-29 22:28:28 UTC
My NeedInfo has been removed since the TestBlocker that triggered it is no longer relevant.

Comment 25 Yanan Fu 2022-12-20 09:18:09 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 29 leidwang@redhat.com 2022-12-28 01:09:55 UTC
Tested with qemu-kvm-7.2.0-1.el9: the command "dump-guest-memory -w /home/a.dmp" succeeds, and the memory dump file is saved normally.

Moving this bz to VERIFIED.
Thanks!

Comment 33 errata-xmlrpc 2023-05-09 07:19:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162