Bug 1848878
| Summary: | Win2019 guest BSOD when CPU cores is set to a large value | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Yumei Huang <yuhuang> |
| Component: | qemu-kvm | Assignee: | Akihiko Odaki <aodaki> |
| qemu-kvm sub component: | Devices | QA Contact: | liunana <nanliu> |
| Status: | CLOSED MIGRATED | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aodaki, chayang, juzhang, lijin, nanliu, phou, qizhu, virt-maint, yvugenfi |
| Version: | unspecified | Keywords: | MigratedToJIRA, Reopened, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-19 13:01:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yumei Huang
2020-06-19 06:59:48 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Can you attach the memory.dmp or the minidump file? Or the windbg -analyze output? Thanks.

It seems that Windows 2019 is limited to 64 sockets. Can you reproduce this bug if you limit the number of sockets to 64? Thanks again.

(In reply to Gal Hammer from comment #4)
> It seems that Windows 2019 is limited to 64 sockets. Can you reproduce this
> bug if you limit the number of sockets to 64? Thanks again.

Sockets is 1 by default in my test. With sockets set to 64, the maximum number of cores is 6 (= 384/64), which doesn't meet the reproducing condition. BTW, I was able to reproduce with "-smp 384,sockets=2,cores=192".

(In reply to Gal Hammer from comment #3)
> Can you attach the memory.dmp or the minidump file? Or the windbg -analyze
> output? Thanks.

It seems no dump file is generated. I double-checked the Startup and Recovery settings; no dump file is produced whether 'Small memory dump' or 'Kernel memory dump' is selected. I can only get the stop code from the blue screen: "HAL MEMORY ALLOCATION".

(In reply to Yumei Huang from comment #0)
> Additional info:
> 1. Hit same issue on
> 8.2.1av(qemu-kvm-4.2.0-27.module+el8.2.1+7092+9d345e72), but it works
> before(qemu-kvm-core-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64). Not sure
> if it's a qemu regression or guest os issue.

I didn't record the win2019 guest OS build when testing with qemu-kvm-core-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64, but it was before build 17763.1217 came out. I just tried build 17763.1217 with the same qemu version and reproduced the issue, so it might be triggered by a guest OS change.

Hit the same issue with a win2016 guest (OS build: 14393.1884).

(In reply to Yumei Huang from comment #9)
> Hit same issue with win2016 guest(os build: 14393.1884).
Host:
Model name: AMD EPYC 7251 8-Core Processor
qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
kernel-4.18.0-222.el8.x86_64

QEMU cli:
# /usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,kernel-irqchip=split \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 8192 \
-device intel-iommu,intremap=on,eim=on \
-smp 384,maxcpus=384,cores=96,threads=2,dies=1,sockets=2 \
-cpu 'EPYC',hv_stimer,hv_synic,hv_vpindex,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_frequencies,hv_runtime,hv_tlbflush,hv_reenlightenment,hv_stimer_direct,hv_ipi,+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/win2016-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:cb:08:54:2b:a2,id=idR4WKNK,netdev=idR9yupJ,bus=pcie-root-port-3,addr=0x0 \
-netdev tap,id=idR9yupJ,vhost=on \
-blockdev node-name=file_cd1,driver=file,read-only=on,aio=threads,filename=/home/kvm_autotest_root/iso/windows/winutils.iso,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
-device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm -monitor stdio -qmp tcp:0:4444,server,nowait

Can you please confirm that it works with qemu-kvm-core-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64 (as mentioned in comment #0)? Thanks.

(In reply to Gal Hammer from comment #11)
> Can you please confirm that it works with
> qemu-kvm-core-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64 (as mentioned in
> comment #0)? Thanks.

The latest win2019 OS has the same issue with qemu-kvm-core-4.2.0-20.module+el8.2.1+6467+49dc3278.x86_64; please refer to comment 8. Thanks.

Logging some findings:
* #cores <= 128 seems to work, so it is possible to use "-smp 240,cpus=2,cores=120,maxcpus=240,sockets=2".
* Adding the "pmu" feature to the -cpu command line parameter allows exceeding the 128-core limit (e.g. "-cpu qemu64,pmu -smp 1,cpus=1,cores=240,maxcpus=240,sockets=1"). Adding the pmu feature makes QEMU report that the CPU supports "Architectural Performance Monitoring" (CPUID leaf 0x0A). However, I couldn't find a direct relationship between the performance counters and the number of cores. Microsoft's documentation about Windows 2019 hardware requirements (https://docs.microsoft.com/en-us/windows-server/get-started-19/sys-reqs-19) doesn't include any reference to the performance counters.

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Hi,

Could you help to check comment 21? It seems we need to re-open this bug. Do you have any other comments?

Thanks.

Best regards
Liu Nana

(In reply to liunana from comment #23)
> Could you help to check comment 21?
> It seems we need to re-open this bug. Do you have any other comments?

According to QE it is not a regression, because they tested with CPU counts below 240. On the other hand, there are systems with more and more CPU cores, and Windows Server 2019 doesn't have a 240-core limit, so in my opinion we should investigate this issue some more.

(In reply to Yvugenfi from comment #24)
> According to QE it is not a regression, because they tested with CPU counts
> below 240. On the other hand, there are systems with more and more CPU
> cores, and Windows Server 2019 doesn't have a 240-core limit, so in my
> opinion we should investigate this issue some more.

Thank you, re-opening it now.

Hit similar issues on a RHEL9 host w/ 96/128 vcpus. Tested with the latest openstack 17.0:
1) Hit BSOD when booting an instance with >= 96 vcpus (tested w/ 96 vcpus and 128 vcpus) for win2019; BIOS mode is seabios. Stop code: IRQL_NOT_LESS_OR_EQUAL
2) For Win2022 with 96 vcpus, the instance booted successfully; but with 128 vcpus it also hit BSOD. BIOS mode is seabios. Stop code: SYSTEM THREAD EXCEPTION NOT HANDLED
Used versions:
kernel-5.14.0-70.22.1.el9_0.x86_64
qemu-img-6.2.0-11.el9_0.3.x86_64
seabios-bin-1.15.0-1.el9.noarch
virtio-win-1.9.28-0.el9_0.iso
RHOS-17.0-RHEL-9-20220811.n.0/
libvirt-libs-8.0.0-8.1.el9_0.x86_64
Qemu commands(openstack instance used defaults):
/usr/libexec/qemu-kvm \
-name guest=instance-00000020,debug-threads=on \
-S \
-object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-11-instance-00000020/master-key.aes"} \
-machine pc-q35-rhel9.0.0,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \
-m 5939200 \
-object {"qom-type":"memory-backend-ram","id":"pc.ram","size":6227702579200} \
-overcommit mem-lock=off \
-smp 128,sockets=128,dies=1,cores=1,threads=1 \
-uuid 42c491e2-99f5-4632-9ce3-bf8fc7d86539 \
-smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=23.2.2-0.20220720130412.7074ac0.el9ost,serial=42c491e2-99f5-4632-9ce3-bf8fc7d86539,uuid=42c491e2-99f5-4632-9ce3-bf8fc7d86539,family=Virtual Machine \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot menu=on,strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=24,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=25,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=26,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 \
-device pcie-root-port,port=27,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 \
-device pcie-root-port,port=28,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 \
-device pcie-root-port,port=29,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 \
-device pcie-root-port,port=30,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 \
-device pcie-root-port,port=31,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 \
-device pcie-root-port,port=32,chassis=17,id=pci.17,bus=pcie.0,addr=0x4 \
-device pcie-pci-bridge,id=pci.18,bus=pci.1,addr=0x0 \
-device piix3-usb-uhci,id=usb,bus=pci.18,addr=0x1 \
-blockdev {"driver":"file","filename":"/var/lib/nova/instances/_base/3276679d1ebe7d0e77074f8a9c21cc798bb3f8c2","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} \
-blockdev {"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"} \
-blockdev {"driver":"file","filename":"/var/lib/nova/instances/42c491e2-99f5-4632-9ce3-bf8fc7d86539/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"} \
-blockdev {"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"} \
-device virtio-blk-pci,bus=pci.3,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=38 \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1442,netdev=hostnet0,id=net0,mac=fa:16:3e:06:9a:29,bus=pci.2,addr=0x0 \
-add-fd set=3,fd=34 \
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-audiodev {"id":"audio1","driver":"none"} \
-vnc 172.16.2.104:2,audiodev=audio1 \
-device virtio-vga,id=video0,max_outputs=1,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.4,addr=0x0 \
-object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.5,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
Thanks~
Peixiu
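As a cross-check of the -smp arithmetic discussed in the comments above (total vCPUs = sockets × cores × threads), the per-socket core counts of the various configurations can be derived with a minimal sketch. The helper name is mine, not part of QEMU or any tool mentioned in this report:

```python
def cores_per_socket(total_vcpus: int, sockets: int, threads: int = 1) -> int:
    """Cores per socket for a -smp topology: total = sockets * cores * threads."""
    assert total_vcpus % (sockets * threads) == 0, "topology must divide evenly"
    return total_vcpus // (sockets * threads)

# "-smp 384,sockets=2,cores=192": 192 cores per socket (reported to reproduce the BSOD)
print(cores_per_socket(384, sockets=2))   # 192
# "-smp 240,...,cores=120,sockets=2": 120 cores per socket (reported to work)
print(cores_per_socket(240, sockets=2))   # 120
# sockets=64 with 384 vCPUs: only 6 cores per socket, below the threshold
print(cores_per_socket(384, sockets=64))  # 6
```

Note that Peixiu's OpenStack instance instead used sockets=128 with cores=1, so its failures at 96/128 vCPUs cannot be explained by the per-socket core count alone.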
Oops - I see I set the wrong SST - I meant to change to Windows... In any case, is it possible this is related to similar q35 issues with a large number of vCPUs (bug 2091166, bug 1942820, bug 1906077) or even TSEG size (bug 1866110)?

(In reply to John Ferlan from comment #30)
> In any case, is it possible this is related to similar q35 issues with a
> large number of vCPUs (bug 2091166, bug 1942820, bug 1906077) or even TSEG
> size (bug 1866110)?

Yes, it can be related to BIOS issues.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

After some investigation, I concluded that Windows 2019 is incompatible with a topology where a socket owns more than 128 cores. This problem does not occur with Windows 11 22H2. It is not appropriate to fix this in QEMU. Perhaps it is better to describe the acceptable maximum number of cores in osinfo-db so that virt-install can configure the CPU topology properly. However, the schema of osinfo-db currently does not contain a field for this. There is the "n-cpus" field, but it does not describe the topology. Such a fix could be applied for new installations.

There are two options to work around this issue or to fix an existing installation:
1. Use the latest Windows.
2. Limit the number of cores to 128. Ideally, it should also be a power of two (this also applies to the latest Windows).

Below is a detailed explanation of the problem.

The cause is the combination of two facts regarding the CoresPerPhysicalProcessor member of the nt!_kprcb structure:
a. Windows rounds the number of cores up so that it will be a power of two.
b. The member is only 8 bits wide.

Due to fact a, if 128 < the number of cores < 256, the count rounds up to 256. Due to fact b, the value 256 won't fit into the CoresPerPhysicalProcessor member, and it wraps around to 0.

The wrong value of the CoresPerPhysicalProcessor member confuses the hal!HalpQueryMaximumRegisteredProcessorCount function, which returns the number of cores in the system usable in accordance with both the physical resource limit and the software license. Its algorithm is as follows:
i. Determine the number of cores in the system.
ii. Derive: CoresPerPhysicalProcessor * LogicalProcessorsPerCore * (the maximum number of sockets the license allows)
iii. Compare i and ii, and return the smaller one.

The instruction that loads CoresPerPhysicalProcessor in hal!HalpQueryMaximumRegisteredProcessorCount is at offset 0x12184. If the number of cores > 128, this function will always return 0 since CoresPerPhysicalProcessor is 0. The hal!HalpPteReserveResources function calls hal!HalpQueryMaximumRegisteredProcessorCount (the offset of the call instruction within the function is 0xf) and uses this value to derive the memory size to allocate. 0 is not valid as a memory size and leads to HAL_MEMORY_ALLOCATION.

The CoresPerPhysicalProcessor member is 32 bits wide on Windows 11 22H2, so this problem does not occur there. It is unclear why CoresPerPhysicalProcessor is rounded at all. Perhaps it is intended, and I see no problem running Windows 11 22H2 with 1 socket/240 cores, but it may be better to ensure the number of cores is always a power of two; this rounding still happens on Windows 11 22H2.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message copied from the Bugzilla bug. This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated.
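The rounding-and-truncation failure analyzed in the comments above can be modeled with a short sketch. The function names here are mine, not Windows symbols; only the power-of-two rounding, the 8-bit width of CoresPerPhysicalProcessor, and the min() of the physical and licensed limits come from the analysis:

```python
def round_up_pow2(n: int) -> int:
    """Round n up to the next power of two."""
    p = 1
    while p < n:
        p *= 2
    return p

def cores_per_physical_processor(cores: int) -> int:
    # Windows rounds the core count up to a power of two, then stores it in an
    # 8-bit field; masking with 0xFF models that truncation.
    return round_up_pow2(cores) & 0xFF

# 128 cores: 128 fits in 8 bits and is stored as-is
print(cores_per_physical_processor(128))  # 128
# 192 cores: rounded up to 256, which wraps to 0 in an 8-bit field
print(cores_per_physical_processor(192))  # 0

def max_registered_processors(actual: int, cores: int, threads: int,
                              licensed_sockets: int) -> int:
    # Models hal!HalpQueryMaximumRegisteredProcessorCount: the smaller of the
    # physical core count and the license-derived limit.
    return min(actual, cores_per_physical_processor(cores) * threads * licensed_sockets)

# With 192 cores per socket the license-derived limit becomes 0, and the
# zero-sized HAL allocation yields the HAL_MEMORY_ALLOCATION bug check.
print(max_registered_processors(192, 192, 1, 64))  # 0
```

On Windows 11 22H2 the field is 32 bits wide, so in this model the mask would be 0xFFFFFFFF and 256 survives intact, matching the observation that the problem does not occur there.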