Bug 1904267
Description
Eduardo Habkost
2020-12-03 23:42:45 UTC
Work in progress at: https://gitlab.com/ehabkost/qemu/-/commits/work/smbios-configuration

Since you seem to already be working on it, I'll assign it to you. I set the priority to medium, but feel free to adjust.

Hello Eduardo,

Could you tell me if this bug needs a big machine with 720 CPUs to test?

(In reply to Yanhui Ma from comment #3)
> Could you tell me if this bug needs a big machine with 720 CPUs to test?

It doesn't. It can be tested by just starting a small VM with the SMBIOS 3.0 entry point configured, and then checking the output of `dmidecode` in the guest.

Hello Eduardo,

We finally created a working instance using your git clone of QEMU on an 8TB, 640-CPU, 32-socket machine. After many iterations we found the following.

500GB with 416 vcpus boots. 7TB also boots, but takes a long time.

500GB with 448 vcpus fails with the following in /tmp/debugcon.txt:

Loading SMM driver at 0x0007F0D7000 EntryPoint=0x0007F0DF99D VariableSmm.efi
mSmmMemLibInternalMaximumSupportAddress = 0x1FFFFFFFFFF
VarCheckLibRegisterSetVariableCheckHandler - 0x7F0DE7AE Success
Variable driver common space: 0x3FF9C 0x3FF9C 0x3FF9C
Variable driver will work with auth variable format!
ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(1111): !EFI_ERROR (Status)

500GB with 512 vcpus fails with the following in /tmp/debugcon.txt:

CPU[1FC] APIC ID=01FC SMBASE=7FFAB000 SaveState=7FFBAC00 Size=00000400
CPU[1FD] APIC ID=01FD SMBASE=7FFAD000 SaveState=7FFBCC00 Size=00000400
CPU[1FE] APIC ID=01FE SMBASE=7FFAF000 SaveState=7FFBEC00 Size=00000400
CPU[1FF] APIC ID=01FF SMBASE=7FFB1000 SaveState=7FFC0C00 Size=00000400
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c(894): Stacks != ((void *) 0)

We will attach more information.

Any comments? Have a good day.
Regards,
Brian

Created attachment 1793023 [details] virt-install command for Comment 5
Created attachment 1793024 [details] xml for Comment 5
Created attachment 1793025 [details] debug output for success in Comment 5
Created attachment 1793026 [details] debug output for 448 vcpu failure in Comment 5
Created attachment 1793027 [details] debug output for 512 vcpu failure in Comment 5

(In reply to Brian Payton from comment #5)
> 500GB with 448 vcpus fails with the following in /tmp/debugcon.txt
> [...]
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(1111): !EFI_ERROR (Status)

You are probably out of memory due to a too-small TSEG size. See bug 1469338 and bug 1866110. I suggest testing with `-global mch.extended-tseg-mbytes=64`.

> 500GB with 512 vcpus fails with the following in /tmp/debugcon.txt
> [...]
> ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c(894): Stacks != ((void *) 0)

This is just the same lack of memory in a different location.
For reference, the code triggering the assert is:

  Stacks = (UINT8 *) AllocatePages (gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus *
             (EFI_SIZE_TO_PAGES (mSmmStackSize + mSmmShadowStackSize)));
  ASSERT (Stacks != NULL);

Thanks, Eduardo. I will update the qemu:commandline entry in the xml with the following and try again.

  <qemu:commandline>
    <qemu:arg value='-machine'/>
    <qemu:arg value='smbios-ep=3_0'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='mch.extended-tseg-mbytes=64'/>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='file,path=/tmp/debugcon.txt,id=debugcon'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-debugcon,iobase=0x402,chardev=debugcon'/>
  </qemu:commandline>

Regards,
Brian

Hello Eduardo,

Thank you for your help. With the following settings, a 7.5TB, 640-vcpu virtual machine is running on our 8TB, 640-CPU, 32-socket server. We have an improved working example for NUMA settings and performance testing on this system, and on the 12TB and 24TB systems when we can reserve them.

  <memory unit='KiB'>8053063680</memory>
  <currentMemory unit='KiB'>8053063680</currentMemory>
  <vcpu placement='static'>640</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-6.0'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/demo_uefi_VARS.fd</nvram>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <smm state='on'>
      <tseg unit='MiB'>64</tseg>
    </smm>

Please let me know if you want any changes or preferred testing. Any comments? Have a great day.

Regards,
Brian

(In reply to Brian Payton from comment #13)
> Please let me know if you want any changes or preferred testing.

I don't have any change suggestions for now. My only request is that the full domain XML and qemu.log file for the latest working configuration be attached to the BZ, so we have a record of a known working config for future reference.
Created attachment 1793238 [details] working xml file for comment #14

This works with the vcpus set to 768, more than the 640 the system has.

Created attachment 1793239 [details] Log file for Comment 14

For further reference:

On the host:

[root@fsg-uv2k-3 qemu]# free -g
              total        used        free      shared  buff/cache   available
Mem:           7809         180        7624           0           3        7609
Swap:            11           0          11

[root@fsg-uv2k-3 qemu]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              640
On-line CPU(s) list: 0-639
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           32
NUMA node(s):        32
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel(R) Corporation
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
Stepping:            4
CPU MHz:             2864.253
CPU max MHz:         2900.0000
CPU min MHz:         1200.0000
BogoMIPS:            4799.96
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            25600K
NUMA node0 CPU(s):   0-9,320-329
NUMA node1 CPU(s):   10-19,330-339
NUMA node2 CPU(s):   20-29,340-349
NUMA node3 CPU(s):   30-39,350-359
NUMA node4 CPU(s):   40-49,360-369
NUMA node5 CPU(s):   50-59,370-379
NUMA node6 CPU(s):   60-69,380-389
NUMA node7 CPU(s):   70-79,390-399
NUMA node8 CPU(s):   80-89,400-409
NUMA node9 CPU(s):   90-99,410-419
NUMA node10 CPU(s):  100-109,420-429
NUMA node11 CPU(s):  110-119,430-439
NUMA node12 CPU(s):  120-129,440-449
NUMA node13 CPU(s):  130-139,450-459
NUMA node14 CPU(s):  140-149,460-469
NUMA node15 CPU(s):  150-159,470-479
NUMA node16 CPU(s):  160-169,480-489
NUMA node17 CPU(s):  170-179,490-499
NUMA node18 CPU(s):  180-189,500-509
NUMA node19 CPU(s):  190-199,510-519
NUMA node20 CPU(s):  200-209,520-529
NUMA node21 CPU(s):  210-219,530-539
NUMA node22 CPU(s):  220-229,540-549
NUMA node23 CPU(s):  230-239,550-559
NUMA node24 CPU(s):  240-249,560-569
NUMA node25 CPU(s):  250-259,570-579
NUMA node26 CPU(s):  260-269,580-589
NUMA node27 CPU(s):  270-279,590-599
NUMA node28 CPU(s):  280-289,600-609
NUMA node29 CPU(s):  290-299,610-619
NUMA node30 CPU(s):  300-309,620-629
NUMA node31 CPU(s):  310-319,630-639
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d

In the virtual machine:

[root@fedora ~]# free -g
              total        used        free      shared  buff/cache   available
Mem:           7558           4        7553           0           0        7536
Swap:             7           0           7

[root@fedora ~]# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          768
On-line CPU(s) list:             0-767
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       768
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           62
Model name:                      Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
Stepping:                        4
CPU MHz:                         2399.980
BogoMIPS:                        4799.96
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       24 MiB
L1i cache:                       24 MiB
L2 cache:                        3 GiB
L3 cache:                        12 GiB
NUMA node0 CPU(s):               0-767
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities

Hello Eduardo,

We moved the testing to a larger system. Currently 192 vcpus, 32 sockets, and 8TB of memory works. 10TB and above fail with the following in /tmp/debugcon.txt, using your latest qemu git clone. I am probing the vcpu limit now and will gather more information for this bug after both limits are found. I just thought you might be interested in the current message.

SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosCreateTable() re-allocate SMBIOS 32-bit table
SmbiosCreate64BitTable() re-allocate SMBIOS 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
ASSERT_EFI_ERROR (Status = Already started)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/SmbiosPlatformDxe.c(125): !EFI_ERROR (Status)

From the vm log file:

  -global mch.extended-tseg-mbytes=128 \
  -cpu host,migratable=on \
  -global driver=cfi.pflash01,property=secure,value=on \
  -m 10485760 \

Regards,
Brian

Hello Eduardo,

I have 5 examples below to enter as 5 separate comments with their associated files. The 9TB vm failures were different between the simple and numa configurations. I will verify or capture any difference between 1024 vcpus simple and numa configured.

Simple memory and vcpus configuration:
- Working with 8TB and 960 vcpus
- Failure with 9TB and 960 vcpus
- Failure with 8TB and 1024 vcpus

numa-configured memory and vcpus:
- Working with 8TB and 960 vcpus
- Failure with 9TB and 960 vcpus

Have a good day.

Regards,
Brian

Working simple configuration

Created attachment 1795927 [details] Comment 20 xml
Created attachment 1795928 [details] Comment 20 log
Created attachment 1795929 [details] Comment 20 debugcon.txt
Created attachment 1795930 [details] Comment 20 inside the running vm

Working 8TB 960 vcpu configuration across 32 numa cells.

Created attachment 1795931 [details] Comment 25 xml
Created attachment 1795932 [details] Comment 25 log
Created attachment 1795933 [details] Comment 25 debugcon.txt
Created attachment 1795934 [details] Comment 25 inside the running vm

Simple failure with 1024 vcpus and 8TB

Created attachment 1795935 [details] Comment 30 xml
Created attachment 1795936 [details] Comment 30 log
Created attachment 1795937 [details] Comment 30 debugcon.txt

Simple failure with 9TB and 960 vcpus.
Created attachment 1795939 [details] Comment 34 xml
Created attachment 1795940 [details] Comment 34 log
Created attachment 1795941 [details] Comment 34 command line output

9TB 960 vcpu 32 socket numa configuration

Created attachment 1795943 [details] Comment 38 xml
Created attachment 1795945 [details] Comment 38 log
Created attachment 1795946 [details] Comment 38 debugcon.txt

This is the new failure with 8TB 1024 vcpus and 32 sockets numa configured.

Created attachment 1795949 [details] Comment 42 xml
Created attachment 1795951 [details] Comment 42 log
Created attachment 1795952 [details] Comment 42 debugcon.txt

(In reply to Brian Payton from comment #30)
> Simple failure with 1024 vcpus and 8TB

1024vcpu.debugcon shows a 64 MB TSEG and an "Out of Resources" failure. The TSEG size probably needs to be even larger for this configuration.

(In reply to Brian Payton from comment #38)
> 9TB 960 vcpu 32 socket numa configuration

This one has a different error (ASSERT_EFI_ERROR (Status = Already started)), but it also has a 64 MB TSEG.

(In reply to Brian Payton from comment #42)
> This is the new failure with 8TB 1024 vcpus and 32 sockets numa configured.

This is also a TSEG size issue (ASSERT_EFI_ERROR (Status = Out of Resources)).

Is a debugcon log + QEMU command line available for the 128 MB TSEG failure mentioned in comment #18? I want to make sure it's really not a TSEG size issue.

Hello Eduardo,

I will rerun Comments 30, 38, and 42 with TSEG set to 128. Would you also like to see a 256 setting?

Regards,
Brian

(In reply to Brian Payton from comment #47)
> I will rerun Comments 30, 38, and 42 with TSEG set to 128. Would you
> also like to see a 256 setting?

If it still fails with 128 MB, yes please! We don't know yet what the required TSEG sizes are for each of those scenarios, because we never tested them.
(In reply to Brian Payton from comment #18)
> ASSERT_EFI_ERROR (Status = Already started)
> ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/SmbiosPlatformDxe.c(125): !EFI_ERROR (Status)

This one might be an unexpected duplicate SMBIOS handle in the tables generated by QEMU. I will double-check the SMBIOS table generation code to be sure.

Hello Eduardo,

I repeated the Comment #42 test with tseg set to 128, 256, and 1GiB and still received the same results. The second PhysicalSize entry towards the end changed with the increased tseg value, but it still failed with the Out of Resources issue. Since the simpler test in Comment #30 shows similar symptoms for you, I will do some isolation there, since it is easier to change the vcpu count.

Can I gather more information for you or increase the debug level? Be back before long, hopefully.

Regards,
Brian

(In reply to Eduardo Habkost from comment #49)
> This one might be an unexpected duplicate SMBIOS handle in the tables
> generated by QEMU. I will double-check the SMBIOS table generation code to
> be sure.

Yes, this is where the problem comes from:

  static void smbios_build_type_17_table(unsigned instance, uint64_t size)
  {
      SMBIOS_BUILD_TABLE_PRE(17, 0x1100 + instance, true); /* required */
      ...

  static void smbios_build_type_19_table(unsigned instance, uint64_t start,
                                         uint64_t size)
  {
      SMBIOS_BUILD_TABLE_PRE(19, 0x1300 + instance, true); /* required */

Using the current handle assignment code, we can have only up to 512 DIMM slots before their type 17 SMBIOS handles conflict with the type 19 table handles. QEMU has DIMM sizes hardcoded to 16GB, which means an 8TB guest will hit the limit. I will open a separate BZ for that specific issue.
The remaining cases (<= 8 TB VMs) seem to be due to a too-small TSEG size.

(In reply to Brian Payton from comment #50)
> I repeated the Comment #42 test with tseg set to 128, 256, and 1GiB
> and still received the same results.
> [...]
> Can I gather more information for you or increase the debug level?

Can you please attach the debugcon output with 128MB TSEG (all remaining configuration being exactly the same as comment #30), so we can compare it with comment #30? Comparing both log files might help us identify what's wrong.

Laszlo, any advice on what could help us debug the issue?

Created attachment 1796021 [details] Comment 52 debugcon.txt file, Comment #30 with tseg=128

(In reply to Brian Payton from comment #30)
> Simple failure with 1024 vcpus and 8TB

For reference, this is the failure on debugcon:

[...]
GetMicrocodePatchInfoFromHob: Microcode patch cache HOB is not found.
CpuMpPei: 5-Level Paging = 0
Register PPI Notify: 8F9D4825-797D-48FC-8471-845025792EF6
ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/CpuMpPei/CpuBist.c(186): !EFI_ERROR (Status)

And this is the failing code:

  BistInformationSize = sizeof (EFI_SEC_PLATFORM_INFORMATION_RECORD2) +
                        sizeof (EFI_SEC_PLATFORM_INFORMATION_CPU) * NumberOfProcessors;
  Status = PeiServicesAllocatePool (
             (UINTN) BistInformationSize,
             (VOID **) &PlatformInformationRecord2
             );
  ASSERT_EFI_ERROR (Status);

Maybe we're hitting some limit on allocation sizes? I don't know the sizes of EFI_SEC_PLATFORM_INFORMATION_RECORD2 and EFI_SEC_PLATFORM_INFORMATION_CPU.
If making edk2 support more than 1024 VCPUs will require extra work, we should probably keep working on both SeaBIOS and OVMF support for larger VMs, just in case we find out that making SeaBIOS work will be easier than OVMF.

Hi Eduardo,

Thanks for the information. Frank cloned your git tree, so we can pull and rebuild any changes you want to try. We have the 24TB, 1792-CPU, 32-socket system a couple more days for testing.

Regards,
Brian

Hello Eduardo,

To clarify the process: for OVMF testing now, I create each new vm instance from scratch using the Fedora iso dvd. Once Fedora is installed, I can edit the xml file for larger instances and features, since my systems only include /usr/share/OVMF/OVMF_CODE.secboot.fd. The OVMF_CODE.fd file does not exist.

For future testing, should we keep all things equal except the original virt-install command line option below? This should switch between SeaBIOS and OVMF, and is easily documented.

  --boot menu=on,uefi,loader.secure='no' \

Any comments? Have a good day.

Regards,
Brian

(In reply to Eduardo Habkost from comment #54)
> And this is the failing code:
>
>   BistInformationSize = sizeof (EFI_SEC_PLATFORM_INFORMATION_RECORD2) +
>                         sizeof (EFI_SEC_PLATFORM_INFORMATION_CPU) * NumberOfProcessors;
>   Status = PeiServicesAllocatePool (
>              (UINTN) BistInformationSize,
>              (VOID **) &PlatformInformationRecord2
>              );
>   ASSERT_EFI_ERROR (Status);
>
> Maybe we're hitting some limit on allocation sizes?
> I don't know what's the size of EFI_SEC_PLATFORM_INFORMATION_RECORD2 and
> EFI_SEC_PLATFORM_INFORMATION_CPU.

This is an edk2 design limitation. Please file an edk2 bug for RHEL-8, and clone it for RHEL-9. Meanwhile, I've sent an upstream problem report:

* [edk2-devel] CPU count limitation in CpuMpPei BIST processing
  https://listman.redhat.com/archives/edk2-devel-archive/2021-June/msg01493.html
  http://mid.mail-archive.com/ffa9d7db-b670-8b88-758f-4785c8d05d40@redhat.com
  https://edk2.groups.io/g/devel/message/77376

Thanks
Laszlo

(In reply to Brian Payton from comment #56)
> To clarify the process, for OVMF testing now, I create each new vm
> instance from scratch using the Fedora iso dvd. Once the Fedora is
> installed, I can edit the xml file for larger instances and features
> since my systems only include /usr/share/OVMF/OVMF_CODE.secboot.fd.
> The OVMF_CODE.fd file does not exist.

You don't need "OVMF_CODE.fd". The "secboot" in "OVMF_CODE.secboot.fd" only means that the Secure Boot *firmware feature* is included in the firmware binary. It does not imply that the Secure Boot *operating mode* is enabled as soon as the domain is defined and first launched.

The Secure Boot *operating mode* depends on the variable store template file from which the newly defined domain's private variable store is instantiated. In RHEL, two varstore templates are provided, "OVMF_VARS.secboot.fd" and "OVMF_VARS.fd". By default, the former is used; that's why you get new domains with the Secure Boot operating mode enabled.

The simplest solution for permanently masking "OVMF_VARS.secboot.fd" on the host is the following:

  # umask 0022
  # mkdir -p /etc/qemu/firmware
  # touch /etc/qemu/firmware/40-edk2-ovmf-sb.json
  # restorecon -FvvR /etc/qemu/firmware

Then just pass "--boot uefi" to virt-install. (Technically, this masks the firmware descriptor file "/usr/share/qemu/firmware/40-edk2-ovmf-sb.json", and so "/usr/share/qemu/firmware/50-edk2-ovmf.json" will take effect.
(Those files are very easy to read for humans too, so please feel free to consult them, or even diff them between each other.))

More flexible firmware use case selection, on a domain-by-domain basis, is the subject of bug 1929357 (see the docs at <https://libvirt.org/formatdomain.html#bios-bootloader>). However, I cannot say how much of that is already exposed by the virt-install utility. One thing that certainly works is the following (very verbose) syntax, which ignores the firmware descriptors (metadata files) under "/usr/share/qemu/firmware" altogether:

(a)
  --machine q35 \
  --features smm=on \
  --qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/DOMAIN.ovmf.log' \
  --boot loader=/usr/share/OVMF/OVMF_CODE.secboot.fd,loader_ro=yes,loader_type=pflash,loader_secure=yes,nvram_template=/usr/share/OVMF/OVMF_VARS.fd \

(b)
  --machine q35 \
  --features smm=on \
  --qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/DOMAIN.ovmf.log' \
  --boot loader=/usr/share/OVMF/OVMF_CODE.secboot.fd,loader_ro=yes,loader_type=pflash,loader_secure=yes,nvram_template=/usr/share/OVMF/OVMF_VARS.secboot.fd \

Option (a) will give you a domain with the SB operational mode disabled; option (b) will give you one with the SB operational mode enabled. Note that these command line snippets only differ in the "nvram_template" option-argument.

> For future testing should we keep all things equal except the original
> virt-install command line option below? This should switch between
> SeaBIOS and OVMF, and easily documented.

For defining a SeaBIOS domain, simply add *none* of the "--features", "--qemu-commandline", and "--boot" options.

Thanks,
Laszlo

Another hint: in order to avoid huge TSEG allocations, please enable 1GB page size support for your domain.

  <domain ...>
    <cpu ...>
      <feature policy='require' name='pdpe1gb'/>
    </cpu>
  </domain>

This should significantly decrease the SMRAM that needs to be allocated for the SMM page tables.
(I can see "pdpe1gb" in the *guest* cpuinfo in comment 17, so this setting could already be in place; I'm not sure.)

Hello Laszlo,

Thank you for all the great information. I did see pdpe1gb; it must come through the CPU pass-through setting we use, and I am excited to try your OVMF information for non-secure boot configurations. As we learned in the past, we are pushing the limits because we can and should for our customers. Identifying the limits and why is the primary goal of this investigation. Hopefully we can overcome them and grow our solution. If not, we know why.

Have a great day.

Regards,
Brian

Hello Laszlo,

Just out of curiosity: your changes are very helpful for creating vms with virt-install and eliminate a lot of manual editing of the xml file. I still have to add the following to enable more than 255 vcpus on a vm. Do you know of a virt-install option to generate this xml?

  <devices>
    <iommu model='intel'>
      <driver intremap='on' eim='on'/>
    </iommu>

Thank you again for your help. Have a great day.

Regards,
Brian

Hi Brian,

(In reply to Brian Payton from comment #61)
> I still have to add the following to enable more than 255 vcpus on a vm. Do
> you know of a virt-install option to generate this xml?

Upstream virt-manager (virt-install) has commit 25419db9caf0 ("virtinst: add support for configuring the IOMMU", 2020-07-12), which I believe would do what you need. However, I think this commit is not part of RHEL-8 as yet. I'm adding Cole to the CC list of this RHBZ to correct me if necessary. Thanks!

Created attachment 1797190 [details] SeaBIOS image with SMBIOS 3.0 support

SeaBIOS image for testing attached.
It can be used with the `-bios` option on the QEMU command line, or with the <loader> element as documented at https://libvirt.org/formatdomain.html#bios-bootloader

Source code for the binary is available at https://gitlab.com/ehabkost/seabios/-/tree/7639f0711ba5cb4f943396cf127b0821099420d4

Hello Eduardo,

I repeated the testing with the seabios virtual instance shown below. 8TB is the single-instance memory limit as before, and I was unable to repeat the numa configurations. The seabios test limit was 8TB with 960 vcpus. More vcpus generated the following in the debugcon log file, and increasing the tseg size up to 1GB did not alter these results.

Any comments?

Regards,
Brian

Copying SMBIOS 3.0 from 0x00006cd9 to 0x000f5d00
WARNING - Unable to allocate resource at romfile_loader_allocate:87!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!
virt-install \
  --name demo_seabios \
  --disk /tmp/demo_seabios.qcow2 \
  --import \
  --noreboot \
  --console pty,target_type=serial \
  --graphics vnc \
  --video vga \
  --cpu host-passthrough \
  --network bridge=virbr0 \
  --os-type=linux \
  --os-variant=rhel8.1 \
  --features apic=on,apic.eoi=on,pae=on \
  --boot loader='/tmp/seabios.bin' \
  --qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/demo_test.ovmf.log ' \
  --qemu-commandline='-machine smbios-ep=3-0' \
  --qemu-commandline='-global mch.extended-tseg-mbytes=128' \
  --cpu host-passthrough \
  --vcpus 255 \
  --memory 8390356

Output from commands inside the demo_seabios vm:

[root@ah-071 ~]# ssh root.122.191
sign_and_send_pubkey: no mutual signature supported
root.122.191's password:
Web console: https://fedora:9090/ or https://192.168.122.191:9090/
Last login: Wed Aug 25 22:19:30 2021 from 192.168.122.1

[root@fedora ~]# free -g
              total        used        free      shared  buff/cache   available
Mem:           8064          15        8048           0           0        8030
Swap:             7           0           7

[root@fedora ~]# dmidecode | head -15
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFE3650.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: rel-1.14.0-45-g7639f071
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported

[root@fedora ~]# dmidecode --dump-bin 960_smbios.bin
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFE3650.
# Writing 117155 bytes to 960_smbios.bin.
# Writing 24 bytes to 960_smbios.bin.
[root@fedora ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 46 bits physical, 48 bits virtual CPU(s): 960 On-line CPU(s) list: 0-959 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 960 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 85 Model name: Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz Stepping: 6 CPU MHz: 2400.236 BogoMIPS: 4800.47 Virtualization: VT-x Hypervisor vendor: KVM Virtualization type: full L1d cache: 30 MiB L1i cache: 30 MiB L2 cache: 3.8 GiB L3 cache: 15 GiB NUMA node0 CPU(s): 0-959 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Mitigation; TSX disabled Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arc h_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_t imer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd av x512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities We have more time today to isolate Created attachment 1817971 [details] Successful debugcon file for Comment 64 Created attachment 1817972 [details] Failing debugcon file for Comment 64 Created 
attachment 1817973 [details]
xml file for Comment 64

Hello Eduardo,

The last test session started with 12TB and 1024 vcpus, which failed for both UEFI and SeaBIOS virtual machines. Further isolation found that 8TB and 960 vcpus worked for the SeaBIOS image, but the UEFI VM failed with the following in the debugcon log file. We turned our focus to increasing the SeaBIOS limits, and only found 8TB and 2 vcpus to work with the UEFI VM, with all failures generating the same error in the debugcon log file. We will focus on the specific UEFI limits during the next test session.

ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-e1999b264f1f/UefiCpuPkg/CpuMpPei/CpuBist.c(186): !EFI_ERROR (Status)

Any comments?

Regards,
Brian

(In reply to Brian Payton from comment #68)
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT
> /builddir/build/BUILD/edk2-e1999b264f1f/UefiCpuPkg/CpuMpPei/CpuBist.c(186):
> !EFI_ERROR (Status)
>
> Any comments?

This looks like the bug we are already tracking at bug 1982176. The SeaBIOS failures look new (so I guess that's good news!), and we need to investigate further.

(In reply to Brian Payton from comment #64)
> Copying SMBIOS 3.0 from 0x00006cd9 to 0x000f5d00
> WARNING - Unable to allocate resource at romfile_loader_allocate:87!

Maybe SeaBIOS has an internal memory allocation size limit similar to the edk2 limitations tracked at bug 1982176. Note that the TSEG size setting affects only UEFI (AFAIK), so we probably need something different to make larger SMBIOS tables work with SeaBIOS. In the meantime, I would run experiments to see if the VCPU limit is higher when the RAM size is relatively small (with both edk2 and SeaBIOS).

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

We found the following limits using RHEL 8.5, RHEL-AV 8.5 Alpha1, qemu-kvm-6.0.0-27.el8.ehabkost202108111546.x86_64 with SMBIOS 3.0 and a custom Seabios.bin built to work with SMBIOS 3.0.
Legacy (SeaBIOS):  960 vcpus, 8TB memory
Legacy numa:       768 vcpus, 24TB memory, 32 sockets
UEFI (OVMF):       960 vcpus, 7TB memory
UEFI numa:         768 vcpus, 1TB memory, 32 sockets

Any comments? Have a good day.

Regards,
Brian

Created attachment 1823459 [details]
package list for Comment 72

Created attachment 1823460 [details]
Legacy numa xml for Comment 72

Created attachment 1823461 [details]
Legacy xml for Comment 72

Created attachment 1823462 [details]
UEFI numa xml for Comment 72

Created attachment 1823463 [details]
UEFI xml for Comment 72

Created attachment 1823464 [details]
Spreadsheet for Comment 72

Created attachment 1823465 [details]
Debug log for spreadsheet row 3 for Comment 72

Created attachment 1823466 [details]
Debug log for spreadsheet row 5 for Comment 72

Moving back to virt-maint.

The upstream patches for this (not merged yet) are available at:
https://lore.kernel.org/qemu-devel/20211026151100.1691925-1-ehabkost@redhat.com/

(In reply to Eduardo Habkost from comment #51)
> I will open a separate BZ for that specific issue. The remaining cases (<= 8 TB VMs) seem to be due to small TSEG size.

Bug reported at https://bugzilla.redhat.com/show_bug.cgi?id=2023977

QEMU patches are merged upstream:

10be11d0b48 smbios: Rename SMBIOS_ENTRY_POINT_* enums
bdf54a9a7bd hw/smbios: Use qapi for SmbiosEntryPointType
0e4edb3b3b5 hw/i386: expose a "smbios-entry-point-type" PC machine property

Increasing the priority to match with BZ#1906077, since as the next step we would like to get all the bits associated with SMBIOS 3.0 Entry Point support merged. Also, setting the target release to 9.1. Moving the BZ to POST so that the patches can come via rebase.

Igor has already shared the commits in comment 85, so clearing his needinfo.

QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

According to Comment 4, tested with qemu-kvm-7.0.0-1.el9. It works as expected, so setting status to VERIFIED. If I was wrong, please correct me. Thanks.
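The "smbios-entry-point-type" property exposed by the last commit matters because the legacy 2.1 (32-bit) anchor describes the table with a 16-bit length field, while the 3.0 (64-bit) anchor uses a 32-bit maximum size. A sketch of that capacity check (field widths per the DMTF SMBIOS specification; the helper name is ours, not QEMU's):

```python
# Capacity limits implied by the two SMBIOS entry-point formats
# (field widths per the DMTF SMBIOS specification):
SMBIOS21_MAX_TABLE = 0xFFFF        # 2.1 entry point: WORD "Structure Table Length"
SMBIOS30_MAX_TABLE = 0xFFFFFFFF    # 3.0 entry point: DWORD "Structure Table Maximum Size"

def needs_64bit_entry_point(table_len: int) -> bool:
    """True when the aggregate SMBIOS table no longer fits a legacy 2.1 entry point."""
    return table_len > SMBIOS21_MAX_TABLE

# The 689-byte table of the small 64-vcpu test guest fits either format,
# but the 117155-byte table of the 960-vcpu guest requires the 3.0 entry point.
print(needs_64bit_entry_point(689), needs_64bit_entry_point(117155))
```

This is, roughly, why the large-VM configurations in this bug only work once the machine is started with `smbios-entry-point-type=64`.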
Versions:
kernel-5.14.0-78.el9.x86_64
qemu-kvm-7.0.0-1.el9
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch
seabios-bin-1.16.0-1.el9.noarch

1. Boot a guest with SeaBIOS (don't add smbios-entry-point-type=64 to the qemu command line).
2. Boot a guest with SeaBIOS (add smbios-entry-point-type=64 to the qemu command line).

# cat debug.sh
/usr/libexec/qemu-kvm \
  -S \
  -name 'avocado-vt-vm1' \
  -sandbox on \
  -machine q35,memory-backend=mem-machine_mem,smbios-entry-point-type=64 \
  -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
  -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
  -nodefaults \
  -device VGA,bus=pcie.0,addr=0x2 \
  -m 29696 \
  -object memory-backend-ram,size=29696M,id=mem-machine_mem \
  -smp 64,maxcpus=64,cores=16,threads=2,dies=1,sockets=2 \
  -cpu 'Skylake-Server-IBRS',+kvm_pv_unhalt \
  -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/avocado_ti4stbke/monitor-qmpmonitor1-20220412-052832-OLNnOWJ3,server=on,wait=off \
  -mon chardev=qmp_id_qmpmonitor1,mode=control \
  -chardev socket,id=qmp_id_catch_monitor,path=/tmp/avocado_ti4stbke/monitor-catch_monitor-20220412-052832-OLNnOWJ3,server=on,wait=off \
  -mon chardev=qmp_id_catch_monitor,mode=control \
  -device pvpanic,ioport=0x505,id=id5VPWWY \
  -chardev socket,id=chardev_serial0,path=/tmp/avocado_ti4stbke/serial-serial0-20220412-052832-OLNnOWJ3,server=on,wait=off \
  -device isa-serial,id=serial0,chardev=chardev_serial0 \
  -chardev socket,id=seabioslog_id_20220412-052832-OLNnOWJ3,path=/tmp/avocado_ti4stbke/seabios-20220412-052832-OLNnOWJ3,server=on,wait=off \
  -device isa-debugcon,chardev=seabioslog_id_20220412-052832-OLNnOWJ3,iobase=0x402 \
  -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
  -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
  -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
  -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
  -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel910-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
  -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
  -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
  -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
  -device virtio-net-pci,mac=9a:83:04:f4:b5:1d,id=id9CnYs6,netdev=idNXvszc,bus=pcie-root-port-3,addr=0x0 \
  -netdev tap,id=idNXvszc,vhost=on \
  -vnc :0 \
  -rtc base=utc,clock=host,driftfix=slew \
  -boot menu=off,order=cdn,once=c,strict=off \
  -enable-kvm \
  -monitor stdio

3. Boot a guest with edk2 (don't add smbios-entry-point-type=64 to the qemu command line).
4. Boot a guest with edk2 (add smbios-entry-point-type=64 to the qemu command line).

After step 1, the guest boots up successfully; check the SMBIOS version in the guest:

# dmidecode
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
13 structures occupying 689 bytes.
Table at 0x7FFFFD40.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: 1.16.0-1.el9
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported
                Targeted content distribution is supported
        BIOS Revision: 0.0
......
......
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
        Status: No errors detected

Handle 0x7F00, DMI type 127, 4 bytes
End Of Table

After step 2, the guest boots up successfully; check the SMBIOS version in the guest:

# dmidecode
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFFFD40.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: 1.16.0-1.el9
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported
                Targeted content distribution is supported
        BIOS Revision: 0.0
......
......
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
        Status: No errors detected

Handle 0x7F00, DMI type 127, 4 bytes
End Of Table

After step 3, the result is similar to step 1. After step 4, the result is similar to step 2.

Hi Chensheng,

I found that you added the case VIRT-8962 link to the bug. Just a reminder: I think you should test it with SMBIOS 3.0 (qemu-kvm -M q35,smbios-entry-point-type=64). Thanks.

Hi Xueqiang,

SMBIOS 3.0 is already covered, thanks for the reminder. BTW, our host only has 448 vcpus and 8T memory.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967