Bug 1904267 - Q35: Support SMBIOS 3.0 Entry Point Type
Summary: Q35: Support SMBIOS 3.0 Entry Point Type
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 9.1
Assignee: Igor Mammedov
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On: 2064757
Blocks: 1788991 1906077 1942820 2004662
 
Reported: 2020-12-03 23:42 UTC by Eduardo Habkost
Modified: 2022-11-15 10:16 UTC
CC List: 22 users

Fixed In Version: qemu-kvm-7.0.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 09:53:23 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:
payton: needinfo-
payton: needinfo-


Attachments
virt-install command for Comment 5 (331 bytes, text/plain), 2021-06-22 13:46 UTC, Brian Payton
xml for Comment 5 (6.89 KB, text/plain), 2021-06-22 13:47 UTC, Brian Payton
debug output for success in Comment 5 (746.98 KB, text/plain), 2021-06-22 13:48 UTC, Brian Payton
debug output for 448 vcpu failure in Comment 5 (182.35 KB, text/plain), 2021-06-22 13:49 UTC, Brian Payton
debug output for 512 vcpu failure in Comment 5 (196.04 KB, text/plain), 2021-06-22 13:50 UTC, Brian Payton
working xml file for comment #14 (5.73 KB, text/plain), 2021-06-22 20:23 UTC, Brian Payton
Log file for Comment 14 (5.18 KB, text/plain), 2021-06-22 20:24 UTC, Brian Payton
Comment 20 xml (6.94 KB, text/plain), 2021-06-29 18:22 UTC, Brian Payton
Comment 20 log (5.26 KB, text/plain), 2021-06-29 18:22 UTC, Brian Payton
Comment 20 debugcon.txt (1.22 MB, text/plain), 2021-06-29 18:23 UTC, Brian Payton
Comment 20 inside the running vm (2.72 KB, text/plain), 2021-06-29 18:24 UTC, Brian Payton
Comment 25 xml (48.87 KB, text/plain), 2021-06-29 18:26 UTC, Brian Payton
Comment 25 log (9.54 KB, text/plain), 2021-06-29 18:27 UTC, Brian Payton
Comment 25 debugcon.txt (1.09 MB, text/plain), 2021-06-29 18:28 UTC, Brian Payton
Comment 25 inside the running vm (3.96 KB, text/plain), 2021-06-29 18:29 UTC, Brian Payton
Comment 30 xml (6.94 KB, text/plain), 2021-06-29 18:43 UTC, Brian Payton
Comment 30 log (5.02 KB, text/plain), 2021-06-29 18:43 UTC, Brian Payton
Comment 30 debugcon.txt (7.09 KB, text/plain), 2021-06-29 18:44 UTC, Brian Payton
Comment 34 xml (5.73 KB, text/plain), 2021-06-29 18:46 UTC, Brian Payton
Comment 34 log (5.13 KB, text/plain), 2021-06-29 18:46 UTC, Brian Payton
Comment 34 command line output (1.07 KB, text/plain), 2021-06-29 18:47 UTC, Brian Payton
Comment 38 xml (48.87 KB, text/plain), 2021-06-29 18:49 UTC, Brian Payton
Comment 38 log (9.36 KB, text/plain), 2021-06-29 18:50 UTC, Brian Payton
Comment 38 debugcon.txt (184.09 KB, text/plain), 2021-06-29 18:51 UTC, Brian Payton
Comment 42 xml (52.84 KB, text/plain), 2021-06-29 19:13 UTC, Brian Payton
Comment 42 log (9.36 KB, text/plain), 2021-06-29 19:14 UTC, Brian Payton
Comment 42 debugcon.txt (7.09 KB, text/plain), 2021-06-29 19:15 UTC, Brian Payton
Comment 52 debugcon.txt file, Comment #30 with tseg=128 (7.09 KB, text/plain), 2021-06-29 20:51 UTC, Brian Payton
SeaBIOS image with SMBIOS 3.0 support (256.00 KB, application/octet-stream), 2021-07-02 13:26 UTC, Eduardo Habkost
Successful debugcon file for Comment 64 (35.34 KB, text/plain), 2021-08-26 14:45 UTC, Brian Payton
Failing debugcon file for Comment 64 (33.65 KB, text/plain), 2021-08-26 14:46 UTC, Brian Payton
xml file for Comment 64 (5.37 KB, text/plain), 2021-08-26 14:47 UTC, Brian Payton
package list for Comment 72 (68.53 KB, text/plain), 2021-09-16 00:24 UTC, Brian Payton
Legacy numa xml for Comment 72 (43.11 KB, text/plain), 2021-09-16 00:45 UTC, Brian Payton
Legacy xml for Comment 72 (5.20 KB, text/plain), 2021-09-16 00:46 UTC, Brian Payton
UEFI numa xml for Comment 72 (41.98 KB, text/plain), 2021-09-16 00:47 UTC, Brian Payton
UEFI xml for Comment 72 (5.09 KB, text/plain), 2021-09-16 00:48 UTC, Brian Payton
Spreadsheet for Comment 72 (28.26 KB, application/zip), 2021-09-16 00:52 UTC, Brian Payton
Debug log for spreadsheet row 3 for Comment 72 (37.98 KB, text/plain), 2021-09-16 00:55 UTC, Brian Payton
Debug log for spreadsheet row 5 for Comment 72 (427.96 KB, text/plain), 2021-09-16 01:02 UTC, Brian Payton


Links
Red Hat Product Errata RHSA-2022:7967 (last updated 2022-11-15 09:54:27 UTC)

Description Eduardo Habkost 2020-12-03 23:42:45 UTC
A VM with more than ~720 VCPUs can hit the 65535 bytes limit on SMBIOS tables.  edk2, however, supports SMBIOS 3.0 Entry Points.

We need to support SMBIOS 3.0 entry points on Q35, probably through a new command line option.
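
For reference, the machine property that eventually implemented this (merged upstream, see comments 85 and 98 below) is selected on the QEMU command line. A minimal sketch, with disk and other guest options omitted; "64" requests the SMBIOS 3.0 (64-bit) entry point, "32" keeps the legacy 2.x one:

/usr/libexec/qemu-kvm -machine q35,smbios-entry-point-type=64 -m 4096 -smp 4 -nographic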

Comment 1 Eduardo Habkost 2020-12-03 23:44:15 UTC
Work in progress at:
https://gitlab.com/ehabkost/qemu/-/commits/work/smbios-configuration

Comment 2 John Ferlan 2020-12-04 14:57:03 UTC
Since you seem to already be working on it, I'll assign it to you. I set the priority to medium, but feel free to adjust.

Comment 3 Yanhui Ma 2020-12-21 02:35:19 UTC
Hello Eduardo,

Could you tell me if this bug needs a big machine with 720 cpus to test?

Comment 4 Eduardo Habkost 2020-12-21 18:16:35 UTC
(In reply to Yanhui Ma from comment #3)
> Hello Eduardo,
> 
> Could you tell me if this bug needs a big machine with 720 cpus to test?

It doesn't.  It can be tested by just starting a small VM with SMBIOS 3.0 entry point configured, and then checking output of `dmidecode` in the guest.
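
A minimal sketch of that check, assuming a guest booted with the SMBIOS 3.0 entry point enabled (the exact option spelling differs between the work-in-progress branch in comment 12 and the property that was eventually merged, see comment 98):

# Inside the guest: the entry point version is reported in the dmidecode header.
dmidecode | head -4
# Expect "SMBIOS 3.0.0 present." rather than "SMBIOS 2.8 present."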

Comment 5 Brian Payton 2021-06-22 13:41:45 UTC
Hello Eduardo,

      We finally created a working instance using your git clone for qemu on an 8TB, 640 cpus, 32 socket machine.  After many iterations we found the following.

500GB with 416 vcpus boots.  7TB also boots but takes a long time.


500GB with 448 vcpus fails with the following in /tmp/debugcon.txt

Loading SMM driver at 0x0007F0D7000 EntryPoint=0x0007F0DF99D VariableSmm.efi
mSmmMemLibInternalMaximumSupportAddress = 0x1FFFFFFFFFF
VarCheckLibRegisterSetVariableCheckHandler - 0x7F0DE7AE Success
Variable driver common space: 0x3FF9C 0x3FF9C 0x3FF9C
Variable driver will work with auth variable format!

ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(1111): !EFI_ERROR (Status)

500GB with 512 vcpus fails with the following in /tmp/debugcon.txt

CPU[1FC]  APIC ID=01FC  SMBASE=7FFAB000  SaveState=7FFBAC00  Size=00000400
CPU[1FD]  APIC ID=01FD  SMBASE=7FFAD000  SaveState=7FFBCC00  Size=00000400
CPU[1FE]  APIC ID=01FE  SMBASE=7FFAF000  SaveState=7FFBEC00  Size=00000400
CPU[1FF]  APIC ID=01FF  SMBASE=7FFB1000  SaveState=7FFC0C00  Size=00000400
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c(894): Stacks != ((void *) 0)

We will attach more information

Any comments?

Have a good day.

Regards,

Brian

Comment 6 Brian Payton 2021-06-22 13:46:42 UTC
Created attachment 1793023 [details]
virt-install command for Comment 5

Comment 7 Brian Payton 2021-06-22 13:47:27 UTC
Created attachment 1793024 [details]
xml for Comment 5

Comment 8 Brian Payton 2021-06-22 13:48:21 UTC
Created attachment 1793025 [details]
debug output for success in Comment 5

Comment 9 Brian Payton 2021-06-22 13:49:13 UTC
Created attachment 1793026 [details]
debug output for 448 vcpu failure in Comment 5

Comment 10 Brian Payton 2021-06-22 13:50:24 UTC
Created attachment 1793027 [details]
debug output for 512 vcpu failure in Comment 5

Comment 11 Eduardo Habkost 2021-06-22 17:55:36 UTC
(In reply to Brian Payton from comment #5)
> Hello Eduardo,
> 
>       We finally created a working instance using your git clone for qemu on
> an 8TB, 640 cpus, 32 socket machine.  After many iterations we found the
> following.
> 
> 500GB with 416 vcpus boots.  7TB also boots but takes a long time.
> 
> 
> 500GB with 448 vcpus fails with the following in /tmp/debugcon.txt
> 
> Loading SMM driver at 0x0007F0D7000 EntryPoint=0x0007F0DF99D VariableSmm.efi
> mSmmMemLibInternalMaximumSupportAddress = 0x1FFFFFFFFFF
> VarCheckLibRegisterSetVariableCheckHandler - 0x7F0DE7AE Success
> Variable driver common space: 0x3FF9C 0x3FF9C 0x3FF9C
> Variable driver will work with auth variable format!
> 
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT
> /builddir/build/BUILD/edk2-ca407c7246bf/MdeModulePkg/Universal/Variable/
> RuntimeDxe/VariableSmm.c(1111): !EFI_ERROR (Status)

You are probably out of memory due to too small TSEG size.  See bug 1469338 and bug 1866110.

I suggest testing using `-global mch.extended-tseg-mbytes=64`.

> 
> 500GB with 512 vcpus fails with the following in /tmp/debugcon.txt
> 
> CPU[1FC]  APIC ID=01FC  SMBASE=7FFAB000  SaveState=7FFBAC00  Size=00000400
> CPU[1FD]  APIC ID=01FD  SMBASE=7FFAD000  SaveState=7FFBCC00  Size=00000400
> CPU[1FE]  APIC ID=01FE  SMBASE=7FFAF000  SaveState=7FFBEC00  Size=00000400
> CPU[1FF]  APIC ID=01FF  SMBASE=7FFB1000  SaveState=7FFC0C00  Size=00000400
> ASSERT
> /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/PiSmmCpuDxeSmm/
> PiSmmCpuDxeSmm.c(894): Stacks != ((void *) 0)

This is just the lack of memory in a different location.  For reference, the code triggering the assert is:

    Stacks = (UINT8 *) AllocatePages (gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus * (EFI_SIZE_TO_PAGES (mSmmStackSize + mSmmShadowStackSize)));
    ASSERT (Stacks != NULL);

Comment 12 Brian Payton 2021-06-22 18:00:15 UTC
Thanks Eduardo.  I will update the qemu:commandline entry in the xml with the following and try again.

  <qemu:commandline>
    <qemu:arg value='-machine'/>
    <qemu:arg value='smbios-ep=3_0'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='mch.extended-tseg-mbytes=64'/>
    <qemu:arg value='-chardev'/>
    <qemu:arg value='file,path=/tmp/debugcon.txt,id=debugcon'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='isa-debugcon,iobase=0x402,chardev=debugcon'/>
  </qemu:commandline>

Regards,

Brian

Comment 13 Brian Payton 2021-06-22 19:19:51 UTC
Hello Eduardo,

      Thank you for your help.  With the following settings, a 7.5TB 640 vcpu virtual machine is running on our 8TB 640 cpu 32 socket server.  We have an improved working example for numa settings and performance testing on this system and on the 12TB and 24TB systems when we can reserve them.


  <memory unit='KiB'>8053063680</memory>
  <currentMemory unit='KiB'>8053063680</currentMemory>
  <vcpu placement='static'>640</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-6.0'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/demo_uefi_VARS.fd</nvram>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <smm state='on'>
      <tseg unit='MiB'>64</tseg>
    </smm>

      Please let me know if you want any changes or preferred testing.

      Any comments?

      Have a great day.

Regards,

Brian

Comment 14 Eduardo Habkost 2021-06-22 19:27:07 UTC
(In reply to Brian Payton from comment #13)
>       Please let me know if you want any changes or preferred testing.

I don't have any change suggestion by now.  My only request is that the full domain XML and qemu.log file for the latest working configuration be attached to the BZ, so we have a record of a known working config for future reference.

Comment 15 Brian Payton 2021-06-22 20:23:48 UTC
Created attachment 1793238 [details]
working xml file for comment #14

This works with the vcpus set to 768, more than the 640 the system has.

Comment 16 Brian Payton 2021-06-22 20:24:17 UTC
Created attachment 1793239 [details]
Log file for Comment 14

Comment 17 Brian Payton 2021-06-22 20:47:53 UTC
For further reference:

On the host:

[root@fsg-uv2k-3 qemu]# free -g
              total        used        free      shared  buff/cache   available
Mem:           7809         180        7624           0           3        7609
Swap:            11           0          11
[root@fsg-uv2k-3 qemu]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              640
On-line CPU(s) list: 0-639
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           32
NUMA node(s):        32
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel(R) Corporation
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
Stepping:            4
CPU MHz:             2864.253
CPU max MHz:         2900.0000
CPU min MHz:         1200.0000
BogoMIPS:            4799.96
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            25600K
NUMA node0 CPU(s):   0-9,320-329
NUMA node1 CPU(s):   10-19,330-339
NUMA node2 CPU(s):   20-29,340-349
NUMA node3 CPU(s):   30-39,350-359
NUMA node4 CPU(s):   40-49,360-369
NUMA node5 CPU(s):   50-59,370-379
NUMA node6 CPU(s):   60-69,380-389
NUMA node7 CPU(s):   70-79,390-399
NUMA node8 CPU(s):   80-89,400-409
NUMA node9 CPU(s):   90-99,410-419
NUMA node10 CPU(s):  100-109,420-429
NUMA node11 CPU(s):  110-119,430-439
NUMA node12 CPU(s):  120-129,440-449
NUMA node13 CPU(s):  130-139,450-459
NUMA node14 CPU(s):  140-149,460-469
NUMA node15 CPU(s):  150-159,470-479
NUMA node16 CPU(s):  160-169,480-489
NUMA node17 CPU(s):  170-179,490-499
NUMA node18 CPU(s):  180-189,500-509
NUMA node19 CPU(s):  190-199,510-519
NUMA node20 CPU(s):  200-209,520-529
NUMA node21 CPU(s):  210-219,530-539
NUMA node22 CPU(s):  220-229,540-549
NUMA node23 CPU(s):  230-239,550-559
NUMA node24 CPU(s):  240-249,560-569
NUMA node25 CPU(s):  250-259,570-579
NUMA node26 CPU(s):  260-269,580-589
NUMA node27 CPU(s):  270-279,590-599
NUMA node28 CPU(s):  280-289,600-609
NUMA node29 CPU(s):  290-299,610-619
NUMA node30 CPU(s):  300-309,620-629
NUMA node31 CPU(s):  310-319,630-639
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d

In the virtual machine:

[root@fedora ~]# free -g
               total        used        free      shared  buff/cache   available
Mem:            7558           4        7553           0           0        7536
Swap:              7           0           7
[root@fedora ~]# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          768
On-line CPU(s) list:             0-767
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       768
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           62
Model name:                      Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz
Stepping:                        4
CPU MHz:                         2399.980
BogoMIPS:                        4799.96
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       24 MiB
L1i cache:                       24 MiB
L2 cache:                        3 GiB
L3 cache:                        12 GiB
NUMA node0 CPU(s):               0-767
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xs
                                 ave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities

Comment 18 Brian Payton 2021-06-28 21:41:04 UTC
Hello Eduardo,

      We moved the testing to a larger system.  Currently 192 vcpus, 32 sockets, and 8TB of memory works.  10TB and above fail with the following in /tmp/debugcon.txt, using your latest qemu git clone.  I am probing the vcpu limit now and will gather more information for this bug after both limits are found.  I just thought you might be interested in the current message.

SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosCreateTable() re-allocate SMBIOS 32-bit table
SmbiosCreate64BitTable() re-allocate SMBIOS 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 32-bit table
SmbiosAdd: Smbios type 17 with size 0x37 is added to 64-bit table

ASSERT_EFI_ERROR (Status = Already started)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/SmbiosPlatformDxe.c(125): !EFI_ERROR (Status)

From the vm log file:

-global mch.extended-tseg-mbytes=128 \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 10485760 \

Regards,

Brian

Comment 19 Brian Payton 2021-06-29 18:19:18 UTC
Hello Eduardo,

      I have 5 examples below, entered as 5 separate comments with their associated files.  The 9TB vm failures were different between the simple and numa configurations.  I will verify or capture any difference between the 1024 vcpu simple and numa configurations.

Simple memory and vcpus configuration

      - Working with 8TB and 960 vcpus

      - Failure with 9TB and 960 vcpus

      - Failure with 8TB and 1024 vcpus

numa configured memory and vcpus

      - Working with 8TB and 960 vcpus

      - Failure with 9TB and 960 vcpus

      Have a good day.

Regards,

Brian

Comment 20 Brian Payton 2021-06-29 18:20:56 UTC
Working simple configuration

Comment 21 Brian Payton 2021-06-29 18:22:14 UTC
Created attachment 1795927 [details]
Comment 20 xml

Comment 22 Brian Payton 2021-06-29 18:22:41 UTC
Created attachment 1795928 [details]
Comment 20 log

Comment 23 Brian Payton 2021-06-29 18:23:27 UTC
Created attachment 1795929 [details]
Comment 20 debugcon.txt

Comment 24 Brian Payton 2021-06-29 18:24:01 UTC
Created attachment 1795930 [details]
Comment 20 inside the running vm

Comment 25 Brian Payton 2021-06-29 18:26:03 UTC
Working 8TB 960 vcpu configuration spread across 32 numa cells.

Comment 26 Brian Payton 2021-06-29 18:26:52 UTC
Created attachment 1795931 [details]
Comment 25 xml

Comment 27 Brian Payton 2021-06-29 18:27:18 UTC
Created attachment 1795932 [details]
Comment 25 log

Comment 28 Brian Payton 2021-06-29 18:28:40 UTC
Created attachment 1795933 [details]
Comment 25 debugcon.txt

Comment 29 Brian Payton 2021-06-29 18:29:47 UTC
Created attachment 1795934 [details]
Comment 25 inside the running vm

Comment 30 Brian Payton 2021-06-29 18:42:33 UTC
Simple failure with 1024 vcpus and 8TB

Comment 31 Brian Payton 2021-06-29 18:43:19 UTC
Created attachment 1795935 [details]
Comment 30 xml

Comment 32 Brian Payton 2021-06-29 18:43:51 UTC
Created attachment 1795936 [details]
Comment 30 log

Comment 33 Brian Payton 2021-06-29 18:44:23 UTC
Created attachment 1795937 [details]
Comment 30 debugcon.txt

Comment 34 Brian Payton 2021-06-29 18:45:31 UTC
Simple failure with 9TB and 960 vcpus.

Comment 35 Brian Payton 2021-06-29 18:46:12 UTC
Created attachment 1795939 [details]
Comment 34 xml

Comment 36 Brian Payton 2021-06-29 18:46:42 UTC
Created attachment 1795940 [details]
Comment 34 log

Comment 37 Brian Payton 2021-06-29 18:47:21 UTC
Created attachment 1795941 [details]
Comment 34 command line output

Comment 38 Brian Payton 2021-06-29 18:48:39 UTC
9TB 960 vcpu 32 socket numa configuration

Comment 39 Brian Payton 2021-06-29 18:49:21 UTC
Created attachment 1795943 [details]
Comment 38 xml

Comment 40 Brian Payton 2021-06-29 18:50:48 UTC
Created attachment 1795945 [details]
Comment 38 log

Comment 41 Brian Payton 2021-06-29 18:51:24 UTC
Created attachment 1795946 [details]
Comment 38 debugcon.txt

Comment 42 Brian Payton 2021-06-29 19:07:30 UTC
This is the new failure with 8TB 1024 vcpus and 32 sockets numa configured.

Comment 43 Brian Payton 2021-06-29 19:13:12 UTC
Created attachment 1795949 [details]
Comment 42 xml

Comment 44 Brian Payton 2021-06-29 19:14:08 UTC
Created attachment 1795951 [details]
Comment 42 log

Comment 45 Brian Payton 2021-06-29 19:15:19 UTC
Created attachment 1795952 [details]
Comment 42 debugcon.txt

Comment 46 Eduardo Habkost 2021-06-29 19:25:12 UTC
(In reply to Brian Payton from comment #30)
> Simple failure with 1024 vcpus and 8TB

1024vcpu.debugcon shows 64 MB TSEG, and an "Out of Resources" failure. Probably TSEG size needs to be even larger for this configuration.

(In reply to Brian Payton from comment #38)
> 9TB 960 vcpu 32 socket numa configuration

This one has a different error (ASSERT_EFI_ERROR (Status = Already started)) but it also has a 64 MB TSEG.


(In reply to Brian Payton from comment #42)
> This is the new failure with 8TB 1024 vcpus and 32 sockets numa configured.

This is also a TSEG size issue (ASSERT_EFI_ERROR (Status = Out of Resources)).


Is a debugcon log + QEMU command line available for the 128 MB TSEG failure mentioned in comment #18?  I want to make sure it's really not a TSEG size issue.

Comment 47 Brian Payton 2021-06-29 19:29:24 UTC
Hello Eduardo,

      I will rerun Comments 30, 38, and 42 with TSEG set to 128.  Would you also like to see a 256 setting?

Regards,

Brian

Comment 48 Eduardo Habkost 2021-06-29 19:47:54 UTC
(In reply to Brian Payton from comment #47)
> Hello Eduardo,
> 
>       I will rerun Comments 30, 38, and 42 with TSEG set to 128.  Would you
> also like to see a 256 setting?

If it still fails with 128 MB, yes please!  We don't know yet what the required TSEG sizes are for each of those scenarios because we have never tested them.

Comment 49 Eduardo Habkost 2021-06-29 19:52:03 UTC
(In reply to Brian Payton from comment #18)
> ASSERT_EFI_ERROR (Status = Already started)
> ASSERT
> /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/
> SmbiosPlatformDxe.c(125): !EFI_ERROR (Status)

This one might be an unexpected duplicate SMBIOS handle in the tables generated by QEMU.  I will double-check the SMBIOS table generation code to be sure.

Comment 50 Brian Payton 2021-06-29 20:21:12 UTC
Hello Eduardo,

      I repeated the Comment #42 test with tseg set to 128, 256, and 1GiB and still received the same results.  The second PhysicalSize entry towards the end changed with the increased tseg value but still failed with the Out of Resources issue.  Since the simpler test in Comment #30 shows similar symptoms for you,  I will do some isolation there since it is easier to change the vcpu count.

      Can I gather more information for you or increase the debug level?

      Be back before long hopefully.

Regards,

Brian

Comment 51 Eduardo Habkost 2021-06-29 20:36:56 UTC
(In reply to Eduardo Habkost from comment #49)
> (In reply to Brian Payton from comment #18)
> > ASSERT_EFI_ERROR (Status = Already started)
> > ASSERT
> > /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/
> > SmbiosPlatformDxe.c(125): !EFI_ERROR (Status)
> 
> This one might be an unexpected duplicate SMBIOS handle in the tables
> generated by QEMU.  I will double-check the SMBIOS table generation code to
> be sure.

Yes, this is where the problem comes from:


static void smbios_build_type_17_table(unsigned instance, uint64_t size)
{
    SMBIOS_BUILD_TABLE_PRE(17, 0x1100 + instance, true); /* required */

...
static void smbios_build_type_19_table(unsigned instance,
                                       uint64_t start, uint64_t size)
{
    SMBIOS_BUILD_TABLE_PRE(19, 0x1300 + instance, true); /* required */


Using the current handle assignment code, we can have only up to 512 DIMM slots before their type 17 SMBIOS handles conflict with the type 19 table handles.  QEMU has DIMM sizes hardcoded to 16GB, which means an 8TB guest will hit the limit.
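
As a quick sanity check on that arithmetic (plain shell, not QEMU code):

# Type 17 handles start at 0x1100 + instance and type 19 handles at 0x1300 + instance,
# so the type 17 range collides with type 19 after this many DIMM instances:
echo $(( 0x1300 - 0x1100 ))   # 512
# With QEMU's hardcoded 16GB DIMM size, that corresponds to:
echo $(( 512 * 16 ))          # 8192 GB, i.e. the 8TB guest size that hits the limit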

I will open a separate BZ for that specific issue.  The remaining cases (<= 8 TB VMs) seem to be due to small TSEG size.

Comment 52 Eduardo Habkost 2021-06-29 20:40:31 UTC
(In reply to Brian Payton from comment #50)
> Hello Eduardo,
> 
>       I repeated the Comment #42 test with tseg set to 128, 256, and 1GiB
> and still received the same results.  The second PhysicalSize entry towards
> the end changed with the increased tseg value but still failed with the Out
> of Resources issue.  Since the simpler test in Comment #30 shows similar
> symptoms for you,  I will do some isolation there since it is easier to
> change the vcpu count.
> 
>       Can I gather more information for you or increase the debug level?

Can you please attach the debugcon output with 128MB TSEG (all remaining configuration being exactly the same as comment #30), so we can compare with comment #30?  Comparing both log files might help us identify what's wrong.

Laszlo, any advice on what could help us debug the issue?

Comment 53 Brian Payton 2021-06-29 20:51:12 UTC
Created attachment 1796021 [details]
Comment 52 debugcon.txt file, Comment #30 with tseg=128

Comment 54 Eduardo Habkost 2021-06-29 21:01:43 UTC
(In reply to Brian Payton from comment #30)
> Simple failure with 1024 vcpus and 8TB

For reference, this is the failure on debugcon:

[...]
GetMicrocodePatchInfoFromHob: Microcode patch cache HOB is not found.
CpuMpPei: 5-Level Paging = 0
Register PPI Notify: 8F9D4825-797D-48FC-8471-845025792EF6

ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/CpuMpPei/CpuBist.c(186): !EFI_ERROR (Status)



And this is the failing code:

  BistInformationSize = sizeof (EFI_SEC_PLATFORM_INFORMATION_RECORD2) +
                        sizeof (EFI_SEC_PLATFORM_INFORMATION_CPU) * NumberOfProcessors;
  Status = PeiServicesAllocatePool (
             (UINTN) BistInformationSize,
             (VOID **) &PlatformInformationRecord2
             );
  ASSERT_EFI_ERROR (Status);

Maybe we're hitting some limit on allocation sizes?  I don't know what's the size of EFI_SEC_PLATFORM_INFORMATION_RECORD2 and EFI_SEC_PLATFORM_INFORMATION_CPU.

If making edk2 support more than 1024 VCPUs will require extra work, we probably should keep working on both SeaBIOS and OVMF support for larger VMs, just in case we find out that making SeaBIOS work will be easier than OVMF.

Comment 55 Brian Payton 2021-06-29 21:10:29 UTC
Hi Eduardo,

      Thanks for the information.  Frank cloned your git tree so we can pull and rebuild any changes you want to try.  We have the 24TB, 1792 cpu, 32 socket system for a couple more days of testing.

Regards,

Brian

Comment 56 Brian Payton 2021-06-29 23:35:27 UTC
Hello Eduardo,

      To clarify the process, for OVMF testing now, I create each new vm instance from scratch using the Fedora iso dvd.  Once Fedora is installed, I can edit the xml file for larger instances and features, since my systems only include /usr/share/OVMF/OVMF_CODE.secboot.fd.  The OVMF_CODE.fd file does not exist.

      For future testing, should we keep all things equal except the original virt-install command line option below?  This would switch between SeaBIOS and OVMF and is easily documented.

--boot menu=on,uefi,loader.secure='no' \

      Any comments?

      Have a good day.

Regards,

Brian

Comment 57 Laszlo Ersek 2021-06-30 19:55:13 UTC
(In reply to Eduardo Habkost from comment #54)
> (In reply to Brian Payton from comment #30)
> > Simple failure with 1024 vcpus and 8TB
> 
> For reference, this is the failure on debugcon:
> 
> [...]
> GetMicrocodePatchInfoFromHob: Microcode patch cache HOB is not found.
> CpuMpPei: 5-Level Paging = 0
> Register PPI Notify: 8F9D4825-797D-48FC-8471-845025792EF6
> 
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT
> /builddir/build/BUILD/edk2-ca407c7246bf/UefiCpuPkg/CpuMpPei/CpuBist.c(186):
> !EFI_ERROR (Status)
> 
> 
> 
> And this is the failing code:
> 
>   BistInformationSize = sizeof (EFI_SEC_PLATFORM_INFORMATION_RECORD2) +
>                         sizeof (EFI_SEC_PLATFORM_INFORMATION_CPU) *
> NumberOfProcessors;
>   Status = PeiServicesAllocatePool (
>              (UINTN) BistInformationSize,
>              (VOID **) &PlatformInformationRecord2
>              );
>   ASSERT_EFI_ERROR (Status);
> 
> Maybe we're hitting some limit on allocation sizes?  I don't know what's the
> size of EFI_SEC_PLATFORM_INFORMATION_RECORD2 and
> EFI_SEC_PLATFORM_INFORMATION_CPU.

This is an edk2 design limitation.

Please file an edk2 bug for RHEL-8, and clone it for RHEL-9.

Meanwhile I've sent an upstream problem report:

* [edk2-devel] CPU count limitation in CpuMpPei BIST processing

  https://listman.redhat.com/archives/edk2-devel-archive/2021-June/msg01493.html
  http://mid.mail-archive.com/ffa9d7db-b670-8b88-758f-4785c8d05d40@redhat.com
  https://edk2.groups.io/g/devel/message/77376

Thanks
Laszlo

Comment 58 Laszlo Ersek 2021-06-30 20:19:09 UTC
(In reply to Brian Payton from comment #56)
> To clarify the process, for OVMF testing now, I create each new vm
> instance from scratch using the Fedora iso dvd.  Once the Fedora is
> installed, I can edit the xml file for larger instances and features
> since my systems only include /usr/share/OVMF/OVMF_CODE.secboot.fd.
> The OVMF_CODE.fd file does not exist.

You don't need "OVMF_CODE.fd". The "secboot" in "OVMF_CODE.secboot.fd"
only means that the Secure Boot *firmware feature* is included in the
firmware binary. It does not imply that the Secure Boot *operating mode*
is enabled as soon as the domain is defined and first launched.

The Secure Boot *operating mode* depends on the variable store template
file from which the newly defined domain's private variable store is
instantiated. In RHEL, two varstore templates are provided,
"OVMF_VARS.secboot.fd" and "OVMF_VARS.fd". By default, the former is
used; that's why you get new domains with the Secure Boot operating mode
enabled.

The simplest solution for permanently masking "OVMF_VARS.secboot.fd" on
the host is the following:

# umask 0022
# mkdir -p /etc/qemu/firmware
# touch /etc/qemu/firmware/40-edk2-ovmf-sb.json
# restorecon -FvvR /etc/qemu/firmware

Then just pass "--boot uefi" to virt-install.

(Technically, this masks the firmware descriptor file
"/usr/share/qemu/firmware/40-edk2-ovmf-sb.json", and so
"/usr/share/qemu/firmware/50-edk2-ovmf.json" will take effect. (Those
files are very easy to read for humans too, so please feel free to
consult them, or even diff them between each other.))
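
For example, to see exactly what differs between the two descriptors (paths as shipped on the host; just a convenience command):

diff /usr/share/qemu/firmware/40-edk2-ovmf-sb.json /usr/share/qemu/firmware/50-edk2-ovmf.json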


More flexible firmware use case selection, on a domain-by-domain basis,
is the subject of bug 1929357 (see the docs at
<https://libvirt.org/formatdomain.html#bios-bootloader>). However, I
cannot say how much of that is already exposed by the virt-install
utility.

One thing that certainly works is the following (very verbose) syntax,
which ignores the firmware descriptors (metadata files) under
"/usr/share/qemu/firmware" altogether:

(a)

--machine q35 \
--features smm=on \
--qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/DOMAIN.ovmf.log' \
--boot loader=/usr/share/OVMF/OVMF_CODE.secboot.fd,loader_ro=yes,loader_type=pflash,loader_secure=yes,nvram_template=/usr/share/OVMF/OVMF_VARS.fd \

(b)

--machine q35 \
--features smm=on \
--qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/DOMAIN.ovmf.log' \
--boot loader=/usr/share/OVMF/OVMF_CODE.secboot.fd,loader_ro=yes,loader_type=pflash,loader_secure=yes,nvram_template=/usr/share/OVMF/OVMF_VARS.secboot.fd \

Option (a) will give you a domain with the SB operational mode disabled,
option (b) will give you one with the SB operational mode enabled.

Note that these command line snippets only differ in the
"nvram_template" option-argument.


> For future testing should we keep all things equal except the original
> virt-install command line option below?  This should switch between
> SeaBIOS and OVMF, and easily documented.

For defining a SeaBIOS domain, simply add *none* of the "--features",
"--qemu-commandline", and "--boot" options.

Thanks,
Laszlo

Comment 59 Laszlo Ersek 2021-06-30 20:31:43 UTC
Another hint -- in order to avoid huge TSEG allocations, please enable 1GB page size support for your domain.

<domain ...>
  <cpu ...>
    <feature policy='require' name='pdpe1gb'/>
  </cpu>
</domain>

This should significantly decrease the SMRAM that needs to be allocated for the SMM page tables.

(I can see "pdpe1gb" in the *guest* cpuinfo in comment 17, so this setting could already be in place; I'm not sure.)
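
A quick way to confirm the flag actually reaches the guest (assuming a Linux guest, as in comment 17):

# Prints "pdpe1gb" once if 1GB pages are exposed to the guest CPU.
grep -m1 -o pdpe1gb /proc/cpuinfo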

Comment 60 Brian Payton 2021-07-01 14:49:04 UTC
Hello Laszlo,

      Thank you for all the great information.  I did see pdpe1gb, which must come through the cpu pass-through setting we use, and I am excited to try your OVMF information for non-secure boot configurations.

      As we learned in the past, we are pushing the limits because we can and should for our customers.  Identifying the limits, and why they exist, is the primary goal of this investigation.  Hopefully we can overcome them and grow our solution; if not, at least we will know why.

      Have a great day.

Regards,

Brian

Comment 61 Brian Payton 2021-07-01 15:36:28 UTC
Hello Laszlo,

      Just out of curiosity: your changes are very helpful for creating vms with virt-install and eliminate a lot of manual editing of the xml file.  I still have to add the following to enable more than 255 vcpus on a vm.  Do you know of a virt-install option to generate this xml?

  <devices>
    <iommu model='intel'>
      <driver intremap='on' eim='on'/>
    </iommu>

      Thank you again for your help.  Have a great day.

Regards,

Brian

Comment 62 Laszlo Ersek 2021-07-02 09:01:21 UTC
Hi Brian,

(In reply to Brian Payton from comment #61)
> Hello Laszlo,
> 
>       Just out of curiosity, your changes are very helpful for creating vms
> with virt-install and eliminate a lot of manual editing of the xml file.  I
> still have to add the following to enable more than 255vcpus on a vm.  Do
> you know of a virt-install option to generate this xml?
> 
>   <devices>
>     <iommu model='intel'>
>       <driver intremap='on' eim='on'/>
>     </iommu>
> 
>       Thank you again for your help.  Have a great day.

Upstream virt-manager (virt-install) has commit 25419db9caf0 ("virtinst: add support for configuring the IOMMU", 2020-07-12), which I believe would do what you need. However, I think this commit is not part of RHEL-8, as yet. I'm adding Cole to the CC list of this RHBZ to correct me if necessary. Thanks!

Comment 63 Eduardo Habkost 2021-07-02 13:26:02 UTC
Created attachment 1797190 [details]
SeaBIOS image with SMBIOS 3.0 support

SeaBIOS image for testing attached.  It can be used with the `-bios` option in the QEMU command line, or with the <loader> element as documented at https://libvirt.org/formatdomain.html#bios-bootloader

Source code for the binary is available at https://gitlab.com/ehabkost/seabios/-/tree/7639f0711ba5cb4f943396cf127b0821099420d4
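
A minimal sketch of the direct QEMU form (the /tmp/seabios.bin path is just an example location for the downloaded attachment, and disk/device options are omitted; comment 64 below shows an equivalent virt-install invocation):

/usr/libexec/qemu-kvm -machine q35 -bios /tmp/seabios.bin -m 4096 -smp 4 -nographic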

Comment 64 Brian Payton 2021-08-26 14:42:50 UTC
Hello Eduardo,

      I repeated the testing with the seabios virtual instance shown below.  8TB is the single memory limit as before and I was unable to repeat the numa configurations.  The seabios test limit was 8TB with 960 vcpus.  More vcpus generated the following in the debugcon log file and increasing the tseg size up to 1GB did not alter these results.

      Any comments?

Regards,

Brian

Copying SMBIOS 3.0 from 0x00006cd9 to 0x000f5d00
WARNING - Unable to allocate resource at romfile_loader_allocate:87!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_pointer:129!
WARNING - internal error detected at romfile_loader_add_checksum:152!
WARNING - internal error detected at romfile_loader_add_pointer:129!

virt-install \
--name demo_seabios \
--disk /tmp/demo_seabios.qcow2 \
--import \
--noreboot \
--console pty,target_type=serial \
--graphics vnc \
--video vga \
--cpu host-passthrough \
--network bridge=virbr0 \
--os-type=linux \
--os-variant=rhel8.1 \
--features apic=on,apic.eoi=on,pae=on \
--boot loader='/tmp/seabios.bin' \
--qemu-commandline='-global isa-debugcon.iobase=0x402 -debugcon file:/tmp/demo_test.ovmf.log ' \
--qemu-commandline='-machine smbios-ep=3-0' \
--qemu-commandline='-global mch.extended-tseg-mbytes=128' \
--cpu host-passthrough \
--vcpus 255 \
--memory 8390356

Output from commands inside the demo_seabios vm:

[root@ah-071 ~]# ssh root@192.168.122.191
sign_and_send_pubkey: no mutual signature supported
root@192.168.122.191's password:
Web console: https://fedora:9090/ or https://192.168.122.191:9090/

Last login: Wed Aug 25 22:19:30 2021 from 192.168.122.1
[root@fedora ~]# free -g
               total        used        free      shared  buff/cache   available
Mem:            8064          15        8048           0           0        8030
Swap:              7           0           7
[root@fedora ~]# dmidecode | head -15
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFE3650.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
    Vendor: SeaBIOS
    Version: rel-1.14.0-45-g7639f071
    Release Date: 04/01/2014
    Address: 0xE8000
    Runtime Size: 96 kB
    ROM Size: 64 kB
    Characteristics:
        BIOS characteristics not supported
[root@fedora ~]#  dmidecode --dump-bin 960_smbios.bin
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFE3650.

# Writing 117155 bytes to 960_smbios.bin.
# Writing 24 bytes to 960_smbios.bin.
[root@fedora ~]# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          960
On-line CPU(s) list:             0-959
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       960
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz
Stepping:                        6
CPU MHz:                         2400.236
BogoMIPS:                        4800.47
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       30 MiB
L1i cache:                       30 MiB
L2 cache:                        3.8 GiB
L3 cache:                        15 GiB
NUMA node0 CPU(s):               0-959
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arc
                                 h_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_t
                                 imer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi
                                 flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd av
                                 x512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni md_clear arch_capabilities

We have more time today to isolate this further.

Comment 65 Brian Payton 2021-08-26 14:45:37 UTC
Created attachment 1817971 [details]
Successful debugcon file for Comment 64

Comment 66 Brian Payton 2021-08-26 14:46:33 UTC
Created attachment 1817972 [details]
Failing debugcon file for Comment 64

Comment 67 Brian Payton 2021-08-26 14:47:14 UTC
Created attachment 1817973 [details]
xml file for Comment 64

Comment 68 Brian Payton 2021-08-26 14:55:50 UTC
Hello Eduardo,

      The last test session started with 12TB and 1024 vcpus, which failed for both uefi and seabios virtual machines.  Further narrowing found that 8TB and 960 vcpus worked for the seabios image, but the uefi vm failed with the following in the debugcon log file.  We then turned our focus to increasing the seabios limits and only found 8TB and 2 vcpus to work with the uefi vm, with all failures generating the same error in the debugcon log file.  We will focus on the specific uefi limits during the next test session.

ASSERT_EFI_ERROR (Status = Out of Resources)
ASSERT /builddir/build/BUILD/edk2-e1999b264f1f/UefiCpuPkg/CpuMpPei/CpuBist.c(186): !EFI_ERROR (Status)

      Any comments?

Regards,

Brian

Comment 69 Eduardo Habkost 2021-08-26 15:16:33 UTC
(In reply to Brian Payton from comment #68)
> ASSERT_EFI_ERROR (Status = Out of Resources)
> ASSERT
> /builddir/build/BUILD/edk2-e1999b264f1f/UefiCpuPkg/CpuMpPei/CpuBist.c(186):
> !EFI_ERROR (Status)
> 
>       Any comments?

This looks like the bug we are already tracking at bug 1982176.

The SeaBIOS failures look new (so I guess that's good news!), and we need to investigate further.

Comment 70 Eduardo Habkost 2021-08-26 15:30:36 UTC
(In reply to Brian Payton from comment #64)
> Copying SMBIOS 3.0 from 0x00006cd9 to 0x000f5d00
> WARNING - Unable to allocate resource at romfile_loader_allocate:87!

Maybe SeaBIOS has an internal memory allocation size limit similar to the edk2 limitations tracked at bug 1982176.

Note that the TSEG size setting affects only UEFI (AFAIK), so we probably need something different to make larger SMBIOS tables work with SeaBIOS.

In the meantime, I would run experiments to see if the VCPU limit is higher if the RAM size is relatively small (with both edk2 and SeaBIOS).

Comment 71 John Ferlan 2021-09-09 12:07:54 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 72 Brian Payton 2021-09-16 00:23:19 UTC
We found the following limits using RHEL 8.5, RHEL-AV 8.5 Alpha1, qemu-kvm-6.0.0-27.el8.ehabkost202108111546.x86_64 with SMBIOS 3.0 and a custom Seabios.bin to work with SMBIOS 3.0.

Legacy (Seabios)  960 vcpus    8TB memory
Legacy numa       768 vcpus   24TB memory   32 sockets
UEFI (OVMF)       960 vcpus    7TB memory
UEFI numa         768 vcpus    1TB memory   32 sockets

Any comments?

Have a good day.

Regards,

Brian

Comment 73 Brian Payton 2021-09-16 00:24:11 UTC
Created attachment 1823459 [details]
package list for Comment 72

Comment 74 Brian Payton 2021-09-16 00:45:49 UTC
Created attachment 1823460 [details]
Legacy numa xml for Comment 72

Comment 75 Brian Payton 2021-09-16 00:46:45 UTC
Created attachment 1823461 [details]
Legacy xml for Comment 72

Comment 76 Brian Payton 2021-09-16 00:47:43 UTC
Created attachment 1823462 [details]
UEFI numa xml for Comment 72

Comment 77 Brian Payton 2021-09-16 00:48:23 UTC
Created attachment 1823463 [details]
UEFI xml for Comment 72

Comment 78 Brian Payton 2021-09-16 00:52:25 UTC
Created attachment 1823464 [details]
Spreadsheet for Comment 72

Comment 79 Brian Payton 2021-09-16 00:55:10 UTC
Created attachment 1823465 [details]
Debug log for spreadsheet row 3 for Comment 72

Comment 80 Brian Payton 2021-09-16 01:02:50 UTC
Created attachment 1823466 [details]
Debug log for spreadsheet row 5 for Comment 72

Comment 81 Eduardo Habkost 2021-11-10 22:07:16 UTC
Moving back to virt-maint.  The upstream patches for this (not merged yet) are available at:
https://lore.kernel.org/qemu-devel/20211026151100.1691925-1-ehabkost@redhat.com/

Comment 82 Eduardo Habkost 2021-11-16 21:50:33 UTC
(In reply to Eduardo Habkost from comment #51)
> I will open a separate BZ for that specific issue.  The remaining cases (<=
> 8 TB VMs) seem to be due to small TSEG size.

Bug reported at https://bugzilla.redhat.com/show_bug.cgi?id=2023977

Comment 85 Igor Mammedov 2022-01-12 13:24:03 UTC
QEMU patches are merged upstream:
 10be11d0b48  smbios: Rename SMBIOS_ENTRY_POINT_* enums
 bdf54a9a7bd  hw/smbios: Use qapi for SmbiosEntryPointType
 0e4edb3b3b5  hw/i386: expose a "smbios-entry-point-type" PC machine property

Comment 89 Nitesh Narayan Lal 2022-02-05 00:06:05 UTC
Increasing the priority to match BZ#1906077, since as the next step we would like to get all the bits associated with SMBIOS 3.0 Entry Point support merged.
Also, setting the target release to 9.1.

Comment 91 Nitesh Narayan Lal 2022-04-20 21:06:35 UTC
Moving the BZ to POST so that the patches can come via rebase.
Igor has already shared the commits in comment 85, so clearing his needinfo.

Comment 94 Yanan Fu 2022-04-25 12:36:39 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 98 Xueqiang Wei 2022-04-28 14:41:29 UTC
According to Comment 4, I tested with qemu-kvm-7.0.0-1.el9 and it works as expected, so I am setting the status to VERIFIED. If I was wrong, please correct me. Thanks.


Versions:
kernel-5.14.0-78.el9.x86_64
qemu-kvm-7.0.0-1.el9
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch
seabios-bin-1.16.0-1.el9.noarch


1. boot a guest with seabios. (don't add smbios-entry-point-type=64 into qemu command lines)

2. boot a guest with seabios. (add smbios-entry-point-type=64 into qemu command lines)

# cat debug.sh
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem,smbios-entry-point-type=64 \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 29696 \
    -object memory-backend-ram,size=29696M,id=mem-machine_mem  \
    -smp 64,maxcpus=64,cores=16,threads=2,dies=1,sockets=2  \
    -cpu 'Skylake-Server-IBRS',+kvm_pv_unhalt \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/avocado_ti4stbke/monitor-qmpmonitor1-20220412-052832-OLNnOWJ3,server=on,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/tmp/avocado_ti4stbke/monitor-catch_monitor-20220412-052832-OLNnOWJ3,server=on,wait=off  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id5VPWWY \
    -chardev socket,id=chardev_serial0,path=/tmp/avocado_ti4stbke/serial-serial0-20220412-052832-OLNnOWJ3,server=on,wait=off \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20220412-052832-OLNnOWJ3,path=/tmp/avocado_ti4stbke/seabios-20220412-052832-OLNnOWJ3,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220412-052832-OLNnOWJ3,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel910-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:83:04:f4:b5:1d,id=id9CnYs6,netdev=idNXvszc,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idNXvszc,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \

3. boot a guest with edk2. (don't add smbios-entry-point-type=64 into qemu command lines)

4. boot a guest with edk2. (add smbios-entry-point-type=64 into qemu command lines)
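
For steps 3 and 4, the edk2 runs use the same kind of command line with OVMF added via pflash instead of the built-in SeaBIOS. A rough sketch (not the exact QE command; firmware paths assume the edk2-ovmf package layout also used in comment 13, and the VARS file is a local copy of the shipped template). For step 3, drop smbios-entry-point-type=64 from the -machine option; for step 4, keep it:

cp /usr/share/edk2/ovmf/OVMF_VARS.fd /tmp/guest_VARS.fd
/usr/libexec/qemu-kvm \
    -machine q35,smm=on,smbios-entry-point-type=64 \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd \
    -drive if=pflash,format=raw,unit=1,file=/tmp/guest_VARS.fd \
    -m 4096 -smp 4 -nographic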



After step 1, the guest boots up successfully; check the smbios version in the guest.

# dmidecode

the output looks like:
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
13 structures occupying 689 bytes.
Table at 0x7FFFFD40.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: 1.16.0-1.el9
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported
                Targeted content distribution is supported
        BIOS Revision: 0.0
......
......
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
        Status: No errors detected

Handle 0x7F00, DMI type 127, 4 bytes
End Of Table


After step 2, the guest boots up successfully; check the smbios version in the guest.
# dmidecode

the output looks like:
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x7FFFFD40.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: 1.16.0-1.el9
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported
                Targeted content distribution is supported
        BIOS Revision: 0.0
......
......
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
        Status: No errors detected

Handle 0x7F00, DMI type 127, 4 bytes
End Of Table


After step 3, the result is similar to step 1.
After step 4, the result is similar to step 2.



Hi Chensheng,

I noticed that you added the VIRT-8962 case link to this bug. Just a reminder: I think you should test it with SMBIOS 3.0 (qemu-kvm -M q35,smbios-entry-point-type=64). Thanks.

Comment 99 Chensheng Dong 2022-08-05 07:00:36 UTC
Hi Xueqiang,

I have covered it with SMBIOS 3.0, thanks for the reminder.

Comment 100 Chensheng Dong 2022-08-05 07:10:47 UTC
BTW, our host only has 448 vcpus and 8TB of memory.

Comment 102 errata-xmlrpc 2022-11-15 09:53:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967

