Bug 1861718
| Summary: | Very slow boot when overcommitting CPU | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Eduardo Habkost <ehabkost> |
| Component: | edk2 | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED ERRATA | QA Contact: | leidwang <leidwang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.3 | CC: | berrange, coli, jinzhao, juzhang, kraxel, leidwang, lersek, mtessun, pbonzini, philmd, virt-maint, xuwei, yuhuang |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.3 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | edk2-20200602gitca407c7246bf-3.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 04:01:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1788991 | | |
Posted upstream patch:

[edk2-devel] [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
http://mid.mail-archive.com/20200729185217.10084-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/63454

Testing feedback from Eduardo, using the patch:

(In reply to Eduardo Habkost from comment #0)
> In a host with 448 CPUs running a 512 VCPU VM:
> * VM takes 20-30 minutes to boot
> * 0.72 seconds between each ConvertPageEntryAttribute message, on average

with the patch: ~4 minutes to grub

> In a host with 48 CPUs running a 384 VCPU VM:
> * VM doesn't boot after 3 hours
> * 30 seconds between each ConvertPageEntryAttribute message

with the patch: 14 minutes to grub

(In reply to Laszlo Ersek from comment #2)
> Posted upstream patch:
>
> [edk2-devel] [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore()
> before re-fetch
> http://mid.mail-archive.com/20200729185217.10084-1-lersek@redhat.com
> https://edk2.groups.io/g/devel/message/63454

Merged upstream as commit 9001b750df64, via <https://github.com/tianocore/edk2/pull/843>.

Hi Laszlo,

I have a question about this bz, does edk2 support vcpu overcommit?

Many thanks!

(In reply to leidwang from comment #6)
> Hi Laszlo,
>
> I have a question about this bz, does edk2 support vcpu overcommit?
>
> Many thanks!

Sorry, let me clarify.
For cpu overcommit, we usually support 1:1 cpu overcommit, does edk2 support such high overcommit in one vm (48:384)?

(In reply to CongLi from comment #7)
> (In reply to leidwang from comment #6)
> > Hi Laszlo,
> >
> > I have a question about this bz, does edk2 support vcpu overcommit?
> >
> > Many thanks!
>
> Sorry, let me clarify.
> For cpu overcommit, we usually support 1:1 cpu overcommit, does edk2 support
> such high overcommit in one vm (48:384)?

I don't understand what you mean by "1:1 cpu overcommit". If you mean 1 VCPU per 1 PCPU, that's not "over"commit.
Overcommit is when you have more VCPUs (summed over all VMs running on a host) than the host has PCPUs.

Anyway, I would advise against using OVMF in any overcommit scenario; the numbers seen in this BZ come from Eduardo's work towards higher VCPU counts. 448->512 seems like a reasonable use case (that was affected by the OVMF issue). 48->384 intentionally amplifies the issue for illustration; it is not reasonable for production.

So I'd say stick with whatever overcommit standards you've been using thus far, unless Eduardo has particular overcommit requests that are required for testing his work. Thanks.

For testing this particular bugfix, I'd suggest a slight overcommit scenario, with a ratio similar to Eduardo's 512/448 (~1.14). Maybe up to 1.5 if you have a low PCPU count on the host, such as 4 or 8.

Can we get a qa_ack+ please?

(In reply to Laszlo Ersek from comment #9)
> For testing this particular bugfix, I'd suggest a slight overcommit
> scenario, a ratio similar to Eduardo's 512/448 (~1.14). Maybe up to 1.5 if
> you have a low PCPU count on the host, such as 4 or 8.

Tested this bz on a host with 40 PCPUs.

Results are as below:

* 40 vcpus VM works well
* 40 vcpus + 40 vcpus VMs all work well
* 80 vcpus VM works well
* 160 vcpus VM takes 7 minutes to boot
* 200 vcpus VM takes 15 minutes to boot
* 240 vcpus VM takes 28 minutes to boot
* 384 vcpus VM doesn't boot after 4 hours

A question about cpu overcommit: do we support single-VM overcommit (one VM with a greater number of VCPUs than the host's PCPU count)? Even slight overcommit. Thanks!

Hello Leidong Wang,

(In reply to leidwang from comment #11)
> (In reply to Laszlo Ersek from comment #9)
> > For testing this particular bugfix, I'd suggest a slight overcommit
> > scenario, a ratio similar to Eduardo's 512/448 (~1.14). Maybe up to 1.5 if
> > you have a low PCPU count on the host, such as 4 or 8.
>
> Tested this bz on a host with 40 PCPUs.
> Results are as below:
>
> * 40 vcpus VM works well
> * 40 vcpus + 40 vcpus VMs all work well
> * 80 vcpus VM works well

First question: what is the difference between "40+40" and "80"?

Does "40+40" mean two VMs, each with 40 VCPUs?

And does "80" mean one VM with 80 VCPUs?

> * 160 vcpus VM takes 7 minutes to boot
> * 200 vcpus VM takes 15 minutes to boot
> * 240 vcpus VM takes 28 minutes to boot
> * 384 vcpus VM doesn't boot after 4 hours

Because this patch is a performance optimization, what we should really be doing is *compare* the boot times between:
- edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch
- edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch [upcoming build, containing the patch]

So I understand your above measurements to be the baseline (i.e., boot times without the patch). Is that correct?

> A question about cpu overcommit: do we support single-VM overcommit (one VM
> with a greater number of VCPUs than the host's PCPU count)? Even slight
> overcommit.

I'm curious too. I don't know what our official recommendations are about CPU overcommit.

Thanks. Anyway, I'd like to confirm that the before-after boot times should be compared using a single VM. Thanks!

(In reply to Laszlo Ersek from comment #12)
> Hello Leidong Wang,
>
> (In reply to leidwang from comment #11)
> > (In reply to Laszlo Ersek from comment #9)
> > > For testing this particular bugfix, I'd suggest a slight overcommit
> > > scenario, a ratio similar to Eduardo's 512/448 (~1.14). Maybe up to 1.5 if
> > > you have a low PCPU count on the host, such as 4 or 8.
> >
> > Tested this bz on a host with 40 PCPUs.
> >
> > Results are as below:
> >
> > * 40 vcpus VM works well
> > * 40 vcpus + 40 vcpus VMs all work well
> > * 80 vcpus VM works well
>
> First question: what is the difference between "40+40" and "80"?
>
> Does "40+40" mean two VMs, each with 40 VCPUs?
>
> And does "80" mean one VM with 80 VCPUs?

Yes, you are right.
> > * 160 vcpus VM takes 7 minutes to boot
> > * 200 vcpus VM takes 15 minutes to boot
> > * 240 vcpus VM takes 28 minutes to boot
> > * 384 vcpus VM doesn't boot after 4 hours
>
> Because this patch is a performance optimization, what we should really be
> doing is *compare* the boot times between:
> - edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch
> - edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch [upcoming build, containing
> the patch]
>
> So I understand your above measurements to be the baseline (i.e., boot times
> without the patch). Is that correct?

Yes, this result is based on edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch, so I need to test it with edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch?

> > A question about cpu overcommit: do we support single-VM overcommit (one VM
> > with a greater number of VCPUs than the host's PCPU count)? Even slight
> > overcommit.
>
> I'm curious too. I don't know what our official recommendations are about
> CPU overcommit.
>
> Thanks.

OK, thanks!

(In reply to leidwang from comment #14)
> (In reply to Laszlo Ersek from comment #12)
> > (In reply to leidwang from comment #11)
> > > * 160 vcpus VM takes 7 minutes to boot
> > > * 200 vcpus VM takes 15 minutes to boot
> > > * 240 vcpus VM takes 28 minutes to boot
> > > * 384 vcpus VM doesn't boot after 4 hours
> >
> > Because this patch is a performance optimization, what we should really be
> > doing is *compare* the boot times between:
> > - edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch
> > - edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch [upcoming build, containing
> > the patch]
> >
> > So I understand your above measurements to be the baseline (i.e., boot times
> > without the patch). Is that correct?
>
> Yes, this result is based on edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch,
> so I need to test it with edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch?

That's right.
Please run the same tests, on the same host machine, with "edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch" and "edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch", and compare the boot times. Multi-VM tests are not relevant for now.

Thanks!

(In reply to Laszlo Ersek from comment #16)
> (In reply to leidwang from comment #14)
> > (In reply to Laszlo Ersek from comment #12)
> > > (In reply to leidwang from comment #11)
> > > > * 160 vcpus VM takes 7 minutes to boot
> > > > * 200 vcpus VM takes 15 minutes to boot
> > > > * 240 vcpus VM takes 28 minutes to boot
> > > > * 384 vcpus VM doesn't boot after 4 hours
> > >
> > > Because this patch is a performance optimization, what we should really be
> > > doing is *compare* the boot times between:
> > > - edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch
> > > - edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch [upcoming build, containing
> > > the patch]
> > >
> > > So I understand your above measurements to be the baseline (i.e., boot times
> > > without the patch). Is that correct?
> >
> > Yes, this result is based on edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch,
> > so I need to test it with edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch?
>
> That's right.
>
> Please run the same tests, on the same host machine, with
> "edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch" and
> "edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch", and compare the boot
> times. Multi-VM tests are not relevant for now.
>
> Thanks!

Retested on the same host with "edk2-ovmf-20200602gitca407c7246bf-3.el8.noarch".

Results are as below:

* 40 vcpus VM works well
* 80 vcpus VM works well
* 160 vcpus VM takes 1.5 minutes to boot
* 200 vcpus VM takes 1.5 minutes to boot
* 240 vcpus VM takes 2 minutes to boot
* 280 vcpus VM takes 2.5 minutes to boot
* 320 vcpus VM takes 3.5 minutes to boot
* 384 vcpus VM takes 5 minutes to boot

Thank you. So we have
| VCPU count | -2.el8 | -3.el8 |
|---|---|---|
| 40 | ~immediate | ~immediate |
| 80 | ~immediate | ~immediate |
| 160 | 7 mins | 1.5 mins |
| 200 | 15 mins | 1.5 mins |
| 240 | 28 mins | 2.0 mins |
| 384 | >240 mins | 5.0 mins |
Please set the BZ status to VERIFIED. Thanks!
(In reply to leidwang from comment #11)
> A question about cpu overcommit: do we support single-VM overcommit (one VM
> with a greater number of VCPUs than the host's PCPU count)? Even slight
> overcommit.
>
> Thanks!

Pasting reply sent by email last week:

I don't know the answer to those questions, I hope Martin can help you. That BZ has overcommit involved because it was the best way to reproduce the OVMF performance issue, not because it's an important use case.

(In reply to Eduardo Habkost from comment #22)
> (In reply to leidwang from comment #11)
> > A question about cpu overcommit: do we support single-VM overcommit (one VM
> > with a greater number of VCPUs than the host's PCPU count)? Even slight
> > overcommit.
> >
> > Thanks!
>
> Pasting reply sent by email last week:
>
> I don't know the answer to those questions, I hope Martin can
> help you. That BZ has overcommit involved because it was the
> best way to reproduce the OVMF performance issue, not because
> it's an important use case.

I think I answered in that thread, but as well here:

No, we do not support CPU overcommit on a per-VM basis. In detail: #vCPUs <= #pCPUs (including threads!)

Of course you can do overcommit with multiple VMs, so sum(#vCPUs) may be greater than #pCPUs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: edk2 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4805
Description of problem:

When testing VCPU limits in qemu-kvm, I found that booting gets very slow if the number of VCPUs is higher than the host CPU count. When booting with isa-debugcon, I see thousands of messages like:

ConvertPageEntryAttribute 0x8000000050E000E7->0x8000000050E000E6

appearing at a very slow rate.

Version-Release number of selected component (if applicable):

qemu-kvm-5.0.0-2.module+el8.3.0+7379+0505d6ca.x86_64
edk2-ovmf-20200602gitca407c7246bf-2.el8.noarch

How reproducible:

Always.

Steps to Reproduce:

On a host with fewer than 384 CPUs, run:

/usr/libexec/qemu-kvm -machine q35,accel=kvm,kernel-irqchip=split \
  -smp 384 \
  -drive if=pflash,format=raw,readonly,file=./OVMF_CODE.secboot.fd \
  -drive if=pflash,format=raw,file=./OVMF_VARS.fd \
  -device intel-iommu,intremap=on,eim=on \
  -m 4096 \
  -drive if=virtio,file=/root/rhel-guest-image-8.3-266.x86_64.qcow2,format=qcow2 \
  -vnc :0 -cdrom /root/seed.iso -display none -serial stdio -boot menu=on \
  -chardev file,id=fw-debug,path=/tmp/DOMAIN_NAME.fw.log \
  -device isa-debugcon,iobase=0x402,chardev=fw-debug

Actual results:

VM takes a long time to boot, using 100% of all host CPUs most of the time.

In a host with 448 CPUs running a 512 VCPU VM:
* VM takes 20-30 minutes to boot
* 0.72 seconds between each ConvertPageEntryAttribute message, on average

In a host with 48 CPUs running a 384 VCPU VM:
* VM doesn't boot after 3 hours
* 30 seconds between each ConvertPageEntryAttribute message

Expected results:

VM boots in a more reasonable time (preferably less than 5 minutes to reach grub).

Additional info:

Laszlo's analysis from the email thread:

>> At a point during the boot process, Platform BDS signals "SMM ready to
>> lock". This is kind of a marker for the firmware before which only
>> firmware modules from the platform vendor run, but after which 3rd party
>> UEFI modules (such as boot loaders, PCI card option ROMs) will run. So
>> the platform firmware performs various lock-down operations.
>>
>> One of those is that, at the first SMI following the above signal, the
>> SMI handler that runs on the BSP will unmap (remove the present bit in)
>> the SMM page table entries on most pages (except those UEFI memory
>> types that either are acceptable for SMM communication buffers, or map
>> MMIO). So the above bunch of messages report clearing the present bit,
>> from the SetUefiMemMapAttributes() function
>> [UefiCpuPkg/PiSmmCpuDxeSmm/SmmCpuMemoryManagement.c].
>>
>> Meanwhile the APs are busy-waiting in SMM for the BSP to finish.
>>
>> With VCPU overcommit, the APs could hinder the BSP's progress. And in
>> this particular AP loop, I don't see a CpuPause() call.
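The missing pause hint is what the upstream patch (commit 9001b750df64) adds to WaitForSemaphore(). The following is a minimal standalone sketch of the idea, not the verbatim edk2 code: it uses C11 atomics and `_mm_pause()` in place of edk2's InterlockedCompareExchange32() and CpuPause(), and `wait_for_semaphore` is a stand-in name.

```c
#include <stdint.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_pause() _mm_pause()   /* PAUSE: tells the CPU (and the host) we're spinning */
#else
#define cpu_pause() ((void)0)
#endif

/* Decrement *sem, spinning until it is non-zero, and return the
 * decremented value.  The fix is the cpu_pause() between re-fetches:
 * without it, overcommitted APs spin at full speed and starve the
 * BSP of host CPU time. */
static uint32_t wait_for_semaphore(_Atomic uint32_t *sem)
{
    uint32_t value;

    for (;;) {
        value = atomic_load_explicit(sem, memory_order_relaxed);
        if (value != 0 &&
            atomic_compare_exchange_weak_explicit(sem, &value, value - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            return value - 1;
        }
        cpu_pause();   /* back off before re-fetching the semaphore */
    }
}
```

On bare metal the pause mainly releases pipeline resources to a hyper-thread sibling; under KVM it also gives the host a chance (pause-loop exiting) to schedule the VCPU that actually holds the work, which is why the effect is so dramatic in the overcommit measurements above.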