Bug 1946231 - [RFE] Support virtual machines with 710 VCPUs
Summary: [RFE] Support virtual machines with 710 VCPUs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.4.8
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On: 1840923
Blocks:
Reported: 2021-04-05 13:15 UTC by Arik
Modified: 2021-08-19 06:23 UTC (History)

Fixed In Version: ovirt-engine-4.4.8.4
Doc Type: Enhancement
Doc Text:
The maximum number of vCPUs has been increased to 710 on x86_64 architecture and 4.6 cluster level. Additionally, the limit on the number of CPU sockets has been effectively removed for 4.6 cluster levels.
Clone Of:
Environment:
Last Closed: 2021-08-19 06:23:15 UTC
oVirt Team: Virt
Embargoed:
ahadas: ovirt-4.4?
pm-rhel: planning_ack?
pm-rhel: devel_ack+
pm-rhel: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 114119 0 master MERGED core: increase max vcpu limit to 710 2021-06-18 13:23:50 UTC
oVirt gerrit 115649 0 master MERGED core: Add support for 256 and more x86 vCPUs 2021-07-18 11:13:45 UTC
oVirt gerrit 115725 0 master MERGED core: Current VCPUs mustn't exceed max VCPUs 2021-07-28 14:34:41 UTC
oVirt gerrit 115916 0 master MERGED core: Use VM, not cluster, compatibility version for max VCPUs 2021-07-27 20:32:15 UTC
oVirt gerrit 115977 0 master ABANDONED core: Remove the limit on CPU sockets 2021-08-05 17:39:07 UTC

Internal Links: 1984464

Comment 3 Arik 2021-06-17 20:21:13 UTC
This has no implication on the NUMA-pinning dialog but for the record I tested the NUMA-pinning dialog with 700 vNUMA nodes and it works fine

Comment 4 Arik 2021-06-17 20:33:57 UTC
I'll merge the patch that enables up to 710 vCPUs, but it's not complete: without relaxing the restrictions on the supported CPU topology we can't set a 710:1:1 topology.
First, the number of sockets is limited to 16.
Second, there is a computation in VmCpuCountHelper#calcMaxVCpu that seems to prevent maxVCpus from reaching 710 with this combination.

Milan, I see that you've added the latter, should it change?

Comment 5 Milan Zamazal 2021-06-18 09:41:48 UTC
The most comprehensive summary at the time was https://bugzilla.redhat.com/show_bug.cgi?id=1406243#c13. Basically, Intel 8-bit APIC ID limits applied (on x86_64). A newer, x2APIC support was needed to lift the restriction.
I'd suggest checking with the platform whether enabling the higher number of vCPUs requires some hardware/emulation features, such as `x2apic' CPU flag, or whether it's automatically possible on hosts that provide that high number of CPUs. Additionally (or alternatively?), looking into BZ 1788991, maybe adding <ioapic driver='qemu'/> to the domain XML is needed?
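
To make the 8-bit APIC ID limit concrete, here is a minimal sketch of how a sockets:cores:threads topology maps to APIC IDs. It assumes the usual x86 scheme in which each topology level occupies its own power-of-two-sized bit field, and it ignores the dies level. The function names are hypothetical and are not taken from ovirt-engine or QEMU.

```python
def bits_for_count(count: int) -> int:
    """Number of bits needed to encode IDs 0..count-1."""
    return (count - 1).bit_length()


def max_apic_id(sockets: int, cores: int, threads: int) -> int:
    """Highest APIC ID when each topology level gets its own bit field.

    Assumption: the layout is [socket | core | thread], with no dies level.
    """
    thread_bits = bits_for_count(threads)
    core_bits = bits_for_count(cores)
    return (
        ((sockets - 1) << (core_bits + thread_bits))
        | ((cores - 1) << thread_bits)
        | (threads - 1)
    )


def fits_8bit_apic(sockets: int, cores: int, threads: int) -> bool:
    """True if every APIC ID fits in 8 bits, excluding the broadcast ID 255."""
    return max_apic_id(sockets, cores, threads) < 255


if __name__ == "__main__":
    for topology in [(16, 15, 1), (10, 71, 1), (710, 1, 1)]:
        print(topology, max_apic_id(*topology), fits_8bit_apic(*topology))
```

Under these assumptions a 16:15:1 topology still fits in 8 bits, while any topology that reaches 710 vCPUs does not, which is why the wider x2APIC IDs come into play.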

Comment 6 Arik 2021-06-20 13:56:45 UTC
(In reply to Milan Zamazal from comment #5)
> The most comprehensive summary at the time was
> https://bugzilla.redhat.com/show_bug.cgi?id=1406243#c13. Basically, Intel
> 8-bit APIC ID limits applied (on x86_64). A newer, x2APIC support was needed
> to lift the restriction.
> I'd suggest checking with the platform whether enabling the higher number of
> vCPUs requires some hardware/emulation features, such as `x2apic' CPU flag,
> or whether it's automatically possible on hosts that provide that high
> number of CPUs. Additionally (or alternatively?), looking into BZ 1788991,
> maybe adding <ioapic driver='qemu'/> to the domain XML is needed?

Yeah I also see that <apic/> and <ioapic driver='qemu'/> appear in https://bugzilla.redhat.com/show_bug.cgi?id=1840923#c41
If that's what's needed to enable this and that's what QEMU tests with, maybe we should do it as well.
So Milan, can you please check that? I took this bug since I thought it was a simple change of a configuration value :)
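
For reference, the two elements mentioned above live under <features> in libvirt domain XML. The following sketch only shows how such a fragment could be generated; whether the engine needs to emit it is exactly the open question in this comment. The helper name is hypothetical and this is not how ovirt-engine builds its domain XML.

```python
import xml.etree.ElementTree as ET


def build_features() -> ET.Element:
    """Build a <features> element containing <apic/> and <ioapic driver='qemu'/>."""
    features = ET.Element("features")
    ET.SubElement(features, "apic")
    ET.SubElement(features, "ioapic", {"driver": "qemu"})
    return features


if __name__ == "__main__":
    # Serialize the fragment as text for inspection.
    print(ET.tostring(build_features(), encoding="unicode"))
```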

Comment 10 Arik 2021-07-28 16:29:16 UTC
Milan, is there anything else to change? Did we drop the limitation on 16 sockets?

Comment 11 Milan Zamazal 2021-07-28 17:18:18 UTC
The limitation on 16 sockets hasn't been dropped yet. Technically, it's not necessarily needed for this RFE (it can be compensated with the number of cores) but it would be nice to have and I plan to look at it. Another enhancement may be to improve immediate checks in the VM edit dialog but it's also not strictly necessary.
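
As a rough illustration of compensating with the number of cores, the sketch below lists the sockets:cores:threads combinations that multiply to exactly 710 while keeping the socket count at or below 16. The cap of 2 threads per core is an assumption made only for this example, no limit on cores per socket is modeled, and the function name is hypothetical.

```python
def topologies(total: int, max_sockets: int, max_threads: int = 2):
    """Yield (sockets, cores, threads) tuples whose product is exactly `total`."""
    for sockets in range(1, max_sockets + 1):
        if total % sockets:
            continue  # this socket count cannot divide the vCPUs evenly
        per_socket = total // sockets
        for threads in range(1, max_threads + 1):
            if per_socket % threads == 0:
                yield sockets, per_socket // threads, threads


if __name__ == "__main__":
    for sockets, cores, threads in topologies(710, max_sockets=16):
        print(f"{sockets}:{cores}:{threads}")
```

Since 710 = 2 x 5 x 71, only 1, 2, 5 and 10 sockets divide it evenly under the 16-socket cap, so 10:71:1 is the option with the most sockets.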

Comment 12 Arik 2021-07-29 08:11:22 UTC
(In reply to Milan Zamazal from comment #11)
> The limitation on 16 sockets hasn't been dropped yet. Technically, it's not
> necessarily needed for this RFE (it can be compensated with the number of
> cores) but it would be nice to have and I plan to look at it. Another
> enhancement may be to improve immediate checks in the VM edit dialog but
> it's also not strictly necessary.

Ack, thanks
So let's keep this bz in POST for now, so that if we do that, QE will test all those changes together (and we won't need to mess with many bugs).
When we're ok with the changes that are in for 4.4.8, we'll move it to MODIFIED.

Comment 13 Milan Zamazal 2021-07-29 14:39:21 UTC
It seems VMs with hundreds of sockets can start and I couldn't find any information about limits on the number of CPU sockets, so perhaps we can drop the limit completely.

Eduardo, do you know whether it's safe not to limit the number of CPU sockets and allow configurations such as 710 sockets with 1 die, core and thread per socket?

Comment 14 Eduardo Habkost 2021-07-29 16:21:30 UTC
(In reply to Milan Zamazal from comment #13)
> It seems VMs with hundreds of sockets can start and I couldn't find any
> information about limits on the number of CPU sockets, so perhaps we can
> drop the limit completely.
> 
> Eduardo, do you know whether it's safe not to limit the number of CPU
> sockets and allow configurations such as 710 sockets with 1 die, core and
> thread per socket?

We probably won't hit any host-side limits if we do this.  I think it makes sense to remove the socket count limitation from management software, but we should keep in mind that this doesn't mean configurations that have no equivalent in real hardware are supposed to work out of the box with all guests.

Comment 15 Milan Zamazal 2021-08-16 14:19:07 UTC
Since I didn't have hardware with that many CPUs, I verified the attached patches by faking the number of CPUs in Vdsm, like this: https://gerrit.ovirt.org/c/vdsm/+/116206. Although running a strongly overcommitted VM doesn't allow its guest OS to boot up normally, it's good enough for checking whether the VM can be started by QEMU at all with the given number of current/maximum vCPUs and CPU topology.

Comment 16 Qin Yuan 2021-08-17 08:17:00 UTC
QE doesn't have hardware to verify this bug. 
Move this bug to VERIFIED according to Milan's comment #15.

Comment 18 Sandro Bonazzola 2021-08-19 06:23:15 UTC
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

