Bug 2094270 - Do not set the hard vCPU limit to the soft vCPU limit in downstream qemu-kvm anymore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 9.1
Assignee: Vitaly Kuznetsov
QA Contact: liunana
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On:
Blocks: 2095260
Reported: 2022-06-07 10:22 UTC by Thomas Huth
Modified: 2022-11-15 10:20 UTC (History)

Fixed In Version: qemu-kvm-7.0.0-6.el9
Doc Type: Known Issue
Doc Text:
Cause: Due to the rebase of the KVM code via Bugzilla #2074832, the kernel now recommends a lower number of virtual CPUs for KVM guests (matching the number of available physical CPUs instead of an arbitrarily high value).
Consequence: qemu-kvm now warns if the user tries to run a guest that has more CPUs configured than are available as physical CPUs on the host.
Workaround (if any): Do not use more vCPUs for the guest than the number of physical CPUs available on the host.
Clone Of:
Clones: 2095260
Environment:
Last Closed: 2022-11-15 09:54:42 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src qemu-kvm merge_requests 99 | 0 | None | opened | Revert "globally limit the maximum number of CPUs" | 2022-06-09 10:18:42 UTC
Red Hat Bugzilla | 2074832 | 1 | high | CLOSED | Rebase KVM x86 to upstream 5.18 | 2023-10-08 06:15:16 UTC
Red Hat Issue Tracker | RHELPLAN-124451 | 0 | None | None | None | 2022-06-07 10:39:38 UTC
Red Hat Product Errata | RHSA-2022:7967 | 0 | None | None | None | 2022-11-15 09:55:24 UTC

Description Thomas Huth 2022-06-07 10:22:28 UTC
Description of problem:
In downstream qemu-kvm, we have a patch ("globally limit the maximum number of CPUs") that sets the hard limit of possible vCPUs to the value that the KVM code of the kernel recommends as the soft limit. This soft limit was set to a value that we've tested in our downstream RHEL releases, so it makes sense to set the hard limit to the same value. However, the upstream code was recently changed to no longer use an arbitrary soft limit here, but to cap the value at the number of available physical CPUs of the host. So if that patch gets backported to the downstream kernel (see BZ 2074832), the hack in qemu-kvm won't work as expected anymore, making it impossible to set a "-smp x" value for guests where x is greater than the number of available physical CPUs.

Version-Release number of selected component (if applicable):
qemu-kvm-7.0.0-4.el9

How reproducible:
100%

Steps to Reproduce:
1. Install an upstream kernel (or the one from BZ 2074832)
2. Run a guest with more vCPUs than available physical host CPUs, e.g.:
   /usr/libexec/qemu-kvm -smp 700

Actual results:
qemu-kvm will refuse to start.

Expected results:
qemu-kvm should still run the guest.

Additional info:
I think we should simply revert/drop the "globally limit the maximum number of CPUs" patch in downstream qemu-kvm now.
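
The interaction between the two limits described above can be illustrated with a small model. This is a simplified sketch, not QEMU's actual code: the function name `classify_smp_request`, its parameters, and the hard-limit value of 1024 are assumptions made purely for illustration.

```python
def classify_smp_request(requested, soft_limit, hard_limit):
    """Return 'ok', 'warn', or 'refuse' for a requested vCPU count.

    soft_limit: vCPU count that the kernel's KVM code recommends.
    hard_limit: vCPU count above which the guest must not start.
    """
    if requested > hard_limit:
        return "refuse"
    if requested > soft_limit:
        return "warn"
    return "ok"


# Host with 32 physical CPUs and a rebased kernel: the soft limit is now 32.
SOFT_LIMIT = 32
# Illustrative value only (assumption); the real hard limit depends on the kernel.
ASSUMED_KERNEL_HARD_LIMIT = 1024

# With the downstream patch, the hard limit is forced to the soft limit.
with_patch = classify_smp_request(64, SOFT_LIMIT, hard_limit=SOFT_LIMIT)
# With the patch reverted, the kernel's own hard limit applies.
without_patch = classify_smp_request(64, SOFT_LIMIT, ASSUMED_KERNEL_HARD_LIMIT)

print(with_patch, without_patch)
```

In this model, keeping the patch turns every request above the host's physical CPU count into a refusal, while reverting it downgrades the same request to a warning.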

Comment 2 Chao Yang 2022-06-09 02:28:43 UTC
Hi Thomas,

Do we need a counterpart libvirt BZ in case we try migrating a VM whose vCPU count exceeds the pCPU count of the destination host?

Comment 3 Thomas Huth 2022-06-09 07:04:58 UTC
(In reply to Chao Yang from comment #2)
> Do we need a counterpart libvirt BZ in case we try migrating a VM whose vCPU
> count exceeds the pCPU count of the destination host?

Hi Chao,

I don't think we need a libvirt counterpart here. The problem only occurs if running the current qemu-kvm with an updated / upstream kernel - and libvirt is not involved in the limit checking here, as far as I know.

Comment 4 Jiri Denemark 2022-06-09 11:16:54 UTC
I think we actually need a libvirt BZ to revert this RHEL-only patch:
https://gitlab.com/redhat/rhel/src/libvirt/-/commit/fcec98bb80633bec6f4bc3de0ab75627c874d315
We are not involved in limit checking, but we report the
maximum number of virtual CPUs.

Comment 5 Thomas Huth 2022-06-09 11:30:18 UTC
Today I learnt ... ok, I'll clone the BZ for libvirt, too.

Comment 7 Yanan Fu 2022-06-13 09:55:18 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 11 liunana 2022-06-16 14:55:50 UTC
Hi Vitaly,

Tried to reproduce this bug with qemu-kvm-7.0.0-1.el9.x86_64 & kernel-5.14.0-105.el9.x86_64.
steps:
1. Check host cpus:
    CPU(s):                  32
      On-line CPU(s) list:   0-31

2. Boot QEMU. I could not reproduce the issue where QEMU refuses to start.        < ====
# /usr/libexec/qemu-kvm -M q35 -smp 710
qemu-kvm: warning: ACPI table size 91976 exceeds 65536 bytes, migration may not work
Try removing CPUs, NUMA nodes, memory slots or PCI bridges.VNC server running on ::1:5900
qemu-kvm: warning: ACPI table size 92150 exceeds 65536 bytes, migration may not work
Try removing CPUs, NUMA nodes, memory slots or PCI bridges.

3. QEMU refuses to start with -smp 720.
# /usr/libexec/qemu-kvm -M q35 -smp 720
qemu-kvm: Invalid SMP CPUs 720. The max CPUs supported by machine 'pc-q35-rhel9.0.0' is 710



Verify this bug with qemu-kvm-7.0.0-6.el9.x86_64 & 5.14.0-111.el9.x86_64.
Steps:
1. Check host cpus:
    CPU(s):                  32
      On-line CPU(s) list:   0-31

2. Boot qemu.
# /usr/libexec/qemu-kvm -M q35 -smp 710
qemu-kvm: warning: Number of SMP cpus requested (710) exceeds the recommended cpus supported by KVM (32)
qemu-kvm: warning: Number of hotpluggable cpus requested (710) exceeds the recommended cpus supported by KVM (32)
qemu-kvm: warning: ACPI table size 91976 exceeds 65536 bytes, migration may not work
Try removing CPUs, NUMA nodes, memory slots or PCI bridges.VNC server running on ::1:5901
qemu-kvm: warning: ACPI table size 92150 exceeds 65536 bytes, migration may not work
Try removing CPUs, NUMA nodes, memory slots or PCI bridges.

3. QEMU refuses to start with -smp 720.
# /usr/libexec/qemu-kvm -M q35 -smp 720
qemu-kvm: Invalid SMP CPUs 720. The max CPUs supported by machine 'pc-q35-rhel9.0.0' is 710



Please check whether these are the expected results. Thanks.


Best regards
Liu Nana

Comment 12 Vitaly Kuznetsov 2022-06-16 15:13:13 UTC
'710' is a different limitation imposed by QEMU, we're not changing it here.

To reproduce the problem we fix here, you need:
1) Kernel with rebased KVM, e.g. 5.14.0-111.el9.x86_64
2) Unfixed QEMU, e.g. qemu-kvm-7.0.0-1.el9.x86_64

In your case you won't be able to start a VM with more than 32 vCPUs, not 710.

To test that the issue is fixed you upgrade QEMU to qemu-kvm-7.0.0-6.el9. You will be able
to create > 32 vCPUs.

Testing with pre-rebase KVM doesn't show the problem.
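
One way to see which limits a given kernel reports is to ask KVM directly. The following is a hedged sketch, not part of the test plan in this bug: it uses the `KVM_CHECK_EXTENSION` ioctl with the `KVM_CAP_NR_VCPUS` (recommended) and `KVM_CAP_MAX_VCPUS` (maximum) capabilities, with constants taken from the Linux `<linux/kvm.h>` header. The helper name `query_vcpu_limits` is hypothetical. Based on the description in this bug, a kernel with rebased KVM would be expected to report a recommended limit equal to the number of host CPUs.

```python
import fcntl
import os

# Constants from <linux/kvm.h>.
KVM_CHECK_EXTENSION = 0xAE03  # _IO(KVMIO, 0x03) with KVMIO == 0xAE
KVM_CAP_NR_VCPUS = 9          # recommended (soft) vCPU limit per VM
KVM_CAP_MAX_VCPUS = 66        # maximum (hard) vCPU limit per VM


def query_vcpu_limits(path="/dev/kvm"):
    """Return (soft_limit, hard_limit) as reported by KVM, or None if unavailable."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        soft = fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_VCPUS)
        hard = fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS)
        return soft, hard
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    limits = query_vcpu_limits()
    if limits is None:
        print("KVM is not available on this host")
    else:
        print("soft limit: %d, hard limit: %d, host CPUs: %s"
              % (limits[0], limits[1], os.cpu_count()))
```

The script needs read/write access to `/dev/kvm` and returns `None` instead of raising when the device is missing or inaccessible.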

Comment 13 liunana 2022-06-16 15:36:00 UTC
(In reply to Vitaly Kuznetsov from comment #12)
> '710' is a different limitation imposed by QEMU, we're not changing it here.
> 
> To reproduce the problem we fix here, you need:
> 1) Kernel with rebased KVM, e.g. 5.14.0-111.el9.x86_64
> 2) Unfixed QEMU, e.g. qemu-kvm-7.0.0-1.el9.x86_64
> 
> In your case you won't be able to start a VM with more than 32 vCPUs, not
> 710.
> 
> To test that the issue is fixed you upgrade QEMU to qemu-kvm-7.0.0-6.el9.
> You will be able
> to create > 32 vCPUs.
> 
> Testing with pre-rebase KVM doesn't show the problem.

Yes, I reproduced the issue with qemu-kvm-7.0.0-1.el9.x86_64 & 5.14.0-111.el9.x86_64. Thanks.
# /usr/libexec/qemu-kvm -M q35 -smp 64
qemu-kvm: warning: Number of SMP cpus requested (64) exceeds the recommended cpus supported by KVM (32)
Number of SMP cpus requested (64) exceeds the maximum cpus supported by KVM (32)


And with the fixed version, QEMU starts successfully.
# /usr/libexec/qemu-kvm -M q35 -smp 64
qemu-kvm: warning: Number of SMP cpus requested (64) exceeds the recommended cpus supported by KVM (32)
qemu-kvm: warning: Number of hotpluggable cpus requested (64) exceeds the recommended cpus supported by KVM (32)
VNC server running on ::1:5900


And I have one question here.
According to the qemu warning with fixed version, does it mean that we don't plan to support vcpu overcommit any more?


Best regards
Liu Nana

Comment 14 Vitaly Kuznetsov 2022-06-16 15:44:50 UTC
(In reply to liunana from comment #13)

> And I have one question here.
> According to the qemu warning with fixed version, does it mean that we don't
> plan to support vcpu overcommit any more?

Note, this is 'self overcommit', when you create a single guest which has more vCPUs
than there are physical CPUs on the system. I don't think we're going to deny support
for such configurations but it should be clear that performance of such setups will
always suffer (and that's what the warning is about now).

Comment 15 liunana 2022-06-17 03:54:37 UTC
(In reply to Vitaly Kuznetsov from comment #14)
> (In reply to liunana from comment #13)
> 
> > And I have one question here.
> > According to the qemu warning with fixed version, does it mean that we don't
> > plan to support vcpu overcommit any more?
> 
> Note, this is 'self overcommit', when you create a single guest which has
> more vCPUs
> than there are physical CPUs on the system. I don't think we're going to
> deny support
> for such configurations but it should be clear that performance of such
> setups will
> always suffer (and that's what the warning is about now).

Thanks, moving this bug to VERIFIED now.

Comment 22 Jiri Denemark 2022-08-04 16:49:07 UTC
Libvirt only reports the hard CPU limit (via virsh capabilities, virsh
domcapabilities, or virsh maxvcpus). The soft limit is not used or reported
anywhere, but the warning will be logged in /var/log/libvirt/qemu/$VM.log
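
Since the soft limit only surfaces as a warning in the log, a small parser can pull it out. This is a minimal sketch, assuming the log contains the warning in the same wording as the QEMU output quoted in comments 11 and 13; the function name `find_vcpu_warnings` is hypothetical, and reading the actual file (for example /var/log/libvirt/qemu/$VM.log) is left to the caller.

```python
import re

# Pattern based on the warning text quoted earlier in this bug.
WARNING_RE = re.compile(
    r"warning: Number of (SMP|hotpluggable) cpus requested \((\d+)\) "
    r"exceeds the recommended cpus supported by KVM \((\d+)\)"
)


def find_vcpu_warnings(log_text):
    """Return a list of (kind, requested, recommended) tuples found in log_text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in WARNING_RE.finditer(log_text)
    ]


# Sample input copied from the fixed-version output in comment 13.
sample = (
    "qemu-kvm: warning: Number of SMP cpus requested (64) exceeds "
    "the recommended cpus supported by KVM (32)\n"
    "qemu-kvm: warning: Number of hotpluggable cpus requested (64) exceeds "
    "the recommended cpus supported by KVM (32)\n"
    "VNC server running on ::1:5900\n"
)
warnings_found = find_vcpu_warnings(sample)
print(warnings_found)
```

Each tuple gives the kind of request, the requested vCPU count, and the recommended limit, which on a rebased kernel corresponds to the number of physical CPUs on the host.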

Comment 25 errata-xmlrpc 2022-11-15 09:54:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967

