Bug 2037734

Summary: Reflect different limits in CGroupV1/V2 for cpu.weight
Product: Red Hat Enterprise Linux 9
Reporter: Michal Privoznik <mprivozn>
Component: libvirt
Assignee: Pavel Hrdina <phrdina>
libvirt sub component: General
QA Contact: Meina Li <meili>
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: jsuchane, lcong, lizhu, lmen, phrdina, smitterl, virt-maint, xuzhang, yisun
Version: unspecified
Keywords: AutomationTriaged, Triaged
Target Milestone: rc
Target Release: 9.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-9.1.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2035518
Environment:
Last Closed: 2023-11-07 08:30:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: 9.1.0
Embargoed:

Description Michal Privoznik 2022-01-06 13:11:15 UTC
+++ This bug was initially created as a clone of Bug #2035518 +++

Description of problem:
RHEL 9 enables cgroup v2 by default, and in cgroup v2 all weights must be in the range [1, 10000]. A VM whose cpu.weight value is larger than 10000 cannot be started, while the same VM can be started on a RHEL 8 host, which uses cgroup v1. For the same reason, migrating such a VM from RHEL 8 to RHEL 9 fails.

Version-Release:
On Source host:
libvirt-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64
On target host:
libvirt-7.10.0-1.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64

1. Check the guest XML
<domain type='kvm'>
  <name>VM</name>
...
 <cputune>
    <shares>10001</shares>
  </cputune>
...

2. Try to migrate the guest from the source host to the target host.
# virsh migrate VM qemu+ssh://$target/system --verbose --live
error: error from service: GDBus.Error:org.freedesktop.DBus.Error.InvalidArgs: Value specified in CPUWeight is out of range
(migration failed)

On RHEL, we can migrate this kind of VM successfully by supplying a modified XML, e.g.:
3. Dump the VM XML
# virsh dumpxml VM > VM.xml

4. Change the shares value from 10001 to a value no larger than 10000,
# cat VM.xml
...
 <cputune>
    <shares>10000</shares>
  </cputune>
...

5. Migrate the VM by specifying the above XML
# virsh migrate VM qemu+ssh://$target/system --verbose --live --xml VM.xml
Migration: [100 %]
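
Steps 3 to 5 can also be scripted. The following is a minimal Python sketch, not part of the original report, that caps the <shares> value in a dumped domain XML before the file is passed to `virsh migrate --xml`. The function name clamp_shares and its default limit are illustrative assumptions. Capping changes the VM's relative CPU weight, so this only automates the workaround; it does not make it semantically equivalent.

```python
import xml.etree.ElementTree as ET

# Upper bound of cpu.weight on cgroup v2, as stated in this report.
CGROUP_V2_MAX_WEIGHT = 10000


def clamp_shares(domain_xml: str, limit: int = CGROUP_V2_MAX_WEIGHT) -> str:
    """Return domain XML with <cputune>/<shares> capped at `limit`.

    Hypothetical helper for illustration only; it is not a libvirt API.
    """
    root = ET.fromstring(domain_xml)
    shares = root.find("./cputune/shares")
    # Only rewrite the element when it exists and exceeds the limit.
    if shares is not None and shares.text and int(shares.text) > limit:
        shares.text = str(limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    dumped = (
        "<domain type='kvm'><name>VM</name>"
        "<cputune><shares>10001</shares></cputune></domain>"
    )
    print(clamp_shares(dumped))
```

The returned string would be written to VM.xml and used in step 5.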

My questions are:
1. Is there any problem with migrating this kind of VM on OpenStack?
2. If there is a problem now, is the above workaround acceptable?
3. If the above workaround is acceptable, can customers specify the XML and then migrate the guest on OpenStack now?

If there is no problem with migrating this kind of VM on OpenStack, or if it can be solved by another method, feel free to close this bug.

--- Additional comment from Michal Privoznik on 2022-01-04 13:43:30 CET ---

Right, libvirt currently uses the same set of limits for both CGroupV1 and V2. That should be fixed at the libvirt level. Mgmt apps (like OpenStack) need to provide an XML during migration with the values recalculated to fit into CGroupV2 limits.
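
One possible recalculation a mgmt app could apply is sketched below in Python: a linear rescale from the cgroup v1 cpu.shares range [2, 262144] to the cgroup v2 cpu.weight range [1, 10000]. The v1 range comes from the kernel's limits and is not stated in this report, the function name is hypothetical, and this is only one candidate mapping, similar to what some container runtimes have used. It is not something libvirt does.

```python
# cgroup v1 cpu.shares limits (kernel limits, assumed here) and
# cgroup v2 cpu.weight limits (as stated in this report).
V1_MIN, V1_MAX = 2, 262144
V2_MIN, V2_MAX = 1, 10000


def rescale_shares_v1_to_v2(shares: int) -> int:
    """Linearly map a cgroup v1 cpu.shares value onto the v2 cpu.weight range.

    Illustrative sketch only: the relative weight between VMs is roughly
    preserved, but the absolute numbers change.
    """
    if not V1_MIN <= shares <= V1_MAX:
        raise ValueError(
            f"shares '{shares}' must be in range [{V1_MIN}, {V1_MAX}]"
        )
    # Integer arithmetic keeps the result inside [V2_MIN, V2_MAX].
    return V2_MIN + ((shares - V1_MIN) * (V2_MAX - V2_MIN)) // (V1_MAX - V1_MIN)


if __name__ == "__main__":
    for value in (2, 1024, 10001, 262144):
        print(value, "->", rescale_shares_v1_to_v2(value))
```

With such a mapping every valid v1 value lands inside the v2 range, but all VMs on the destination would need to be rescaled the same way for their relative weights to stay comparable.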

Comment 1 John Ferlan 2022-01-11 11:59:09 UTC
Jarda - the ITR is set causing the Triaged flag to be set (silly RHEL bot action). 

Since ITR is set, this should be assigned to someone, or ITR should be cleared if there are no upstream patches yet.

Comment 2 Pavel Hrdina 2022-01-26 10:28:12 UTC
No upstream patches yet so removing ITR.

Comment 4 Meina Li 2023-07-05 03:19:59 UTC
Hi Pavel,

This bug can still be reproduced. Do we have a plan to fix it? If so, we need to push back the "stale date". Please check it again. Thanks.

Reproduced Version:
source host:
libvirt-8.0.0-21.module+el8.9.0+19166+e262ca96.x86_64
qemu-kvm-6.2.0-36.module+el8.9.0+19222+f46ac890.x86_64


target host:
libvirt-9.5.0-0rc1.1.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64

Reproduced Steps:
1. Prepare a guest xml with the following cputune.
  <cputune>
    <shares>10001</shares>
  </cputune>
2. Start the guest.
# virsh start rhel
Domain 'rhel' started
------ Ideally the guest would fail to start here, as it does on RHEL 9
3. Migrate the guest to RHEL 9.
# virsh migrate rhel qemu+ssh://dell-per750-39.lab.eng.pek2.redhat.com/system --live --verbose
root@dell-per750-39.lab.eng.pek2.redhat.com's password: 
error: invalid argument: shares '10001' must be in range [1, 10000]

Comment 5 Pavel Hrdina 2023-07-13 10:15:12 UTC
Hi Meina,

Getting an error when migrating from a cgroup v1 host to a cgroup v2 host, where the cputune
shares value is outside the cgroup v2 range, is expected behavior. In libvirt we cannot simply
modify the value to fit within the new range, as we would not be able to figure out what
new value to use without breaking the behavior users expect.

The error message from libvirt was fixed by the following commits:

commit cf3414a85b8383d71d6ae2a53daf63c331cc2230
Author: Pavel Hrdina <phrdina>
Date:   Tue Jan 17 10:02:07 2023 +0100

    vircgroupv2: fix cpu.weight limits check

commit 38af6497610075e5fe386734b87186731d4c17ac
Author: Pavel Hrdina <phrdina>
Date:   Tue Jan 17 10:08:08 2023 +0100

    domain_validate: drop cpu.shares cgroup check

commit ead6e1b00285cbd98e0f0727efb8adcb29ebc1ba
Author: Pavel Hrdina <phrdina>
Date:   Tue Jan 17 10:33:22 2023 +0100

    docs: document correct cpu shares limits with both cgroups v1 and v2

These commits are part of libvirt 9.1.0, so they are already included in the rebase to 9.1.0.

Since this BZ covers only libvirt reporting correct error messages, I am changing priority
and severity to low.

Comment 6 Meina Li 2023-07-26 03:39:45 UTC
The result in comment 4 is expected. I also ran some basic tests:

Test Version:
libvirt-9.5.0-3.el9.x86_64
qemu-kvm-8.0.0-9.el9.x86_64

Test Steps:
S1: Start a guest with shares set to 10000.
1. Prepare a guest with 10000 shares.
 <cputune>
    <shares>10000</shares>
  </cputune>
2. Start the guest.
# virsh start rhel
Domain 'rhel' started
# virsh dumpxml rhel --xpath cputune
<cputune>
  <shares>10000</shares>
</cputune>

S2: Start a guest with shares set to 10001 or 0.
1. Prepare a guest with 10001 shares.
 <cputune>
    <shares>10001</shares>
  </cputune>
2. Start the guest.
# virsh start rhel
error: Failed to start domain 'rhel'
error: invalid argument: shares '10001' must be in range [1, 10000]
3. Prepare a guest with 0 shares.
 <cputune>
    <shares>0</shares>
  </cputune>
4. Start the guest.
# virsh start rhel
error: Failed to start domain 'rhel'
error: invalid argument: shares '0' must be in range [1, 10000]
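
The boundary checks exercised in S1 and S2 can be summarized with a small Python sketch. It is not libvirt's implementation; the table of limits and the function name are assumptions for illustration. The cgroup v2 range and the error text follow the output shown in this report, while the cgroup v1 range [2, 262144] comes from the kernel's limits rather than from this report.

```python
# Per-version limits for the cputune shares value.
# v2: as shown in the error messages in this report.
# v1: kernel limits, assumed here for illustration.
SHARES_LIMITS = {
    1: (2, 262144),
    2: (1, 10000),
}


def validate_shares(shares: int, cgroup_version: int) -> None:
    """Raise ValueError if `shares` is outside the range for the cgroup version."""
    low, high = SHARES_LIMITS[cgroup_version]
    if not low <= shares <= high:
        raise ValueError(f"shares '{shares}' must be in range [{low}, {high}]")


if __name__ == "__main__":
    # Check each tested value against the cgroup v2 limits.
    for value in (10000, 10001, 0):
        try:
            validate_shares(value, cgroup_version=2)
            print(f"{value}: accepted")
        except ValueError as err:
            print(f"{value}: invalid argument: {err}")
```

Keeping the limits in a per-version table reflects the point of the fix: the same shares value can be valid on a cgroup v1 host and invalid on a cgroup v2 host.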

Comment 9 Meina Li 2023-08-15 03:49:23 UTC
Verified Version:
libvirt-9.5.0-5.el9.x86_64
qemu-kvm-8.0.0-11.el9.x86_64

Verified Steps:
Same as in comment 6.

Comment 11 errata-xmlrpc 2023-11-07 08:30:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409