Description of problem:

In RHEL 9, cgroup v2 is enabled by default, and all weights are in the range [1, 10000]. A VM whose cpu.weight value is larger than 10000 cannot be started, whereas the same VM can be started on a RHEL 8 host, which uses cgroup v1. For the same reason, migrating such a VM from RHEL 8 to RHEL 9 fails.

Version-Release:

On source host:
    libvirt-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
    qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64

On target host:
    libvirt-7.10.0-1.el9.x86_64
    qemu-kvm-6.2.0-1.el9.x86_64

Steps to reproduce:

1. Check the guest XML:

    <domain type='kvm'>
      <name>VM</name>
      ...
      <cputune>
        <shares>10001</shares>
      </cputune>
      ...

2. Try to migrate the guest from the source host to the target host:

    # virsh migrate VM qemu+ssh://$target/system --verbose --live
    error: error from service: GDBus.Error:org.freedesktop.DBus.Error.InvalidArgs: Value specified in CPUWeight is out of range (migration failed)

On RHEL, we can migrate this kind of VM successfully by specifying an XML, e.g.:

3. Dump the VM XML:

    # virsh dumpxml VM > VM.xml

4. Change the shares value from 10001 to a value no larger than 10000:

    # cat VM.xml
    ...
    <cputune>
      <shares>10000</shares>
    </cputune>
    ...

5. Migrate the VM, specifying the XML above:

    # virsh migrate VM qemu+ssh://$target/system --verbose --live --xml VM.xml
    Migration: [100 %]

My questions are:

1. Is there any problem with migrating this kind of VM on OpenStack?
2. If there is a problem now, can the above workaround be accepted?
3. If the above workaround can be accepted, can customers specify the XML and then migrate the guest on OpenStack now?

If there is no problem with migrating this kind of VM on OpenStack, or it can be solved by another method, feel free to close this bug.
Right, libvirt currently uses the same set of limits for both cgroup v1 and v2. That should be fixed at the libvirt level. Management applications (like OpenStack) need to provide an XML during migration with the values recalculated to fit into the cgroup v2 limits.
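As a rough illustration of what "recalculated to fit" could mean for a management application, here is a minimal sketch that rewrites <cputune><shares> in a dumped domain XML. The function name clamp_cpu_shares and the choice to simply clamp into [1, 10000] are assumptions made for this example; they are not taken from libvirt or nova, and a real fix might scale the value instead of clamping it.

```python
import xml.etree.ElementTree as ET

# cgroup v2 weight range as stated in this report.
CGROUP_V2_WEIGHT_MIN = 1
CGROUP_V2_WEIGHT_MAX = 10000


def clamp_cpu_shares(domain_xml: str) -> str:
    """Return domain XML with <cputune><shares> clamped to the cgroup v2 range.

    Hypothetical helper: clamping is only one possible policy.
    """
    root = ET.fromstring(domain_xml)
    shares = root.find("./cputune/shares")
    if shares is not None and shares.text:
        value = int(shares.text.strip())
        clamped = max(CGROUP_V2_WEIGHT_MIN, min(CGROUP_V2_WEIGHT_MAX, value))
        shares.text = str(clamped)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_in = (
        "<domain type='kvm'><name>VM</name>"
        "<cputune><shares>10001</shares></cputune></domain>"
    )
    print(clamp_cpu_shares(xml_in))
```

The resulting XML could then be passed to virsh migrate with --xml, as in step 5 of the description. Note that clamping changes the relative weight between guests, which may or may not be acceptable for a given deployment.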
We discussed this rhbz during our team bug call today and can confirm this is a valid problem when migrating guests from RHEL 8 hosts to RHEL 9 hosts.

While this is a regression in behavior from RHEL 8 => RHEL 9, it is something we need to be able to handle in nova because of the way nova currently assigns a default value for <cputune><shares> [1] when a value was not specified in the flavor extra specs [2]:

    if guest.cputune is None:
        guest.cputune = vconfig.LibvirtConfigGuestCPUTune()
        # Setting the default cpu.shares value to be a value
        # dependent on the number of vcpus
        guest.cputune.shares = 1024 * guest.vcpus

The idea was to give guests with more vcpus more CPU time. However, this means that any guest with >= 10 vcpus will not be able to run on a RHEL 9 host, because 1024 * 10 = 10240 exceeds the maximum of 10000 for CPU shares on RHEL 9.

We are not yet sure how we will address this problem long term and will discuss it further next week.

As for workarounds, changing the guest XML as you have shown will work to enable migration to RHEL 9.

To work around the problem using only the nova APIs, you will need to:

1. Create a flavor specifying the desired CPU shares (less than or equal to 10000), for example:

    $ openstack flavor create FLAVOR_NAME --id FLAVOR_ID \
        --ram RAM_IN_MB --disk ROOT_DISK_IN_GB --vcpus NUMBER_OF_VCPUS \
        --property quota:cpu_shares=CPU_SHARES

2. Resize the VM to the new flavor, for example:

    $ openstack server resize --flavor FLAVOR_NAME SERVER

3. Verify that the VM is running fine after the resize.

4. If it is running fine, confirm the resize, for example:

    $ openstack server resize confirm SERVER

5. If it is not running fine, revert the resize and then debug, for example:

    $ openstack server resize revert SERVER

The resize will cause the VM to be created with the specified CPU_SHARES in <cputune><shares> in the guest XML, and the VM will then be able to migrate to a RHEL 9 host.

We will update this rhbz after we have had further discussion about the long-term fix for this issue.

[1] https://github.com/openstack/nova/blob/6c3d5de659e558e8f6ee353475b54ff3ca7240ee/nova/virt/libvirt/driver.py#L5482
[2] https://docs.openstack.org/nova/xena/configuration/extra-specs.html#quota:cpu_shares
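
To make the arithmetic behind the ">= 10 vcpus" threshold concrete, the following sketch reproduces nova's default of 1024 shares per vcpu and checks it against the 10000 limit. The function names default_cpu_shares and exceeds_cgroup_v2_limit are hypothetical and exist only for this illustration; they are not nova APIs.

```python
# Upper bound for CPU weight under cgroup v2, as stated in this report.
CGROUP_V2_WEIGHT_MAX = 10000


def default_cpu_shares(vcpus: int) -> int:
    """Mirror the nova default quoted above: 1024 shares per vcpu."""
    return 1024 * vcpus


def exceeds_cgroup_v2_limit(vcpus: int) -> bool:
    """True if the default shares for this vcpu count would be rejected."""
    return default_cpu_shares(vcpus) > CGROUP_V2_WEIGHT_MAX


if __name__ == "__main__":
    for vcpus in (8, 9, 10, 16):
        shares = default_cpu_shares(vcpus)
        status = "too large" if exceeds_cgroup_v2_limit(vcpus) else "ok"
        print(f"{vcpus} vcpus -> shares={shares} ({status})")
```

With 9 vcpus the default is 9216, which still fits; with 10 vcpus it is 10240, which does not. Guests created with an explicit quota:cpu_shares in the flavor are not affected by this default.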
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543