Bug 2035518
Summary: | migrating guest with the value of cpu.weight larger than 10000 failed due to cgroup changes | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Lili Zhu <lizhu> | |
Component: | openstack-nova | Assignee: | Artom Lifshitz <alifshit> | |
Status: | CLOSED ERRATA | QA Contact: | James Parker <jparker> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 17.0 (Wallaby) | CC: | alifshit, bdobreli, chhu, dasmith, eglynn, fjin, jhakimra, jparker, kchamart, mprivozn, mwitt, phrdina, sbauza, sgordon, smitterl, smooney, stchen, vromanso, xuzhang, yisun | |
Target Milestone: | beta | Keywords: | Patch, Triaged, UpgradeBlocker | |
Target Release: | 17.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | openstack-nova-23.2.2-0.20220720130412.7074ac0.el9ost | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2037734 2121158 | Environment: | |
Last Closed: | 2022-09-21 12:18:08 UTC | Type: | Bug | |
Bug Depends On: | 2037998, 2118968 | |||
Bug Blocks: | 2121158 |
Description
Lili Zhu
2021-12-24 11:41:49 UTC
Right, libvirt currently uses the same set of limits for both CGroupV1 and V2. That should be fixed on libvirt level. And for mgmt apps (like OpenStack) they need to provide an XML during migration with the values recalculated to fit into CGroupV2 limits.

We discussed this rhbz during our team bug call today and can confirm this is a valid problem with migrating guests on RHEL8 hosts to RHEL9 hosts.

While this is a regression in behavior from RHEL8 => RHEL9, it is something we need to be able to handle in nova because of the way nova currently assigns a default value for <cputune><shares> [1] when a value was not specified in the flavor extra specs [2]:

    if guest.cputune is None:
        guest.cputune = vconfig.LibvirtConfigGuestCPUTune()
        # Setting the default cpu.shares value to be a value
        # dependent on the number of vcpus
        guest.cputune.shares = 1024 * guest.vcpus

The idea was to give guests with more vcpus more cpu time. However, the above means that any guest with >= 10 vcpus (1024 * 10 = 10240) will not be able to run on a RHEL9 host if the cpu shares quota is 10000 in RHEL9.

We are not yet sure how we will address this problem long term and will discuss it further next week.

For workarounds, as you have shown, changing the guest XML will work to enable migration to RHEL9.

To work around the problem using only the nova APIs, you will need to:

1. Create a flavor specifying the desired cpu shares (less than or equal to 10000), for example:

    $ openstack flavor create FLAVOR_NAME --id FLAVOR_ID \
        --ram RAM_IN_MB --disk ROOT_DISK_IN_GB --vcpus NUMBER_OF_VCPUS \
        --property quota:cpu_shares=CPU_SHARES

2. Resize the VM to the new flavor, for example:

    $ openstack server resize --flavor FLAVOR_NAME SERVER

3. Verify the VM is running fine after the resize.

4. If it's running fine, confirm the resize, for example:

    $ openstack server resize confirm SERVER

5. If it's not running fine, revert the resize and then debug, for example:

    $ openstack server resize revert SERVER

The resize will cause the VM to be created with the specified CPU_SHARES in the <cputune><shares> in the guest XML and the VM will be able to migrate to a RHEL9 host.

We will update this rhbz after we have further discussion about the long-term fix for this issue.

[1] https://github.com/openstack/nova/blob/6c3d5de659e558e8f6ee353475b54ff3ca7240ee/nova/virt/libvirt/driver.py#L5482
[2] https://docs.openstack.org/nova/xena/configuration/extra-specs.html#quota:cpu_shares

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543
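For reference, the arithmetic behind the 10-vcpu threshold discussed in the comments above can be sketched as follows. This is only an illustration, not nova or libvirt code: the helper names are hypothetical, the 1024 multiplier comes from the nova snippet quoted in the comment, and the 10000 ceiling is the RHEL9 value cited in this report.

```python
# Illustrative sketch only; these helpers are hypothetical and are not part of nova or libvirt.

CGROUP_V2_MAX_CPU_WEIGHT = 10000  # ceiling cited in this bug report for RHEL9 hosts
DEFAULT_SHARES_PER_VCPU = 1024    # multiplier from the nova default quoted in the comment


def default_cpu_shares(vcpus: int) -> int:
    """Return the <cputune><shares> value nova would assign by default."""
    return DEFAULT_SHARES_PER_VCPU * vcpus


def fits_cgroup_v2(shares: int) -> bool:
    """Check whether a shares value stays within the cited cgroup v2 ceiling."""
    return shares <= CGROUP_V2_MAX_CPU_WEIGHT


def smallest_failing_vcpus() -> int:
    """Find the lowest vcpu count whose default shares exceed the ceiling."""
    vcpus = 1
    while fits_cgroup_v2(default_cpu_shares(vcpus)):
        vcpus += 1
    return vcpus


if __name__ == "__main__":
    for vcpus in (1, 4, 8, 9, 10, 16):
        shares = default_cpu_shares(vcpus)
        status = "ok" if fits_cgroup_v2(shares) else "exceeds limit"
        print(f"{vcpus:>2} vcpus -> shares={shares:>5} ({status})")
    print("first failing vcpu count:", smallest_failing_vcpus())
```

A check like this could be run against a list of flavors to find guests that need the quota:cpu_shares workaround before they are migrated to a RHEL9 host.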