Bug 2118968 - Failed to create VM with flavor with cpu quota settings when kernel support cpu controller
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Amit Uniyal
QA Contact: OSP DFG:Compute
URL:
Whiteboard: libvirt_OSP_INT
Depends On:
Blocks: 2035518 2121158
 
Reported: 2022-08-17 09:03 UTC by chhu
Modified: 2023-08-16 01:12 UTC (History)

Fixed In Version: openstack-nova-23.2.3-1.20230518170957.7e3a8a1.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:11:45 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad.net | nova/+bug/2008102 | 0 | None | None | None | 2023-04-03 10:14:12 UTC
OpenStack gerrit | 873127 | 0 | None | MERGED | Have host look for CPU controller of cgroupsv2 location. | 2023-06-02 05:09:00 UTC
Red Hat Issue Tracker | OSP-18227 | 0 | None | None | None | 2022-08-17 09:08:13 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:12:13 UTC

Description chhu 2022-08-17 09:03:01 UTC
Description of problem:
Creating a VM fails when the flavor sets CPU quota properties (cpu_period, cpu_quota, cpu_shares), even though the kernel supports the cpu controller.

Version-Release number of selected component (if applicable):
openstack-nova-compute-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64
kernel-5.14.0-70.17.1.el9_0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up the OSP 17 and RHEL 9.0 environment by running the job:
custom-17.0_compact-director-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph #7

2. Create the image, network and flavor with cpu quota settings
# openstack flavor create flavor_cpu --id 101 --ram 2048 --disk 10 --vcpus 2
# openstack flavor set flavor_cpu --property hw:boot_menu='true'  --property quota:cpu_period='1000000' --property quota:cpu_quota='1000000000' --property quota:cpu_shares='2048' 

# openstack network create asb-net1
# openstack subnet create subasb-net1 --network asb-net1 --subnet-range 192.168.32.0/22

# openstack image create asb-qcow2  --disk-format qcow2 --container-format bare --file /tmp/RHEL-9.0.0-20220429.1-x86_64.qcow2
 
[stack@undercloud-0 ~]$ openstack flavor list
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| ID                                   | Name       |  RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| 100                                  | asb-m2     |  512 |   10 |         0 |     1 | True      |
| 101                                  | flavor_cpu | 2048 |   10 |         0 |     2 | True      |

[stack@undercloud-0 ~]$ openstack image list
+--------------------------------------+----------------------------------+--------+
| ID                                   | Name                             | Status |
+--------------------------------------+----------------------------------+--------+
| 28482783-206d-4e7b-8fe3-07eef56d447c | asb-qcow2                        | active |

3. Try to create a VM. It fails with the error: "Requested CPU control policy not supported by host"
[stack@undercloud-0 ~]$ openstack server create --flavor flavor_cpu --image asb-qcow2 --nic net-id=a0f5494e-d027-48f4-84b4-3de0ecfe402a --availability-zone nova:compute-0.redhat.local vm-r9-qcow2

[stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------------+--------+-------------------------+--------------------------+------------+
| ID                                   | Name             | Status | Networks                | Image                    | Flavor     |
+--------------------------------------+------------------+--------+-------------------------+--------------------------+------------+
| 7e752bb9-6793-42fa-b641-6a153fc292a6 | vm-r9-qcow2      | ERROR  |                         | asb-qcow2                | flavor_cpu |

[stack@undercloud-0 ~]$ openstack server show vm-r9-qcow2
......
| fault                               | {'code': 500, 'created': '2022-08-17T08:24:38Z', 'message': 'Requested CPU control policy not supported by host', 'details': 'Traceback (most recent call last):\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2409, in _build_and_run_instance\n    self.driver.spawn(context, instance, image_meta,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4189, in spawn\n    xml = self._get_guest_xml(context, instance, network_info,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7038, in _get_guest_xml\n    conf = self._get_guest_config(instance, network_info, image_meta,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 6627, in _get_guest_config\n    self._update_guest_cputune(guest, flavor)\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 5490, in _update_guest_cputune\n    raise exception.UnsupportedHostCPUControlPolicy()\nnova.exception.UnsupportedHostCPUControlPolicy: Requested CPU control policy not supported by host\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2232, in _do_build_and_run_instance\n    self._build_and_run_instance(context, instance, image,\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2505, in _build_and_run_instance\n    raise exception.RescheduledException(\nnova.exception.RescheduledException: Build of instance 7e752bb9-6793-42fa-b641-6a153fc292a6 was re-scheduled: Requested CPU control policy not supported by host\n'} |
| flavor                              | disk='10', ephemeral='0', extra_specs.hw:boot_menu='true', extra_specs.quota:cpu_period='1000000', extra_specs.quota:cpu_quota='1000000000', extra_specs.quota:cpu_shares='2048', original_name='flavor_cpu', ram='2048', swap='0', vcpus='2'

4. Create a VM with another flavor that omits the CPU settings ("--property quota:cpu_period='1000000' --property quota:cpu_quota='1000000000' --property quota:cpu_shares='2048'"). The VM is created successfully.

5. Check the nova code: it looks for "cpu" among the mount options listed in "/proc/self/mounts".
 
https://github.com/openstack/nova/blob/0b0fa8ac315ed497abfa4248ba5d8b0bb145d9b3/nova/virt/libvirt/driver.py  line: 5687-5695
------------------------------------------------------------------
    def _update_guest_cputune(self, guest, flavor):
        is_able = self._host.is_cpu_control_policy_capable()

        cputuning = ['shares', 'period', 'quota']
        wants_cputune = any([k for k in cputuning
            if "quota:cpu_" + k in flavor.extra_specs.keys()])

        if wants_cputune and not is_able:
            raise exception.UnsupportedHostCPUControlPolicy()
------------------------------------------------------------------
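
The flavor check quoted above can be tried outside nova with a small standalone sketch. The helper name `wants_cputune` and the plain dictionaries are illustrative assumptions, not nova APIs; the extra specs are copied from the failing flavor in this report.

```python
# Standalone sketch of the extra-spec check; not nova code.
CPUTUNE_KEYS = ("shares", "period", "quota")


def wants_cputune(extra_specs):
    """Return True if any quota:cpu_* tuning key is set in the extra specs."""
    return any("quota:cpu_" + key in extra_specs for key in CPUTUNE_KEYS)


# Extra specs of flavor_cpu as shown by "openstack server show" in this report.
flavor_cpu = {
    "hw:boot_menu": "true",
    "quota:cpu_period": "1000000",
    "quota:cpu_quota": "1000000000",
    "quota:cpu_shares": "2048",
}
# Hypothetical flavor without any CPU quota settings.
plain_flavor = {"hw:boot_menu": "true"}

print(wants_cputune(flavor_cpu))    # True
print(wants_cputune(plain_flavor))  # False
```

Only flavors for which this check is true reach the host capability test, which is why the VM in step 4 boots on the same compute node.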

https://github.com/openstack/nova/blob/0b0fa8ac315ed497abfa4248ba5d8b0bb145d9b3/nova/virt/libvirt/host.py#L1608    line: 1608-1623
-------------------------------------------------------------- 
    def is_cpu_control_policy_capable(self):
        """Returns whether kernel configuration CGROUP_SCHED is enabled
        CONFIG_CGROUP_SCHED may be disabled in some kernel configs to
        improve scheduler latency.
        """
        try:
            # Reporter's note: only /proc/self/mounts is checked here
            with open("/proc/self/mounts", "r") as fd:
                for line in fd.readlines():
                    # mount options and split options
                    bits = line.split()[3].split(",")
                    if "cpu" in bits:
                        return True
                return False
        except IOError:
            return False
--------------------------------------------------------------

6. However, RHEL 9 uses cgroup v2 by default.
   On cgroup v2 we need to check for "cpu" in /sys/fs/cgroup/cgroup.controllers.
   The current compute node does support the cpu controller.
   Thus, the code in "nova/virt/libvirt/host.py" may need to change.
---------------------------------------------------------------------
[heat-admin@compute-0 ~]$ cat /sys/fs/cgroup/cgroup.controllers | grep cpu
cpuset cpu io memory hugetlb pids rdma misc

[heat-admin@compute-0 ~]$ cat /sys/fs/cgroup/machine.slice/cgroup.controllers | grep cpu
cpuset cpu io memory hugetlb pids

[heat-admin@compute-0 ~]$ cat /proc/self/mounts | grep cpu  -> This check only works with cgroup v1

(no output)

-------------------------------------------------------------------
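
A minimal sketch of a detection that covers both cgroup versions, assuming the two file locations shown above. The function names and the path parameters are illustrative, chosen so the logic can be exercised against sample files; this is not the patch that was merged upstream.

```python
# Sketch only: check cgroup v1 mount options and the cgroup v2 controller list.
CGROUPS_V1_MOUNTS = "/proc/self/mounts"
CGROUPS_V2_CONTROLLERS = "/sys/fs/cgroup/cgroup.controllers"


def has_cgroupsv1_cpu_controller(mounts_path=CGROUPS_V1_MOUNTS):
    """Return True if any mount lists "cpu" among its options (cgroup v1)."""
    try:
        with open(mounts_path, "r") as fd:
            for line in fd:
                fields = line.split()
                # The fourth field holds the comma-separated mount options.
                if len(fields) > 3 and "cpu" in fields[3].split(","):
                    return True
    except OSError:
        pass
    return False


def has_cgroupsv2_cpu_controller(controllers_path=CGROUPS_V2_CONTROLLERS):
    """Return True if "cpu" is listed in cgroup.controllers (cgroup v2)."""
    try:
        with open(controllers_path, "r") as fd:
            # Controllers are space-separated; an exact token match keeps
            # "cpuset" from being mistaken for "cpu".
            return "cpu" in fd.read().split()
    except OSError:
        return False


def is_cpu_control_policy_capable(mounts_path=CGROUPS_V1_MOUNTS,
                                  controllers_path=CGROUPS_V2_CONTROLLERS):
    """Return True if the cpu controller is visible through either version."""
    return (has_cgroupsv1_cpu_controller(mounts_path)
            or has_cgroupsv2_cpu_controller(controllers_path))
```

On the compute node in this report the v1 check would find nothing, while the v2 check would see "cpu" in the controller list printed above.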

Actual results:
Creating a guest fails when the flavor sets cpu_period, cpu_quota or cpu_shares, even though the kernel supports the cpu controller.

Expected results:
The guest is created successfully with the flavor settings (cpu_period, cpu_quota, cpu_shares) when the kernel supports the cpu controller.

Additional info:
Bug 1513930 - RFE: rewrite cgroups code to support v2 subsystem

Comment 1 chhu 2022-08-17 09:09:57 UTC
On the master branch, the code is at the same lines.
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py  line 5687-5695
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py  line: 1608-1623

Comment 2 Artom Lifshitz 2022-12-13 16:00:07 UTC
I'm removing the Triaged keyword so that we can have a discussion about this. 17.0.1 is now blockers-only, and we're not sure if there's going to be a 17.0.2. Since this is a regression, we want to decide whether it's bad enough that we ask for a blocker flag on this. In all cases, we'll copy this to 17.1 so that we can fix it there.

Comment 4 Artom Lifshitz 2022-12-14 16:32:06 UTC
Conclusion:

1. File a known issue for 17.0.
2. Fix only the host support detection in 17.1.
3. Document that the values are host and virt driver dependent, and if you're upgrading to 17.1 you need to:
   a. Make sure the values in your extra specs are supported by cgroups v2 on RHEL 9.
   b. Create new flavors and resize your instances if they're not.

Comment 5 Artom Lifshitz 2022-12-15 14:38:46 UTC
> 1. File a known issue for 17.0.

https://bugzilla.redhat.com/show_bug.cgi?id=2153815

> 2. Fix only the host support detection in 17.1.

Retargeted this BZ to 17.1.

> 3. Document that the values are host and virt driver dependent, and if
> you're upgrading to 17.1 you need to:
>    a. Make sure the values in your extra specs are supported by cgroups v2
> on RHEL 9.
>    b. Create new flavors and resize your instances if they're not.

Added a note to the same BZ (https://bugzilla.redhat.com/show_bug.cgi?id=2153815)

Comment 25 errata-xmlrpc 2023-08-16 01:11:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

