Bug 2118968

Summary: Failed to create a VM with a flavor that has CPU quota settings when the kernel supports the cpu controller
Product: Red Hat OpenStack
Reporter: chhu
Component: openstack-nova
Assignee: Amit Uniyal <auniyal>
Status: CLOSED ERRATA
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Priority: high
Version: 17.0 (Wallaby)
CC: alifshit, auniyal, berrange, bgibizer, dasmith, dhughes, eglynn, jgrosso, jhakimra, juzhou, kchamart, lpiwowar, mprivozn, pgrist, phrdina, sbauza, sgordon, smooney, vromanso, yisun
Target Milestone: ga
Keywords: Patch, Regression, Triaged
Target Release: 17.1
Hardware: x86_64
OS: Linux
Whiteboard: libvirt_OSP_INT
Fixed In Version: openstack-nova-23.2.3-1.20230518170957.7e3a8a1.el9ost
Doc Type: No Doc Update
Last Closed: 2023-08-16 01:11:45 UTC
Type: Bug
Bug Blocks: 2035518, 2121158    

Description chhu 2022-08-17 09:03:01 UTC
Description of problem:
Failed to create a VM with a flavor that sets cpu_period, cpu_quota, and cpu_shares, even though the kernel supports the cpu controller.

Version-Release number of selected component (if applicable):
openstack-nova-compute-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
libvirt-daemon-driver-qemu-8.0.0-8.1.el9_0.x86_64
kernel-5.14.0-70.17.1.el9_0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up the OSP 17 and RHEL 9.0 environment by running the job:
custom-17.0_compact-director-rhel-9.0-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph #7

2. Create the image, network and flavor with cpu quota settings
# openstack flavor create flavor_cpu --id 101 --ram 2048 --disk 10 --vcpus 2
# openstack flavor set flavor_cpu --property hw:boot_menu='true'  --property quota:cpu_period='1000000' --property quota:cpu_quota='1000000000' --property quota:cpu_shares='2048' 

# openstack network create asb-net1
# openstack subnet create subasb-net1 --network asb-net1 --subnet-range 192.168.32.0/22

# openstack image create asb-qcow2  --disk-format qcow2 --container-format bare --file /tmp/RHEL-9.0.0-20220429.1-x86_64.qcow2
 
[stack@undercloud-0 ~]$ openstack flavor list
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| ID                                   | Name       |  RAM | Disk | Ephemeral | VCPUs | Is Public |
+--------------------------------------+------------+------+------+-----------+-------+-----------+
| 100                                  | asb-m2     |  512 |   10 |         0 |     1 | True      |
| 101                                  | flavor_cpu | 2048 |   10 |         0 |     2 | True      |
+--------------------------------------+------------+------+------+-----------+-------+-----------+

[stack@undercloud-0 ~]$ openstack image list
+--------------------------------------+----------------------------------+--------+
| ID                                   | Name                             | Status |
+--------------------------------------+----------------------------------+--------+
| 28482783-206d-4e7b-8fe3-07eef56d447c | asb-qcow2                        | active |
+--------------------------------------+----------------------------------+--------+

3. Try to create a VM. It fails with the error "Requested CPU control policy not supported by host":
[stack@undercloud-0 ~]$ openstack server create --flavor flavor_cpu --image asb-qcow2 --nic net-id=a0f5494e-d027-48f4-84b4-3de0ecfe402a --availability-zone nova:compute-0.redhat.local vm-r9-qcow2

[stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------------+--------+-------------------------+--------------------------+------------+
| ID                                   | Name             | Status | Networks                | Image                    | Flavor     |
+--------------------------------------+------------------+--------+-------------------------+--------------------------+------------+
| 7e752bb9-6793-42fa-b641-6a153fc292a6 | vm-r9-qcow2      | ERROR  |                         | asb-qcow2                | flavor_cpu |

[stack@undercloud-0 ~]$ openstack server show vm-r9-qcow2
......
| fault                               | {'code': 500, 'created': '2022-08-17T08:24:38Z', 'message': 'Requested CPU control policy not supported by host', 'details': 'Traceback (most recent call last):\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2409, in _build_and_run_instance\n    self.driver.spawn(context, instance, image_meta,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4189, in spawn\n    xml = self._get_guest_xml(context, instance, network_info,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7038, in _get_guest_xml\n    conf = self._get_guest_config(instance, network_info, image_meta,\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 6627, in _get_guest_config\n    self._update_guest_cputune(guest, flavor)\n  File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 5490, in _update_guest_cputune\n    raise exception.UnsupportedHostCPUControlPolicy()\nnova.exception.UnsupportedHostCPUControlPolicy: Requested CPU control policy not supported by host\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2232, in _do_build_and_run_instance\n    self._build_and_run_instance(context, instance, image,\n  File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2505, in _build_and_run_instance\n    raise exception.RescheduledException(\nnova.exception.RescheduledException: Build of instance 7e752bb9-6793-42fa-b641-6a153fc292a6 was re-scheduled: Requested CPU control policy not supported by host\n'} |
| flavor                              | disk='10', ephemeral='0', extra_specs.hw:boot_menu='true', extra_specs.quota:cpu_period='1000000', extra_specs.quota:cpu_quota='1000000000', extra_specs.quota:cpu_shares='2048', original_name='flavor_cpu', ram='2048', swap='0', vcpus='2'
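The fault field above is printed as a Python dict literal, with the traceback escaped onto a single line. A minimal sketch for reading it, assuming the field value has been copied into a string exactly as shown. The helper name format_fault is hypothetical and the sample details text is abbreviated, not the full traceback from this report:

```python
import ast


def format_fault(fault_text):
    """Render a fault dict literal (as printed in the fault column) readably.

    Hypothetical helper: assumes fault_text is a Python dict literal with
    'code', 'message' and, optionally, 'details' keys.
    """
    fault = ast.literal_eval(fault_text)
    header = "{}: {}".format(fault["code"], fault["message"])
    # Once the literal is parsed, 'details' contains real newlines.
    return header + "\n" + fault.get("details", "").rstrip()


if __name__ == "__main__":
    # Abbreviated sample; the doubled backslashes keep "\n" escaped in the text.
    sample = (
        "{'code': 500, 'created': '2022-08-17T08:24:38Z', "
        "'message': 'Requested CPU control policy not supported by host', "
        "'details': 'Traceback (most recent call last):\\n"
        "    raise exception.UnsupportedHostCPUControlPolicy()\\n'}"
    )
    print(format_fault(sample))
```

ast.literal_eval is used rather than eval so that only literals are accepted from the pasted text.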

4. Create a VM with another flavor that lacks the CPU quota properties (--property quota:cpu_period='1000000' --property quota:cpu_quota='1000000000' --property quota:cpu_shares='2048'). The VM is created successfully.

5. Check the nova code: it looks for "cpu" among the mount options listed in "/proc/self/mounts".
 
https://github.com/openstack/nova/blob/0b0fa8ac315ed497abfa4248ba5d8b0bb145d9b3/nova/virt/libvirt/driver.py  line: 5687-5695
------------------------------------------------------------------
    def _update_guest_cputune(self, guest, flavor):
        is_able = self._host.is_cpu_control_policy_capable()

        cputuning = ['shares', 'period', 'quota']
        wants_cputune = any([k for k in cputuning
            if "quota:cpu_" + k in flavor.extra_specs.keys()])

        if wants_cputune and not is_able:
            raise exception.UnsupportedHostCPUControlPolicy()
------------------------------------------------------------------
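To make the failure path concrete, here is a standalone sketch of the same decision, assuming the flavor extra specs are available as a plain dict. The function name check_cputune and the local exception class are stand-ins for illustration, not nova's own code:

```python
CPUTUNE_KEYS = ("shares", "period", "quota")


class UnsupportedHostCPUControlPolicy(Exception):
    """Stand-in for nova.exception.UnsupportedHostCPUControlPolicy."""


def check_cputune(extra_specs, host_is_capable):
    """Reject quota:cpu_* extra specs when the host is reported as incapable.

    Returns True when the flavor requests any CPU tuning, False otherwise.
    """
    wants_cputune = any(
        "quota:cpu_" + key in extra_specs for key in CPUTUNE_KEYS
    )
    if wants_cputune and not host_is_capable:
        raise UnsupportedHostCPUControlPolicy(
            "Requested CPU control policy not supported by host"
        )
    return wants_cputune


if __name__ == "__main__":
    # Extra specs of flavor_cpu as shown in step 3 of this report.
    flavor_cpu = {
        "hw:boot_menu": "true",
        "quota:cpu_period": "1000000",
        "quota:cpu_quota": "1000000000",
        "quota:cpu_shares": "2048",
    }
    try:
        check_cputune(flavor_cpu, host_is_capable=False)
    except UnsupportedHostCPUControlPolicy as exc:
        print("build fails:", exc)
```

With host_is_capable=False, any single quota:cpu_* key is enough to fail the build, which matches the observation in step 4 that a flavor without those properties boots normally.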

https://github.com/openstack/nova/blob/0b0fa8ac315ed497abfa4248ba5d8b0bb145d9b3/nova/virt/libvirt/host.py#L1608    line: 1608-1623
-------------------------------------------------------------- 
    def is_cpu_control_policy_capable(self):
        """Returns whether kernel configuration CGROUP_SCHED is enabled

        CONFIG_CGROUP_SCHED may be disabled in some kernel configs to
        improve scheduler latency.
        """
        try:
            # Reporter's note: the check only reads /proc/self/mounts.
            with open("/proc/self/mounts", "r") as fd:
                for line in fd.readlines():
                    # mount options and split options
                    bits = line.split()[3].split(",")
                    if "cpu" in bits:
                        return True
                return False
        except IOError:
            return False
--------------------------------------------------------------

6. However, RHEL 9 uses cgroup v2 by default.
   With cgroup v2 we need to check for "cpu" in /sys/fs/cgroup/cgroup.controllers.
   The current compute node does support the cpu controller,
   so the code in "nova/virt/libvirt/host.py" may need to change.
---------------------------------------------------------------------
[heat-admin@compute-0 ~]$ cat /sys/fs/cgroup/cgroup.controllers | grep cpu
cpuset cpu io memory hugetlb pids rdma misc

[heat-admin@compute-0 ~]$ cat /sys/fs/cgroup/machine.slice/cgroup.controllers | grep cpu
cpuset cpu io memory hugetlb pids

[heat-admin@compute-0 ~]$ cat /proc/self/mounts | grep cpu  -> This check only works with cgroup v1

No output 

-------------------------------------------------------------------
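The two checks compared in step 6 can be combined into one capability test. The following is a minimal sketch under stated assumptions, not the fix that shipped: the function names and the path parameters are hypothetical, and only the two files named in this report are consulted.

```python
def has_cgroup_v1_cpu(mounts_path="/proc/self/mounts"):
    """cgroup v1: look for 'cpu' among the mount options of any mount."""
    try:
        with open(mounts_path) as fd:
            for line in fd:
                fields = line.split()
                # Field 4 of each mounts entry is the comma-separated options.
                if len(fields) > 3 and "cpu" in fields[3].split(","):
                    return True
    except OSError:
        pass
    return False


def has_cgroup_v2_cpu(controllers_path="/sys/fs/cgroup/cgroup.controllers"):
    """cgroup v2: look for 'cpu' in the space-separated controller list."""
    try:
        with open(controllers_path) as fd:
            # split() compares whole words, so 'cpuset' alone does not match.
            return "cpu" in fd.read().split()
    except OSError:
        return False


def is_cpu_control_policy_capable(
    mounts_path="/proc/self/mounts",
    controllers_path="/sys/fs/cgroup/cgroup.controllers",
):
    """Report CPU control support if either cgroup version exposes 'cpu'."""
    return has_cgroup_v1_cpu(mounts_path) or has_cgroup_v2_cpu(controllers_path)


if __name__ == "__main__":
    print("cpu controller available:", is_cpu_control_policy_capable())
```

Passing the paths as parameters keeps the sketch testable against sample files. On the compute node shown above, the v1 check would return False and the v2 check True.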

Actual results:
Failed to create a guest with a flavor that sets cpu_period, cpu_quota, and cpu_shares, even though the kernel supports the cpu controller.

Expected results:
The guest is created successfully with a flavor that sets cpu_period, cpu_quota, and cpu_shares when the kernel supports the cpu controller.

Additional info:
Bug 1513930 - RFE: rewrite cgroups code to support v2 subsystem

Comment 1 chhu 2022-08-17 09:09:57 UTC
On the master branch, the code is at the same line numbers.
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py  line 5687-5695
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py  line: 1608-1623

Comment 2 Artom Lifshitz 2022-12-13 16:00:07 UTC
I'm removing the Triaged keyword so that we can have a discussion about this. 17.0.1 is now blockers-only, and we're not sure if there's going to be a 17.0.2. Since this is a regression, we want to decide whether it's bad enough that we ask for a blocker flag on this. In all cases, we'll copy this to 17.1 so that we can fix it there.

Comment 4 Artom Lifshitz 2022-12-14 16:32:06 UTC
Conclusion:

1. File a known issue for 17.0.
2. Fix only the host support detection in 17.1.
3. Document that the values are host and virt driver dependent, and if you're upgrading to 17.1 you need to:
   a. Make sure the values in your extra specs are supported by cgroups v2 on RHEL 9.
   b. Create new flavors and resize your instances if they're not.

Comment 5 Artom Lifshitz 2022-12-15 14:38:46 UTC
> 1. File a known issue for 17.0.

https://bugzilla.redhat.com/show_bug.cgi?id=2153815

> 2. Fix only the host support detection in 17.1.

Retargeted this BZ to 17.1.

> 3. Document that the values are host and virt driver dependent, and if
> you're upgrading to 17.1 you need to:
>    a. Make sure the values in your extra specs are supported by cgroups v2
> on RHEL 9.
>    b. Create new flavors and resize your instances if they're not.

Added a note to the same BZ (https://bugzilla.redhat.com/show_bug.cgi?id=2153815)

Comment 25 errata-xmlrpc 2023-08-16 01:11:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577