Description of problem:
The Placement API reports incorrect VCPU usage when an allocation is requested. At the moment I have 2 VMs running on the compute node with 12 vCPUs per VM, and the host has 64 CPUs available and online. However, VM creation fails after 2 VMs have been created, and this happens on every compute node. The nova-compute resource agent reports 24 vCPUs in use by the compute node, whereas a new VM creation request fails because VCPU cannot be allocated to the VM.

+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-f31a74af-6bf3-4547-a41b-f29bfcd9b0f0"}]}
+++

Note that cpu_shared_set and cpu_dedicated_set were not specified, so cpu_allocation_ratio would be 16.0 by default.

Here is the issue from the Placement API, with respect to 1 resource provider:

+--------------------------------------+-----------------------+------------+
| uuid                                 | name                  | generation |
+--------------------------------------+-----------------------+------------+
| 32c47b84-3bd6-4022-8455-867d1b819dd3 | compute-4.localdomain |         17 |
| 84b1755d-8ffd-4196-a3a6-c6218970307e | compute-2.localdomain |         18 |
| 65929119-23f6-4ba2-b98b-4eab5884633f | compute-5.localdomain |         15 |
| 6664ea69-8737-459c-af0a-e42108a6dcf7 | compute-0.localdomain |         15 |
| a9af1312-abbc-4151-a27f-beb901fb638b | compute-3.localdomain |         15 |
| 4c3b2cae-8ff1-4230-bc43-6f95ff70506c | compute-6.localdomain |         17 |
| 920eaa5d-ea15-4c05-8910-0b34f66b7b92 | compute-1.localdomain |         17 |
+--------------------------------------+-----------------------+------------+

Let's take compute-5 as the resource provider (rp) in the next few commands.

# openstack resource provider show 65929119-23f6-4ba2-b98b-4eab5884633f --allocation
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field       | Value |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid        | 65929119-23f6-4ba2-b98b-4eab5884633f |
| name        | compute-5.localdomain |
| generation  | 15 |
| allocations | {'5d939491-fca1-4c67-98b4-6e0d1bb8eac8': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}, '5f37eb85-bbdb-4008-9d8f-5394d12ffb66': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}} |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

# openstack resource provider usage show 65929119-23f6-4ba2-b98b-4eab5884633f
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |    24 |
| MEMORY_MB      | 16384 |
| DISK_GB        |   200 |
+----------------+-------+

So, based on these we would assume that we have resources. Let's request an allocation.

# openstack resource provider allocation set --allocation rp=65929119-23f6-4ba2-b98b-4eab5884633f,VCPU=4,DISK_GB=100,MEMORY_MB=8192 65929119-23f6-4ba2-b98b-4eab5884633f --debug

It fails with the same error as mentioned above. Here is the debug output and an excerpt from the placement API logs for request ID req-1032c99c-5e57-4982-b838-2c0263ee5fb1:

+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-1032c99c-5e57-4982-b838-2c0263ee5fb1"}]}
PUT call to placement for http://192.16.0.51:8778/placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f used request id req-1032c99c-5e57-4982-b838-2c0263ee5fb1
Request returned failure status: 409
Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.
(HTTP 409)
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 32, in _wrap_http_exceptions
    yield
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib/python3.9/site-packages/keystoneauth1/session.py", line 986, in request
    raise exceptions.from_response(resp, method, url)
keystoneauth1.exceptions.http.Conflict: Conflict (HTTP 409) (Request-ID: req-1032c99c-5e57-4982-b838-2c0263ee5fb1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.9/site-packages/cliff/display.py", line 115, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_placement/resources/allocation.py", line 139, in take_action
    http.request('PUT', url, json=payload)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 39, in _wrap_http_exceptions
    six.raise_from(exc_class(exc.http_status, msg), exc)
  File "<string>", line 3, in raise_from
osc_lib.exceptions.Conflict: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.
(HTTP 409)
clean_up SetAllocation: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.
(HTTP 409)
+++

+++
controller-1 | CHANGED | rc=0 >>
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.601 16 DEBUG placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 - - - - -] Starting request: 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" __call__ /usr/lib/python3.9/site-packages/placement/requestlog.py:55
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.729 16 WARNING placement.objects.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Over capacity for VCPU on resource provider 65929119-23f6-4ba2-b98b-4eab5884633f. Needed: 12, Used: 16608, Capacity: 1024.0
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.736 16 DEBUG placement.handlers.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Deleted auto-created consumer with consumer UUID 65929119-23f6-4ba2-b98b-4eab5884633f after failed allocation delete_consumers /usr/lib/python3.9/site-packages/placement/handlers/allocation.py:364
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.737 16 DEBUG placement.wsgi_wrapper [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.
call_func /usr/lib/python3.9/site-packages/placement/wsgi_wrapper.py:31
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.739 16 INFO placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" status: 409 len: 364 microversion: 1.0
+++

I'm not sure how the used VCPU count is being reported as 16608 when just 2 VMs with 12 vCPUs each are running on the mentioned compute node with 64 CPUs. The maximum VCPU capacity seems fine (64 * 16 = 1024).

Version-Release number of selected component (if applicable):
[root@controller-1 /]# rpm -qa |grep -i placement
python3-placement-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-common-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-api-5.0.1-0.20210813021511.adf525a.el9ost.noarch
[root@controller-1 /]# rpm -qa |grep -i nova
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results: VM creation fails.

Expected results: VM creation should succeed. Should be able to request an allocation.

Additional info: Environment details can be shared for review.
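
For reference, the arithmetic behind the numbers in this report can be sketched in a few lines of Python. This is only an illustration, not Placement's actual code: the helper names (capacity, used, fits) are hypothetical, the formula (total - reserved) * allocation_ratio is the documented Placement capacity rule, and reserved=0 is an assumption because the inventory's reserved value is not shown above.

```python
# Allocations as reported by
# `openstack resource provider show 65929119-... --allocation` above.
allocations = {
    "5d939491-fca1-4c67-98b4-6e0d1bb8eac8": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
    "5f37eb85-bbdb-4008-9d8f-5394d12ffb66": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
}


def capacity(total, reserved, allocation_ratio):
    """Capacity of one resource class on a provider."""
    return (total - reserved) * allocation_ratio


def used(allocs, resource_class):
    """Sum the amounts of one resource class over all consumers."""
    return sum(
        consumer["resources"].get(resource_class, 0)
        for consumer in allocs.values()
    )


def fits(needed, currently_used, cap):
    """True if the request stays within capacity."""
    return currently_used + needed <= cap


# 64 host CPUs and the default ratio of 16.0; reserved=0 is an assumption.
vcpu_capacity = capacity(total=64, reserved=0, allocation_ratio=16.0)
expected_used = used(allocations, "VCPU")

print(vcpu_capacity)                          # 1024.0, matches the log
print(expected_used)                          # 24, matches `usage show`
print(fits(12, expected_used, vcpu_capacity)) # True: the request should fit
print(fits(12, 16608, vcpu_capacity))         # False: the value from the log
```

With the usage that the CLI reports (24), the request fits comfortably. The 409 only follows from the "Used: 16608" value that appears in the placement log.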
Is this from an upstream CI job, a downstream CI job, or an issue you hit directly? If this is from a deployment you have access to, can you provide a set of sos reports? If this is from a CI run, can you provide the link to the failing job? This might be a RHEL bug, in which case we will either need to change the component or close this as "can't fix" and file a separate bug.
We have identified the cause as a bug in MariaDB that is being fixed by https://bugzilla.redhat.com/show_bug.cgi?id=2096274. I'm going to triage this as urgent for now, although we likely will not need to do anything once the new package is available and the container is rebuilt.