Description of problem:
Placement API reports incorrect VCPU usage when an allocation is requested.
At the moment I have 2 VMs running on the compute node with 12 vCPUs per VM, and the host has 64 CPUs available and online.
However, on every compute node, VM creation fails once 2 VMs have been created.
The nova-compute resource agent reports 24 vCPUs in use on the compute node, yet a new VM creation request fails because the VCPU allocation for the VM is rejected.
+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-f31a74af-6bf3-4547-a41b-f29bfcd9b0f0"}]}
+++
Note that cpu_shared_set and cpu_dedicated_set were not specified, so cpu_allocation_ratio would be 16.0 by default.
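As a rough sketch of the arithmetic behind that statement: Placement derives capacity as (total - reserved) * allocation_ratio. The reserved value is not shown in this report, so `reserved=0` below is an assumption, and the helper names are purely illustrative.

```python
# Sketch of the capacity arithmetic. `reserved=0` is an assumption;
# the helper names are illustrative and not part of any OpenStack API.
def vcpu_capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Capacity as (total - reserved) * allocation_ratio."""
    return (total - reserved) * allocation_ratio


def fits(used: int, requested: int, capacity: float) -> bool:
    """True when the request does not push usage past capacity."""
    return used + requested <= capacity


capacity = vcpu_capacity(total=64, reserved=0, allocation_ratio=16.0)
print(capacity)                # 1024.0
print(fits(24, 12, capacity))  # True
```

With 24 vCPUs in use and a capacity of 1024, a request for a further 12 vCPUs would be expected to fit.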
Here is the issue from Placement API, with respect to 1 resource provider:
+--------------------------------------+-----------------------+------------+
| uuid | name | generation |
+--------------------------------------+-----------------------+------------+
| 32c47b84-3bd6-4022-8455-867d1b819dd3 | compute-4.localdomain | 17 |
| 84b1755d-8ffd-4196-a3a6-c6218970307e | compute-2.localdomain | 18 |
| 65929119-23f6-4ba2-b98b-4eab5884633f | compute-5.localdomain | 15 |
| 6664ea69-8737-459c-af0a-e42108a6dcf7 | compute-0.localdomain | 15 |
| a9af1312-abbc-4151-a27f-beb901fb638b | compute-3.localdomain | 15 |
| 4c3b2cae-8ff1-4230-bc43-6f95ff70506c | compute-6.localdomain | 17 |
| 920eaa5d-ea15-4c05-8910-0b34f66b7b92 | compute-1.localdomain | 17 |
+--------------------------------------+-----------------------+------------+
Let's take compute-5 as the resource provider (rp) in the next few commands.
# openstack resource provider show 65929119-23f6-4ba2-b98b-4eab5884633f --allocation
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid | 65929119-23f6-4ba2-b98b-4eab5884633f |
| name | compute-5.localdomain |
| generation | 15 |
| allocations | {'5d939491-fca1-4c67-98b4-6e0d1bb8eac8': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}, '5f37eb85-bbdb-4008-9d8f-5394d12ffb66': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}} |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# openstack resource provider usage show 65929119-23f6-4ba2-b98b-4eab5884633f
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU | 24 |
| MEMORY_MB | 16384 |
| DISK_GB | 200 |
+----------------+-------+
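The two outputs above can be cross-checked against each other. The sketch below sums the per-consumer allocations copied from the `--allocation` output and should reproduce the usage table. The function name `total_usage` is illustrative only.

```python
from collections import Counter

# Allocations copied from the `--allocation` output above.
allocations = {
    "5d939491-fca1-4c67-98b4-6e0d1bb8eac8": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
    "5f37eb85-bbdb-4008-9d8f-5394d12ffb66": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
}


def total_usage(allocs: dict) -> dict:
    """Sum the resources of every consumer, per resource class."""
    totals = Counter()
    for consumer in allocs.values():
        totals.update(consumer["resources"])
    return dict(totals)


usage = total_usage(allocations)
print(usage)  # {'VCPU': 24, 'MEMORY_MB': 16384, 'DISK_GB': 200}
```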
Based on these, we would expect resources to be available. Let's request an allocation.
# openstack resource provider allocation set --allocation rp=65929119-23f6-4ba2-b98b-4eab5884633f,VCPU=4,DISK_GB=100,MEMORY_MB=8192 65929119-23f6-4ba2-b98b-4eab5884633f --debug
It fails with the same error as mentioned above. Here is the debug output and an excerpt from the Placement API logs for request ID req-1032c99c-5e57-4982-b838-2c0263ee5fb1:
+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-1032c99c-5e57-4982-b838-2c0263ee5fb1"}]}
PUT call to placement for http://192.16.0.51:8778/placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f used request id req-1032c99c-5e57-4982-b838-2c0263ee5fb1
Request returned failure status: 409
Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 32, in _wrap_http_exceptions
yield
File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
return self.session.request(url, method,
File "/usr/lib/python3.9/site-packages/keystoneauth1/session.py", line 986, in request
raise exceptions.from_response(resp, method, url)
keystoneauth1.exceptions.http.Conflict: Conflict (HTTP 409) (Request-ID: req-1032c99c-5e57-4982-b838-2c0263ee5fb1)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/cliff/app.py", line 401, in run_subcommand
result = cmd.run(parsed_args)
File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
return super(Command, self).run(parsed_args)
File "/usr/lib/python3.9/site-packages/cliff/display.py", line 115, in run
column_names, data = self.take_action(parsed_args)
File "/usr/lib/python3.9/site-packages/osc_placement/resources/allocation.py", line 139, in take_action
http.request('PUT', url, json=payload)
File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
return self.session.request(url, method,
File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 39, in _wrap_http_exceptions
six.raise_from(exc_class(exc.http_status, msg), exc)
File "<string>", line 3, in raise_from
osc_lib.exceptions.Conflict: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
clean_up SetAllocation: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
+++
+++
controller-1 | CHANGED | rc=0 >>
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.601 16 DEBUG placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 - - - - -] Starting request: 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" __call__ /usr/lib/python3.9/site-packages/placement/requestlog.py:55
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.729 16 WARNING placement.objects.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Over capacity for VCPU on resource provider 65929119-23f6-4ba2-b98b-4eab5884633f. Needed: 12, Used: 16608, Capacity: 1024.0
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.736 16 DEBUG placement.handlers.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Deleted auto-created consumer with consumer UUID 65929119-23f6-4ba2-b98b-4eab5884633f after failed allocation delete_consumers /usr/lib/python3.9/site-packages/placement/handlers/allocation.py:364
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.737 16 DEBUG placement.wsgi_wrapper [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. call_func /usr/lib/python3.9/site-packages/placement/wsgi_wrapper.py:31
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.739 16 INFO placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" status: 409 len: 364 microversion: 1.0
+++
I'm not sure how the used VCPU count is being reported as 16608 with just 2 VMs with 12 vCPUs each running on the mentioned compute node with 64 CPUs. The maximum VCPU capacity seems fine (64*16).
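To make the discrepancy explicit, here is a small sketch that extracts the numbers from the placement.log WARNING quoted above and compares them with the usage reported by the CLI. The regular expression is written against that single log line and is an assumption about the message format, not something taken from the Placement source.

```python
import re

# Message text copied from the placement.log WARNING above.
LOG_LINE = (
    "Over capacity for VCPU on resource provider "
    "65929119-23f6-4ba2-b98b-4eab5884633f. "
    "Needed: 12, Used: 16608, Capacity: 1024.0"
)

# Pattern inferred from the one log line above (assumption).
PATTERN = re.compile(
    r"Over capacity for (?P<rc>\w+) on resource provider (?P<rp>[0-9a-f-]+)\. "
    r"Needed: (?P<needed>\d+), Used: (?P<used>\d+), Capacity: (?P<capacity>[\d.]+)"
)


def parse_over_capacity(line: str):
    """Return the fields of an 'Over capacity' warning, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "resource_class": m["rc"],
        "provider": m["rp"],
        "needed": int(m["needed"]),
        "used": int(m["used"]),
        "capacity": float(m["capacity"]),
    }


parsed = parse_over_capacity(LOG_LINE)
expected_used = 24  # from `openstack resource provider usage show`
print(parsed["used"] - expected_used)   # 16584
print(parsed["capacity"] == 64 * 16.0)  # True
```

The capacity matches 64 * 16.0, so only the "Used" figure disagrees with what the CLI reports.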
Version-Release number of selected component (if applicable):
[root@controller-1 /]# rpm -qa |grep -i placement
python3-placement-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-common-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-api-5.0.1-0.20210813021511.adf525a.el9ost.noarch
[root@controller-1 /]# rpm -qa |grep -i nova
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
VM creation fails.
Expected results:
VM creation should succeed. Should be able to request an allocation.
Additional info:
Environment details can be shared for review.
Is this from an upstream CI job, a downstream CI job, or an issue you hit directly?
If this is from a deployment you have access to, can you provide a set of sos reports?
If this is from a CI run, can you provide the link to the failing job?
This might be a RHEL bug, in which case we will either need to change the component or close this as can't fix and file a separate bug.
We have identified the cause as a bug in MariaDB that is being fixed by https://bugzilla.redhat.com/show_bug.cgi?id=2096274
I'm going to triage this as urgent/urgent for now, although we likely will not need to do anything once the new package is available and the container is rebuilt.