Description of problem:
The Placement API reports incorrect VCPU usage when an allocation is requested.
There are currently 2 VMs running on the compute node, each with 12 vCPUs, and the host has 64 CPUs available and online.
However, VM creation fails once 2 VMs have been created on any compute node.
nova-compute reports 24 vCPUs in use on the node, yet a new VM creation request fails because VCPU cannot be allocated to the VM.
+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-f31a74af-6bf3-4547-a41b-f29bfcd9b0f0"}]}
+++
Note that cpu_shared_set and cpu_dedicated_set were not specified, so cpu_allocation_ratio should default to 16.0.
Here is the issue as seen from the Placement API, focusing on one resource provider. The resource providers are:
+--------------------------------------+-----------------------+------------+
| uuid | name | generation |
+--------------------------------------+-----------------------+------------+
| 32c47b84-3bd6-4022-8455-867d1b819dd3 | compute-4.localdomain | 17 |
| 84b1755d-8ffd-4196-a3a6-c6218970307e | compute-2.localdomain | 18 |
| 65929119-23f6-4ba2-b98b-4eab5884633f | compute-5.localdomain | 15 |
| 6664ea69-8737-459c-af0a-e42108a6dcf7 | compute-0.localdomain | 15 |
| a9af1312-abbc-4151-a27f-beb901fb638b | compute-3.localdomain | 15 |
| 4c3b2cae-8ff1-4230-bc43-6f95ff70506c | compute-6.localdomain | 17 |
| 920eaa5d-ea15-4c05-8910-0b34f66b7b92 | compute-1.localdomain | 17 |
+--------------------------------------+-----------------------+------------+
The following commands use compute-5 as the resource provider (RP).
# openstack resource provider show 65929119-23f6-4ba2-b98b-4eab5884633f --allocation
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid | 65929119-23f6-4ba2-b98b-4eab5884633f |
| name | compute-5.localdomain |
| generation | 15 |
| allocations | {'5d939491-fca1-4c67-98b4-6e0d1bb8eac8': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}, '5f37eb85-bbdb-4008-9d8f-5394d12ffb66': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}} |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
# openstack resource provider usage show 65929119-23f6-4ba2-b98b-4eab5884633f
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU | 24 |
| MEMORY_MB | 16384 |
| DISK_GB | 200 |
+----------------+-------+
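The usage table is consistent with the allocations listed above: a provider's usage for a resource class is the sum of that class across all consumers allocated against it. A minimal Python sketch that re-derives the totals from the `--allocation` output (the helper `sum_usage` is hypothetical, used only for illustration):

```python
from collections import Counter

# Allocations copied from `openstack resource provider show ... --allocation`.
allocations = {
    "5d939491-fca1-4c67-98b4-6e0d1bb8eac8": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
    "5f37eb85-bbdb-4008-9d8f-5394d12ffb66": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
}


def sum_usage(allocs: dict) -> dict:
    """Hypothetical helper: sum each resource class over all consumers."""
    totals = Counter()
    for consumer in allocs.values():
        # Counter.update() adds the per-class amounts to the running totals.
        totals.update(consumer["resources"])
    return dict(totals)


print(sum_usage(allocations))
# {'VCPU': 24, 'MEMORY_MB': 16384, 'DISK_GB': 200}
```

The result matches the `usage show` table exactly (24 / 16384 / 200).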
Based on this output, the provider should have plenty of free VCPU capacity. Let's request an allocation:
# openstack resource provider allocation set --allocation rp=65929119-23f6-4ba2-b98b-4eab5884633f,VCPU=4,DISK_GB=100,MEMORY_MB=8192 65929119-23f6-4ba2-b98b-4eab5884633f --debug
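For reference, the command above issues a `PUT /placement/allocations/{consumer_uuid}` request; note that it reuses the RP UUID as the consumer UUID, which is why the placement log later mentions deleting an auto-created consumer with that UUID. A minimal sketch of an equivalent request built with the Python standard library, assuming the pre-1.12 list-style body used at microversion 1.0 (the microversion shown in the placement log); the function name and token value are placeholders, and the request is only constructed, not sent:

```python
import json
import urllib.request

PLACEMENT = "http://192.16.0.51:8778/placement"  # endpoint from the debug output
RP_UUID = "65929119-23f6-4ba2-b98b-4eab5884633f"
CONSUMER_UUID = RP_UUID  # the command above passes the RP UUID as the consumer


def build_put_allocations(token: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the allocation PUT."""
    # List format accepted by PUT /allocations/{consumer_uuid} before 1.12.
    body = {
        "allocations": [
            {
                "resource_provider": {"uuid": RP_UUID},
                "resources": {"VCPU": 4, "DISK_GB": 100, "MEMORY_MB": 8192},
            }
        ]
    }
    return urllib.request.Request(
        f"{PLACEMENT}/allocations/{CONSUMER_UUID}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "OpenStack-API-Version": "placement 1.0",
            "X-Auth-Token": token,  # placeholder; a real Keystone token is needed
        },
    )


req = build_put_allocations("PLACEHOLDER-TOKEN")
print(req.get_method(), req.full_url)
```

Sending it (for example with `urllib.request.urlopen(req)`) against the affected deployment would be expected to return the same HTTP 409.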
The request fails with the same error as above (request ID req-f31a74af-6bf3-4547-a41b-f29bfcd9b0f0). Below is the client debug output, followed by an excerpt from the placement API logs for the new request, req-1032c99c-5e57-4982-b838-2c0263ee5fb1:
+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. ", "request_id": "req-1032c99c-5e57-4982-b838-2c0263ee5fb1"}]}
PUT call to placement for http://192.16.0.51:8778/placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f used request id req-1032c99c-5e57-4982-b838-2c0263ee5fb1
Request returned failure status: 409
Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 32, in _wrap_http_exceptions
    yield
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib/python3.9/site-packages/keystoneauth1/session.py", line 986, in request
    raise exceptions.from_response(resp, method, url)
keystoneauth1.exceptions.http.Conflict: Conflict (HTTP 409) (Request-ID: req-1032c99c-5e57-4982-b838-2c0263ee5fb1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.9/site-packages/cliff/display.py", line 115, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_placement/resources/allocation.py", line 139, in take_action
    http.request('PUT', url, json=payload)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 39, in _wrap_http_exceptions
    six.raise_from(exc_class(exc.http_status, msg), exc)
  File "<string>", line 3, in raise_from
osc_lib.exceptions.Conflict: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
clean_up SetAllocation: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
+++
+++
controller-1 | CHANGED | rc=0 >>
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.601 16 DEBUG placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 - - - - -] Starting request: 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" __call__ /usr/lib/python3.9/site-packages/placement/requestlog.py:55
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.729 16 WARNING placement.objects.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Over capacity for VCPU on resource provider 65929119-23f6-4ba2-b98b-4eab5884633f. Needed: 12, Used: 16608, Capacity: 1024.0
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.736 16 DEBUG placement.handlers.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Deleted auto-created consumer with consumer UUID 65929119-23f6-4ba2-b98b-4eab5884633f after failed allocation delete_consumers /usr/lib/python3.9/site-packages/placement/handlers/allocation.py:364
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.737 16 DEBUG placement.wsgi_wrapper [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. call_func /usr/lib/python3.9/site-packages/placement/wsgi_wrapper.py:31
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.739 16 INFO placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" status: 409 len: 364 microversion: 1.0
+++
It is unclear how the used VCPU count can be reported as 16608 when only 2 VMs with 12 vCPUs each are running on this compute node, which has 64 CPUs. The capacity itself looks correct (64 * 16.0 = 1024).
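The "Over capacity" warning comes from a check of the form used + needed > capacity, where capacity = (total - reserved) * allocation_ratio. A minimal Python sketch of that arithmetic, assuming reserved = 0 for VCPU (the reserved value is not shown in this report); the helper names are hypothetical:

```python
def capacity(total: int, reserved: int, allocation_ratio: float) -> float:
    """Effective capacity of one resource class on a provider."""
    return (total - reserved) * allocation_ratio


def exceeds_capacity(used: int, needed: int, cap: float) -> bool:
    """True if adding `needed` units on top of `used` would exceed `cap`."""
    return needed > 0 and used + needed > cap


# 64 host CPUs, reserved assumed to be 0, default ratio 16.0.
cap = capacity(total=64, reserved=0, allocation_ratio=16.0)
print(cap)  # 1024.0, matching "Capacity: 1024.0" in the log

# With the usage reported by `resource provider usage show`:
print(exceeds_capacity(used=24, needed=12, cap=cap))  # False
# With the usage placement computed for req-1032c99c:
print(exceeds_capacity(used=16608, needed=12, cap=cap))  # True
```

With the real usage of 24, a request for 12 more VCPUs fits easily, so the rejection is driven entirely by the bogus "Used: 16608" figure.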
Version-Release number of selected component (if applicable):
[root@controller-1 /]# rpm -qa |grep -i placement
python3-placement-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-common-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-api-5.0.1-0.20210813021511.adf525a.el9ost.noarch
[root@controller-1 /]# rpm -qa |grep -i nova
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
VM creation fails.
Expected results:
VM creation should succeed, and it should be possible to request an allocation.
Additional info:
Environment details can be shared for review.
Is this from an upstream CI job, a downstream CI job, or an issue you hit directly?
If this is from a deployment you have access to, can you provide a set of sos reports?
If this is from a CI run, can you provide the link to the failing job?
This might be a RHEL bug, in which case we will either need to change the component or close this as CANTFIX and file a separate bug.
We have identified the cause as a bug in MariaDB that is being fixed by https://bugzilla.redhat.com/show_bug.cgi?id=2096274.
I'm going to triage this as urgent/urgent for now, although we likely will not need to do anything once the new package is available and the container is rebuilt.