Bug 2104804 - VM creation fails due to VCPU allocation issues when the request reaches placement api
Keywords:
Status: CLOSED DUPLICATE of bug 2096274
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-placement
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 17.0
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 2096274 2109813
Blocks:
Reported: 2022-07-07 08:17 UTC by Ketan Mehta
Modified: 2023-03-21 19:54 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2109813
Environment:
Last Closed: 2022-09-07 10:50:04 UTC
Target Upstream Version:
Embargoed:


Links
System                | ID        | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-16286 | 0       | None     | None   | None    | 2022-07-07 08:26:25 UTC

Description Ketan Mehta 2022-07-07 08:17:32 UTC
Description of problem:

Placement API reports incorrect usage of VCPU when requesting an allocation.

At the moment I have 2 VMs running on the compute node, each with 12 vCPUs, and the host has 64 CPUs available and online.

However, on every compute node, VM creation fails once 2 VMs have been created.

The nova-compute resource agent reports 24 vCPUs in use on the compute node, yet a new VM creation request fails because the VCPU allocation for the VM is rejected.

+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.  ", "request_id": "req-f31a74af-6bf3-4547-a41b-f29bfcd9b0f0"}]}
+++

Note that cpu_shared_set and cpu_dedicated_set were not specified, so the cpu_allocation_ratio would be 16.0 by default.

Here is the issue as seen from the Placement API, taking one resource provider as an example:

+--------------------------------------+-----------------------+------------+
| uuid                                 | name                  | generation |
+--------------------------------------+-----------------------+------------+
| 32c47b84-3bd6-4022-8455-867d1b819dd3 | compute-4.localdomain |         17 |
| 84b1755d-8ffd-4196-a3a6-c6218970307e | compute-2.localdomain |         18 |
| 65929119-23f6-4ba2-b98b-4eab5884633f | compute-5.localdomain |         15 |
| 6664ea69-8737-459c-af0a-e42108a6dcf7 | compute-0.localdomain |         15 |
| a9af1312-abbc-4151-a27f-beb901fb638b | compute-3.localdomain |         15 |
| 4c3b2cae-8ff1-4230-bc43-6f95ff70506c | compute-6.localdomain |         17 |
| 920eaa5d-ea15-4c05-8910-0b34f66b7b92 | compute-1.localdomain |         17 |
+--------------------------------------+-----------------------+------------+

Let's take compute-5 as the resource provider (rp) in the next few commands.

# openstack resource provider show 65929119-23f6-4ba2-b98b-4eab5884633f --allocation

+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field       | Value                                                                                                                                                                                                            |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid        | 65929119-23f6-4ba2-b98b-4eab5884633f                                                                                                                                                                             |
| name        | compute-5.localdomain                                                                                                                                                                                            |
| generation  | 15                                                                                                                                                                                                               |
| allocations | {'5d939491-fca1-4c67-98b4-6e0d1bb8eac8': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}, '5f37eb85-bbdb-4008-9d8f-5394d12ffb66': {'resources': {'VCPU': 12, 'MEMORY_MB': 8192, 'DISK_GB': 100}}} |
+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

# openstack resource provider usage show 65929119-23f6-4ba2-b98b-4eab5884633f
 
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |    24 |
| MEMORY_MB      | 16384 |
| DISK_GB        |   200 |
+----------------+-------+
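
As a sanity check, the usage above can be recomputed from the per-consumer allocations reported for compute-5. The following is a minimal Python sketch and not part of the original report; the dictionary is copied from the allocations field shown earlier, and the helper name sum_allocations is hypothetical.

```python
from collections import Counter

# Allocations copied from the "resource provider show --allocation" output above.
allocations = {
    "5d939491-fca1-4c67-98b4-6e0d1bb8eac8": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
    "5f37eb85-bbdb-4008-9d8f-5394d12ffb66": {
        "resources": {"VCPU": 12, "MEMORY_MB": 8192, "DISK_GB": 100}
    },
}


def sum_allocations(allocs):
    """Sum resource amounts per resource class across all consumers."""
    totals = Counter()
    for consumer in allocs.values():
        totals.update(consumer["resources"])
    return dict(totals)


usage = sum_allocations(allocations)
print(usage)  # {'VCPU': 24, 'MEMORY_MB': 16384, 'DISK_GB': 200}
```

The totals match the usage table, so the allocations and the usage reported through the API agree with each other for this provider.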

Based on this output, we would expect resources to be available. Let's request an allocation.

# openstack resource provider allocation set --allocation rp=65929119-23f6-4ba2-b98b-4eab5884633f,VCPU=4,DISK_GB=100,MEMORY_MB=8192 65929119-23f6-4ba2-b98b-4eab5884633f --debug

It fails with the same error as mentioned above. Here is the CLI debug output, followed by an excerpt from the Placement API logs for request id req-1032c99c-5e57-4982-b838-2c0263ee5fb1:

+++
RESP BODY: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity.  ", "request_id": "req-1032c99c-5e57-4982-b838-2c0263ee5fb1"}]}
PUT call to placement for http://192.16.0.51:8778/placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f used request id req-1032c99c-5e57-4982-b838-2c0263ee5fb1
Request returned failure status: 409
Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 32, in _wrap_http_exceptions
    yield
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib/python3.9/site-packages/keystoneauth1/session.py", line 986, in request
    raise exceptions.from_response(resp, method, url)
keystoneauth1.exceptions.http.Conflict: Conflict (HTTP 409) (Request-ID: req-1032c99c-5e57-4982-b838-2c0263ee5fb1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.9/site-packages/cliff/display.py", line 115, in run
    column_names, data = self.take_action(parsed_args)
  File "/usr/lib/python3.9/site-packages/osc_placement/resources/allocation.py", line 139, in take_action
    http.request('PUT', url, json=payload)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 59, in request
    return self.session.request(url, method,
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3.9/site-packages/osc_placement/http.py", line 39, in _wrap_http_exceptions
    six.raise_from(exc_class(exc.http_status, msg), exc)
  File "<string>", line 3, in raise_from
osc_lib.exceptions.Conflict: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
clean_up SetAllocation: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. (HTTP 409)
+++

+++
controller-1 | CHANGED | rc=0 >>
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.601 16 DEBUG placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 - - - - -] Starting request: 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" __call__ /usr/lib/python3.9/site-packages/placement/requestlog.py:55
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.729 16 WARNING placement.objects.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Over capacity for VCPU on resource provider 65929119-23f6-4ba2-b98b-4eab5884633f. Needed: 12, Used: 16608, Capacity: 1024.0
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.736 16 DEBUG placement.handlers.allocation [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Deleted auto-created consumer with consumer UUID 65929119-23f6-4ba2-b98b-4eab5884633f after failed allocation delete_consumers /usr/lib/python3.9/site-packages/placement/handlers/allocation.py:364
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.737 16 DEBUG placement.wsgi_wrapper [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '65929119-23f6-4ba2-b98b-4eab5884633f'. The requested amount would exceed the capacity. call_func /usr/lib/python3.9/site-packages/placement/wsgi_wrapper.py:31
/var/log/containers/placement/placement.log:2022-07-07 08:11:27.739 16 INFO placement.requestlog [req-1032c99c-5e57-4982-b838-2c0263ee5fb1 98e716dbc1af4bf695e0b6ffc41a7569 0bfd001369604c33bfa8ca01814cff04 - default default] 192.17.1.95 "PUT /placement/allocations/65929119-23f6-4ba2-b98b-4eab5884633f" status: 409 len: 364 microversion: 1.0
+++
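
The WARNING line in the log excerpt above carries the three numbers that matter here (Needed, Used, Capacity). The sketch below shows one way to pull them out of such a line for comparison across compute nodes. It is an illustration only: the regular expression and the function name parse_over_capacity are hypothetical and were written against this single message, not taken from the placement source.

```python
import re

# Message text copied from the WARNING line in the placement.log excerpt above.
line = (
    "Over capacity for VCPU on resource provider "
    "65929119-23f6-4ba2-b98b-4eab5884633f. "
    "Needed: 12, Used: 16608, Capacity: 1024.0"
)

# Hypothetical pattern, written for this one message format only.
PATTERN = re.compile(
    r"Over capacity for (?P<rc>\w+) on resource provider (?P<rp>[0-9a-f-]+)\. "
    r"Needed: (?P<needed>\d+), Used: (?P<used>\d+), Capacity: (?P<capacity>[\d.]+)"
)


def parse_over_capacity(text):
    """Return the fields of an 'Over capacity' warning, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {
        "resource_class": match["rc"],
        "resource_provider": match["rp"],
        "needed": int(match["needed"]),
        "used": int(match["used"]),
        "capacity": float(match["capacity"]),
    }


fields = parse_over_capacity(line)
print(fields)
```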

I'm not sure how the used VCPU count is being reported as 16608 when only 2 VMs with 12 vCPUs each are running on this compute node, which has 64 CPUs. The maximum VCPU capacity seems fine (64 * 16 = 1024).
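
To make the arithmetic explicit, here is a small Python sketch of the capacity check, assuming the usual Placement rule that capacity is (total - reserved) * allocation_ratio and that a request fits when used + requested does not exceed it. The reserved value of 0 is an assumption, since the inventory is not shown in this report, and the function names are hypothetical. Other inventory constraints (min_unit, max_unit, step_size) are ignored.

```python
def capacity(total, reserved, allocation_ratio):
    """Capacity of one resource class on a provider."""
    return (total - reserved) * allocation_ratio


def fits(used, requested, cap):
    """True if the request can be allocated without exceeding capacity."""
    return used + requested <= cap


# 64 host CPUs, ratio 16.0; reserved=0 is assumed, not taken from the report.
cap = capacity(total=64, reserved=0, allocation_ratio=16.0)
print(cap)                    # 1024.0, matching "Capacity: 1024.0" in the log

# With the usage reported by "resource provider usage show" (24), the request fits.
print(fits(24, 12, cap))      # True

# With the usage seen in the placement log (16608), the same request is rejected.
print(fits(16608, 12, cap))   # False
```

So the capacity side of the check is consistent with the host, and the 409 follows entirely from the inflated "Used" value.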

Version-Release number of selected component (if applicable):

[root@controller-1 /]# rpm -qa |grep -i placement
python3-placement-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-common-5.0.1-0.20210813021511.adf525a.el9ost.noarch
openstack-placement-api-5.0.1-0.20210813021511.adf525a.el9ost.noarch

[root@controller-1 /]# rpm -qa |grep -i nova
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

VM creation fails.

Expected results:

VM creation should succeed. Should be able to request an allocation.

Additional info:
Environment details can be shared for review.

Comment 3 smooney 2022-07-18 17:53:09 UTC
is this from an upstream ci job, a downstream ci job, or an issue you hit directly?


if this is from a deployment you have access to can you provide a set of sos reports.
if this is from a ci run can you provide the link to the failing job.

this might be a rhel bug, in which case we will either need to change the component or close this as can't fix and file a separate bug.

Comment 7 smooney 2022-07-22 09:44:11 UTC
we have identified the cause as a bug in MariaDB that is being fixed by https://bugzilla.redhat.com/show_bug.cgi?id=2096274

I'm going to triage this as urgent/urgent for now, although we likely will not need to do anything once the new package is available and the container is rebuilt.

