Bug 2109813 - VM creation fails due to VCPU allocation issues when the request reaches placement api
Keywords:
Status: CLOSED DUPLICATE of bug 2096274
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: mariadb
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Michal Schorm
QA Contact: RHEL CS Apps Subsystem QE
URL:
Whiteboard:
Depends On:
Blocks: 2104804
 
Reported: 2022-07-22 08:22 UTC by smooney
Modified: 2022-08-17 12:55 UTC
CC List: 8 users

Fixed In Version: mariadb-10.5.16-2.el9_0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2104804
Environment:
Last Closed: 2022-08-17 12:55:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-128712 0 None None None 2022-07-22 08:25:59 UTC

Comment 1 smooney 2022-07-22 08:51:28 UTC
Just to summarise:

Placement is a component in OpenStack that is basically a REST API for tracking resources and mapping allocations of resources to inventories.
It is implemented in Python, and most of the heavy lifting is done in SQL.

With the current version of MariaDB in RHEL 9.0, one of the primary queries we use now returns incorrect data.

This issue was also reported upstream in OpenStack:
[1] https://lists.openstack.org/pipermail/openstack-discuss/2022-July/029536.html
[2] https://lists.openstack.org/pipermail/openstack-discuss/2022-July/029567.html

The upstream reporter provided a reproducer DB dump and a test SQL query, removing the need for Placement itself.

Placement shows, via a different query, that the actual usage is:
openstack resource provider usage show 65929119-23f6-4ba2-b98b-4eab5884633f
 
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |    24 |
| MEMORY_MB      | 16384 |
| DISK_GB        |   200 |
+----------------+-------+
But when we execute the allocation-candidate query, we get:

Over capacity for VCPU on resource provider 65929119-23f6-4ba2-b98b-4eab5884633f. Needed: 12, Used: 16608, Capacity: 1024.0

The used value is equal to the total of all the used values: Used: 16608 == 24 + 16384 + 200.
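A quick arithmetic sanity check (my own illustration, not from the original report) confirms that the bogus Used figure is exactly the three per-class usages collapsed into a single sum, which points at the per-resource-class grouping being lost:

```python
# The suspicious "Used: 16608" equals the three per-class usage values
# (VCPU, MEMORY_MB, DISK_GB) summed together, suggesting the GROUP BY
# on resource_class_id is being lost somewhere in the query.
vcpu, memory_mb, disk_gb = 24, 16384, 200
assert vcpu + memory_mb + disk_gb == 16608
```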



So the inner subquery that is summing the allocations per resource class is not generating the correct info.

SELECT
     rp.id, rp.uuid, rp.generation, inv.resource_class_id, inv.total, inv.reserved, inv.allocation_ratio, allocs.used
FROM
     resource_providers AS rp
     JOIN inventories AS inv ON rp.id = inv.resource_provider_id
     LEFT JOIN (
         SELECT
             resource_provider_id, resource_class_id, SUM(used) AS used
         FROM
             allocations
         WHERE
             resource_class_id IN (0, 1, 2)
             AND resource_provider_id IN (5)
         GROUP BY
             resource_provider_id, resource_class_id
     ) AS allocs ON
         inv.resource_provider_id = allocs.resource_provider_id
         AND inv.resource_class_id = allocs.resource_class_id
WHERE
     rp.id IN (5)
     AND inv.resource_class_id IN (0,1,2)
;
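For comparison, here is a minimal sketch of what the inner subquery is *supposed* to return: one SUM(used) row per (provider, class) pair, never a total collapsed across classes. This uses Python's sqlite3 in place of MariaDB (so it demonstrates the expected correct behavior rather than reproducing the MariaDB bug), and the class IDs 0/1/2 mapping to VCPU/MEMORY_MB/DISK_GB plus the sample rows are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE allocations (
    resource_provider_id INTEGER,
    resource_class_id    INTEGER,
    used                 INTEGER)""")

# Hypothetical sample data: two consumers on provider 5, matching the
# usage totals reported above (VCPU=24, MEMORY_MB=16384, DISK_GB=200).
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?)", [
    (5, 0, 12),   (5, 0, 12),      # resource_class_id 0 = VCPU
    (5, 1, 8192), (5, 1, 8192),    # resource_class_id 1 = MEMORY_MB
    (5, 2, 100),  (5, 2, 100),     # resource_class_id 2 = DISK_GB
])

# The same inner subquery as in the Placement query above.
rows = conn.execute("""
    SELECT resource_provider_id, resource_class_id, SUM(used) AS used
    FROM allocations
    WHERE resource_class_id IN (0, 1, 2)
      AND resource_provider_id IN (5)
    GROUP BY resource_provider_id, resource_class_id
    ORDER BY resource_class_id""").fetchall()

# Correct behavior: one summed row per resource class.
assert rows == [(5, 0, 24), (5, 1, 16384), (5, 2, 200)]
```

On the affected MariaDB versions the aggregation instead yields the cross-class total (16608) in the used column, which is what produces the "Over capacity" error.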

Others also hit this: https://serverfault.com/questions/1064579/openstack-only-building-one-vm-per-machine-in-cluster-then-runs-out-of-resource
and tracked it down to this bug, which affects 10.3, 10.4, 10.5, and 10.5.10: https://jira.mariadb.org/browse/MDEV-25714

We are scheduled to have a beta of our next major release in ~2 weeks, with full GA following not long after.
This will likely block full GA and may block beta, which is why I have set the highest severity.

The RHEL/CentOS release version numbers indicate that it should be fixed, but we are still seeing this in our internal build.


https://access.redhat.com/downloads/content/rhel---9.0/x86_64/11161/mariadb/10.5.13-2.el9/x86_64/fd431d51/package
CentOS 9 Stream is using 10.5.16:
https://gitlab.com/redhat/centos-stream/rpms/mariadb/-/blob/c9s/sources

I am following up with our release delivery team to make sure we are currently building with the latest 9.0 RPMs. In the interim, can you confirm whether this should be fixed in our current packages, or whether a backport/rebase is required?

Comment 2 smooney 2022-07-22 08:58:44 UTC
We appear to be using mariadb-10.5.13-2.el9.x86_64,
so that implies the bug is in fact present in the version we are running.

The Jira issue suggests it was fixed in 10.5.11, but based on the last comment,
https://jira.mariadb.org/browse/MDEV-25714?focusedCommentId=219653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-219653
it was actually fixed in 10.5.15.
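The version check being applied here can be sketched as a simple comparison of dotted upstream versions (the version strings come from the comments above; the helper is my own, not part of any tooling, and it ignores the RPM release/dist tag for simplicity):

```python
# Compare dotted upstream MariaDB version strings to see whether the
# installed package predates the fix for MDEV-25714.
def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

fixed_in = "10.5.15"    # per the last comment on MDEV-25714
installed = "10.5.13"   # from mariadb-10.5.13-2.el9.x86_64

still_affected = parse_version(installed) < parse_version(fixed_in)
assert still_affected  # 10.5.13 predates the fix
```

By the same comparison, the 10.5.16 build mentioned below is new enough to carry the fix.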

Comment 3 smooney 2022-07-22 09:38:37 UTC
OK, we might want to close this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2096274

Comment 4 Lukas Javorsky 2022-07-22 12:54:34 UTC
Hello Sean,

If this is a duplicate, as you mentioned in comment #3, feel free to close this.

We're shipping the mariadb-10.5.16 version to RHEL-9.0, but it's still in process, so I cannot tell you how long it will take.

Comment 5 smooney 2022-08-17 12:55:05 UTC
Thanks, closing as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2096274
That has now been released, and I believe we have already pulled mariadb-10.5.16-2.el9_0 into our latest internal build,
so we should be unblocked.
http://rhsm-pulp.corp.redhat.com/content/eus/rhel9/9.0/x86_64/appstream/os/Packages/m/mariadb-10.5.16-2.el9_0.x86_64.rpm

*** This bug has been marked as a duplicate of bug 2096274 ***

