Bug 2219598 - Compute node with 20 NVIDIA A100 GPUs allows only 19/20 GPU instances to be spawned
Summary: Compute node with 20 NVIDIA A100 GPUs allows only 19/20 GPU instances to be spawned
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-04 13:51 UTC by alisci
Modified: 2023-08-08 09:32 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-26300 | 0 | None | None | None | 2023-07-04 13:53:47 UTC

Description alisci 2023-07-04 13:51:00 UTC
Description of problem:
It is not possible to allocate all 20 GPU instances on a single compute node: only 19 of the 20 are running. Spawning one more instance fails on the scheduled compute node with the following error:

Instance failed to spawn: nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: vGPU resource is not available
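The error above is raised when the compute node cannot find a mediated device (mdev) to give the new instance. The following is a simplified model, not Nova's actual code, of the decision that leads to it: reuse an existing mdev that no instance owns, otherwise create a new one on a parent GPU that still reports spare capacity, otherwise fail. The class `Parent`, the helper `pick_mdev` and the placeholder UUIDs are hypothetical; `ComputeResourcesUnavailable` is a local stand-in for the Nova exception.

```python
from dataclasses import dataclass, field


class ComputeResourcesUnavailable(Exception):
    """Hypothetical stand-in for nova.exception.ComputeResourcesUnavailable."""


@dataclass
class Parent:
    """A physical GPU acting as an mdev parent in this simplified model."""
    address: str                      # PCI address, e.g. "0000:3b:00.0"
    available_instances: int          # spare capacity, mirroring sysfs available_instances
    mdevs: list = field(default_factory=list)  # mdev UUIDs already created on it


def pick_mdev(parents, assigned):
    """Return an mdev UUID for a new instance (hypothetical helper).

    `assigned` is the set of mdev UUIDs already attached to instances.
    """
    # 1. Reuse an existing mdev that no instance currently owns.
    for parent in parents:
        for uuid in parent.mdevs:
            if uuid not in assigned:
                return uuid
    # 2. Otherwise create a new mdev on a parent that reports spare capacity.
    for parent in parents:
        if parent.available_instances > 0:
            new_uuid = f"new-mdev-on-{parent.address}"  # placeholder, not a real UUID
            parent.mdevs.append(new_uuid)
            parent.available_instances -= 1
            return new_uuid
    # 3. Nothing to reuse and nothing to create: fail with the logged message.
    raise ComputeResourcesUnavailable("vGPU resource is not available")


# Usage: 19 GPUs each hold one assigned mdev; the 20th reports no capacity.
parents = [Parent(f"0000:{i:02x}:00.0", 0, [f"mdev-{i}"]) for i in range(19)]
parents.append(Parent("0000:13:00.0", 0, []))
assigned = {f"mdev-{i}" for i in range(19)}
try:
    pick_mdev(parents, assigned)
except ComputeResourcesUnavailable as exc:
    print(exc)  # prints "vGPU resource is not available"
```

In this model the error appears only when every existing mdev is already assigned and no parent reports spare capacity, which matches the report's finding that there are no allocated but unused mdevs. Whether one GPU actually reports zero capacity on the affected host is an assumption that would need to be checked there.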

Inspecting the allocated mdev (mediated device) GPU devices shows that they all appear to belong to the currently running instances; there do not seem to be any mdevs that are allocated but unused.
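The mdev check described above can be scripted. A minimal sketch, assuming the host's mdevs are the UUID-named entries under `/sys/bus/mdev/devices` and that each instance's libvirt domain XML (for example, the output of `virsh dumpxml`) references its mdev as a `<hostdev type='mdev'>` element with a `<source><address uuid='...'/></source>` child. The helper names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def list_host_mdevs(path="/sys/bus/mdev/devices"):
    """Return mdev UUIDs present on the host (directory entries are UUIDs)."""
    if not os.path.isdir(path):
        return []
    return os.listdir(path)


def mdev_uuids_in_domain(domain_xml):
    """Return the mdev UUIDs referenced by <hostdev type='mdev'> in one domain XML."""
    root = ET.fromstring(domain_xml)
    uuids = set()
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "mdev":
            continue  # skip PCI passthrough and other hostdev types
        address = hostdev.find("source/address")
        if address is not None and address.get("uuid"):
            uuids.add(address.get("uuid").lower())
    return uuids


def unassigned_mdevs(host_mdev_uuids, domain_xmls):
    """Return host mdevs that no domain references, sorted for stable output."""
    in_use = set()
    for xml_text in domain_xmls:
        in_use |= mdev_uuids_in_domain(xml_text)
    return sorted(u.lower() for u in host_mdev_uuids if u.lower() not in in_use)
```

An empty result from `unassigned_mdevs(list_host_mdevs(), domain_xmls)` corresponds to the observation above: every mdev on the host is attached to a running instance.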

Details are in the following private comment.

Version-Release number of selected component (if applicable):
OSP 16.2.3


How reproducible:
This is customer-specific.

Steps to Reproduce:
Create instances that request a GPU on the affected compute node.

Actual results:
Only 19/20 GPUs get allocated.

Expected results:
All 20/20 GPUs get allocated.

