Bug 2155598 - [RHOSP16.2.3] Instance schedule fail with Insufficient compute resources vGPU resource is not available
Summary: [RHOSP16.2.3] Instance schedule fail with Insufficient compute resources vGPU...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-21 15:24 UTC by Luigi Tamagnone
Modified: 2023-03-21 20:01 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-11 16:34:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-20997 0 None None None 2022-12-21 15:32:25 UTC

Description Luigi Tamagnone 2022-12-21 15:24:32 UTC
Description of problem:
Creating an instance with a vGPU fails because of missing GPU resources, even though 3 slots are available.

Version-Release number of selected component (if applicable):
RHOSP16.2.3

Steps to Reproduce:
1. Creation of an instance with 1 vGPU fails
2. Checking the resources, there seem to be some old vGPUs that are not attached to any instance:
~~~
[root@overcloud-computegpu-1 devices]#  find | grep 0000:3a | grep remove
./pci0000:36/0000:36:00.0/0000:37:00.0/0000:38:0c.0/0000:3a:00.0/372a7b6b-d83e-410b-8e03-7671c3ad19ea/remove
[root@overcloud-computegpu-1 devices]#  find | grep 0000:88 | grep remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/b56824d8-68d7-4ebc-9401-27f9c8f84992/remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/c8d43c85-faa9-4884-8c13-950180c505e8/remove
~~~
3. We tried to clean up one resource:
~~~
[root@overcloud-computegpu-1 devices]# echo "1"> ./pci0000:36/0000:36:00.0/0000:37:00.0/0000:38:0c.0/0000:3a:00.0/372a7b6b-d83e-410b-8e03-7671c3ad19ea/remove
~~~
4. Instance creation still fails.
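
For reference, steps 2 and 3 could be sketched as two shell helpers. This is only an illustration, not a supported cleanup procedure: the function names `list_mdev_uuids` and `remove_mdev`, the default root `/sys/devices`, and the dry-run default are assumptions made here. The sketch relies only on what the listing above shows: each mediated device is a UUID-named directory holding a `remove` attribute, and writing `1` to that attribute removes the device.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of nova or any Red Hat tooling.

# List the UUIDs of mediated devices found under a sysfs devices tree.
# Only UUID-named directories are reported, so the "remove" attribute of
# the physical PCI device itself is ignored.
list_mdev_uuids() {
    local root="${1:-/sys/devices}"
    find "$root" -type f -name remove 2>/dev/null \
        | grep -E '/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/remove$' \
        | awk -F/ '{ print $(NF-1) }' \
        | sort
}

# Remove one mediated device given its sysfs directory.
# The second argument defaults to "yes" (dry run); pass "no" to really write.
remove_mdev() {
    local mdev_dir="$1" dry_run="${2:-yes}"
    if [ ! -e "$mdev_dir/remove" ]; then
        echo "no remove attribute under $mdev_dir" >&2
        return 1
    fi
    if [ "$dry_run" = "yes" ]; then
        echo "would remove $mdev_dir"
        return 0
    fi
    echo 1 > "$mdev_dir/remove"
}
```

Running `list_mdev_uuids` on the compute node and comparing the result with the mdev UUIDs used by running guests would show which devices are candidates for cleanup. The dry-run default is a deliberate choice, since removing an mdev that a guest still uses is disruptive.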

Actual results:
Instance fails with:
~~~
2022-12-21 15:27:05.025 7 ERROR nova.compute.manager [req-a0169b9b-3ea2-4789-98ca-c2b05e8056ce 404b607ac74c4349b9b9c9f7edc1ccc3 70b860e1704d4accb6b4db9453d48daa - default default] [instance: 1e676579-abcf-414d-b6c3-773550bb42d0] Instance failed to spawn: nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: vGPU resource is not available.
~~~

Expected results:
Instance is deployed successfully.

Additional info:
3 slots are available for vGPUs:
~~~
GPU 00000000:3A:00.0
    Active vGPUs                      : 1
    vGPU ID                           : 3251634342
        VM UUID                       : 90b9f257-cfe1-48af-acaf-7a8af2ad4404
        VM Name                       : instance-0000123d

GPU 00000000:88:00.0
    Active vGPUs                      : 0
~~~
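
The per-GPU counts above can be pulled out of the text with a short filter. This is a minimal sketch that assumes the input has exactly the layout shown (a `GPU <address>` line followed by an `Active vGPUs : <n>` line); the report does not name the command that produced it, and the function name `summarize_active_vgpus` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read vGPU status text on stdin and print
# "<gpu address> <active vGPU count>" for each GPU section.
summarize_active_vgpus() {
    awk '
        # Remember the address of the GPU section we are in.
        $1 == "GPU" { gpu = $2 }
        # The count is the last field of the "Active vGPUs" line.
        /Active vGPUs/ { print gpu, $NF }
    '
}
```

Piping the saved output through `summarize_active_vgpus` gives one line per physical GPU, which is easier to compare with the mdev directories found in sysfs. It reports active vGPUs only, not free capacity.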

