Bug 2155598
| Summary: | [RHOSP16.2.3] Instance schedule fail with Insufficient compute resources vGPU resource is not available | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Luigi Tamagnone <ltamagno> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED NOTABUG | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.2 (Train) | CC: | alifshit, dasmith, eglynn, jhakimra, jveiraca, kchamart, sbauza, sgordon, vromanso |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-11 16:34:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description of problem:
Creating an instance with a vGPU fails because of missing GPU resources, although 3 vGPU slots are available.

Version-Release number of selected component (if applicable):
RHOSP16.2.3

Steps to Reproduce:
1. Creating an instance with 1 vGPU fails.
2. Checking the resources, there appear to be some old vGPUs that are not present on any instance:
~~~
[root@overcloud-computegpu-1 devices]# find | grep 0000:3a | grep remove
./pci0000:36/0000:36:00.0/0000:37:00.0/0000:38:0c.0/0000:3a:00.0/372a7b6b-d83e-410b-8e03-7671c3ad19ea/remove
[root@overcloud-computegpu-1 devices]# find | grep 0000:88 | grep remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/b56824d8-68d7-4ebc-9401-27f9c8f84992/remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/remove
./pci0000:85/0000:85:00.0/0000:86:00.0/0000:87:00.0/0000:88:00.0/c8d43c85-faa9-4884-8c13-950180c505e8/remove
~~~
3. We tried to clean up one resource:
~~~
[root@overcloud-computegpu-1 devices]# echo "1"> ./pci0000:36/0000:36:00.0/0000:37:00.0/0000:38:0c.0/0000:3a:00.0/372a7b6b-d83e-410b-8e03-7671c3ad19ea/remove
~~~
4. Instance creation still fails.

Actual results:
The instance fails to spawn with:
~~~
2022-12-21 15:27:05.025 7 ERROR nova.compute.manager [req-a0169b9b-3ea2-4789-98ca-c2b05e8056ce 404b607ac74c4349b9b9c9f7edc1ccc3 70b860e1704d4accb6b4db9453d48daa - default default] [instance: 1e676579-abcf-414d-b6c3-773550bb42d0] Instance failed to spawn: nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: vGPU resource is not available.
~~~

Expected results:
The instance is deployed successfully.

Additional info:
3 slots are available for vGPUs:
~~~
GPU 00000000:3A:00.0
    Active vGPUs : 1
    vGPU ID : 3251634342
    VM UUID : 90b9f257-cfe1-48af-acaf-7a8af2ad4404
    VM Name : instance-0000123d
GPU 00000000:88:00.0
    Active vGPUs : 0
~~~
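The leftover-vGPU check in step 2 is done by reading the `find | grep remove` listing by eye. The Python sketch below does the same check mechanically. It assumes that mediated devices (mdevs) show up as `<parent PCI address>/<UUID>/remove`, and that the mdev UUIDs attached to running guests are taken from their libvirt domain XML (`<hostdev type='mdev'>` with `<source><address uuid='...'/></source>`, for example from `virsh dumpxml`). The helper names `mdevs_by_parent`, `mdev_uuids_in_domain` and `unassigned_mdevs` are hypothetical and are not part of Nova or libvirt. The bare `0000:88:00.0/remove` entry belongs to the physical GPU itself (writing to it would remove the whole PCI device), so the parser skips it.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical helpers for illustration; not part of Nova or libvirt.
# Sysfs PCI address, e.g. "0000:3a:00.0" (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")
# Mediated devices (mdevs) are named by UUID.
MDEV_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def mdevs_by_parent(find_lines):
    """Map parent GPU address -> mdev UUIDs from `find | grep remove` output.

    Only ".../<pci-addr>/<uuid>/remove" paths are treated as mdevs; a bare
    ".../<pci-addr>/remove" is the physical GPU's own file and is skipped.
    """
    result = defaultdict(list)
    for line in find_lines:
        parts = line.strip().split("/")
        if len(parts) < 3 or parts[-1] != "remove":
            continue
        parent, name = parts[-3], parts[-2]
        if PCI_ADDR.match(parent) and MDEV_UUID.match(name):
            result[parent.lower()].append(name.lower())
    return dict(result)


def mdev_uuids_in_domain(domain_xml):
    """Collect mdev UUIDs referenced by <hostdev type='mdev'> in one libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    return {
        addr.get("uuid").lower()
        for hostdev in root.iter("hostdev")
        if hostdev.get("type") == "mdev"
        for addr in hostdev.findall("./source/address")
        if addr.get("uuid")
    }


def unassigned_mdevs(by_parent, assigned_uuids):
    """Return {parent: [uuid, ...]} for mdevs not referenced by any domain."""
    assigned = {u.lower() for u in assigned_uuids}
    leftovers = {}
    for parent, uuids in by_parent.items():
        free = [u for u in uuids if u not in assigned]
        if free:
            leftovers[parent] = free
    return leftovers
```

Given the four paths from step 2, `mdevs_by_parent` returns one mdev under `0000:3a:00.0` and two under `0000:88:00.0`. Any UUID that `unassigned_mdevs` then reports is only a candidate leftover and should be confirmed before writing to its `remove` file as in step 3.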
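The "Additional info" excerpt can also be compared with the sysfs listing programmatically. This is a minimal Python sketch. It assumes the excerpt is `nvidia-smi vgpu -q`-style output, meaning a `GPU <address>` header followed by `key : value` lines, and it ignores any fields other than the two it reads. The helper names are hypothetical. nvidia-smi prints the PCI address with an 8-digit upper-case domain (`00000000:3A:00.0`), while sysfs uses `0000:3a:00.0`, so addresses are normalized before they are compared. A parent GPU that has more mdevs in sysfs than active vGPUs reported by the driver hints at mdevs that are not backing any running VM.

```python
import re

# Hypothetical helpers for illustration; not part of Nova or the NVIDIA tools.
# Assumes `nvidia-smi vgpu -q`-style text: "GPU <addr>" headers followed by "key : value" lines.
GPU_HEADER = re.compile(r"^GPU\s+([0-9A-Fa-f]+:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7])$")


def normalize_pci(addr):
    """Convert nvidia-smi's "00000000:3A:00.0" to the sysfs form "0000:3a:00.0"."""
    domain, rest = addr.strip().split(":", 1)
    return f"{int(domain, 16):04x}:{rest.lower()}"


def parse_vgpu_query(text):
    """Return {sysfs_pci_addr: {"active": int, "vms": [vm_uuid, ...]}}."""
    gpus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = GPU_HEADER.match(line)
        if header:
            current = normalize_pci(header.group(1))
            gpus[current] = {"active": 0, "vms": []}
            continue
        if current is None or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Active vGPUs":
            gpus[current]["active"] = int(value)
        elif key == "VM UUID":
            gpus[current]["vms"].append(value.lower())
    return gpus


def mdevs_without_vgpu(mdev_counts, gpus):
    """Return {parent: surplus} where sysfs holds more mdevs than the driver reports as active."""
    surplus = {}
    for parent, count in mdev_counts.items():
        active = gpus.get(parent, {"active": 0})["active"]
        if count > active:
            surplus[parent] = count - active
    return surplus


if __name__ == "__main__":
    # Excerpt from the report, plus mdev counts taken from the step 2 listing.
    excerpt = """GPU 00000000:3A:00.0
    Active vGPUs : 1
    vGPU ID : 3251634342
    VM UUID : 90b9f257-cfe1-48af-acaf-7a8af2ad4404
    VM Name : instance-0000123d
GPU 00000000:88:00.0
    Active vGPUs : 0
"""
    gpus = parse_vgpu_query(excerpt)
    print(mdevs_without_vgpu({"0000:3a:00.0": 1, "0000:88:00.0": 2}, gpus))
    # prints {'0000:88:00.0': 2}
```

With the counts from step 2 (before the cleanup in step 3), only `0000:88:00.0` is flagged: it has two mdevs in sysfs but reports no active vGPUs.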