Bug 2301525

Summary: Sometimes instances with PCI passthrough are created with more PCI devices than requested
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED DUPLICATE
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Docs Contact:
Priority: unspecified
Version: 16.2 (Train)
CC: alifshit, dasmith, eglynn, jhakimra, kchamart, parthee, sbauza, sgordon, vromanso
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-08-10 13:59:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alex Stupnikov 2024-07-30 08:43:45 UTC
Description of problem:
Two instances in the customer's deployment have two PCI devices (GPUs) attached instead of the one prescribed by the flavor. This causes scheduling problems and looks similar to https://bugs.launchpad.net/nova/+bug/1860555.

We are looking for a workaround for this problem, ideally as soon as possible. Compute nodes in the customer's deployment are quite packed, so a solution that does not require migration would be preferred.

Information about collected data will be provided privately.


Version-Release number of selected component (if applicable): RHOSP 16.2, but newer releases are affected as well.


How reproducible: In the customer's deployment this was likely triggered by failed host evacuations, in which multiple VMs were scheduled onto the same compute node and then rescheduled. The upstream bug describes different reproduction steps that may be better suited to a lab.


Actual results:
Sometimes, instances may fail during creation, or may be created with more PCI devices than requested.


Expected results:
The instances are created successfully, and each has the expected number of PCI devices attached.
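
To make the actual-versus-expected comparison concrete, here is a minimal sketch of how affected instances might be flagged once the data is in hand. It assumes the requested count comes from the flavor's pci_passthrough:alias extra spec (values of the form "name:count", comma-separated) and that the allocations have already been collected as (instance UUID, PCI address) pairs by some other means. The function names, the "gpu" alias, the instance names and the PCI addresses are all hypothetical and are not taken from the customer's data.

```python
from collections import Counter


def requested_pci_count(alias_spec):
    """Sum the counts in an alias value such as 'gpu:1' or 'a1:2,a2:1'."""
    total = 0
    for item in alias_spec.split(","):
        item = item.strip()
        if not item:
            continue
        _name, _, count = item.partition(":")
        total += int(count)
    return total


def find_over_allocated(requested, allocations):
    """Return {instance: (requested, allocated)} for over-allocated instances.

    requested:   {instance_uuid: alias_spec} (hypothetical input shape)
    allocations: iterable of (instance_uuid, pci_address) pairs
    """
    allocated = Counter(uuid for uuid, _addr in allocations)
    result = {}
    for uuid, spec in requested.items():
        wanted = requested_pci_count(spec)
        if allocated[uuid] > wanted:
            result[uuid] = (wanted, allocated[uuid])
    return result


# Hypothetical example: vm-a holds two GPUs although its flavor asks for one.
requested = {"vm-a": "gpu:1", "vm-b": "gpu:1"}
allocations = [
    ("vm-a", "0000:3b:00.0"),
    ("vm-a", "0000:af:00.0"),
    ("vm-b", "0000:d8:00.0"),
]
over_allocated = find_over_allocated(requested, allocations)
```

This only reports the mismatch; it does not release the extra device, so it is a diagnostic aid rather than the workaround requested above.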



Additional info: to be provided privately

Comment 4 Artom Lifshitz 2024-08-10 13:59:02 UTC

*** This bug has been marked as a duplicate of bug 2301551 ***