Bug 2301525 - Sometimes instances with PCI passthrough are created with more PCI devices than requested
Keywords:
Status: CLOSED DUPLICATE of bug 2301551
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-07-30 08:43 UTC by Alex Stupnikov
Modified: 2024-12-11 15:50 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-08-10 13:59:02 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1860555 | 0 | None | None | None | 2024-07-31 13:42:29 UTC
OpenStack gerrit | 710848 | 0 | None | NEW | Fix PCI passthrough race on reschedule (refresh) | 2024-08-06 15:27:38 UTC
Red Hat Issue Tracker | OSP-32584 | 0 | None | None | None | 2024-07-30 08:45:10 UTC
Red Hat Knowledge Base (Solution) | 7081142 | 0 | None | None | None | 2024-07-31 13:46:22 UTC

Description Alex Stupnikov 2024-07-30 08:43:45 UTC
Description of problem:
Two instances in the customer's deployment have two PCI devices (GPUs) attached instead of the one requested by their flavor. This causes scheduling problems and looks similar to https://bugs.launchpad.net/nova/+bug/1860555.

We are looking for a workaround for this problem, ideally as soon as possible. The compute nodes in the customer's deployment are densely packed, so a solution that does not require migrating instances is preferred.

Information about collected data will be provided privately.


Version-Release number of selected component (if applicable): RHOSP 16.2, but newer releases are affected as well.


How reproducible: In the customer's deployment, this was likely triggered by failed host evacuations in which multiple VMs were scheduled on the same compute node and then rescheduled. The upstream bug describes different reproduction steps that may be better suited to a lab environment.


Actual results:
Instances sometimes fail during creation or are created with more PCI devices than requested.


Expected results:
The instances are created successfully, and each has the expected number of PCI devices attached.



Additional info: to be provided privately

Comment 4 Artom Lifshitz 2024-08-10 13:59:02 UTC

*** This bug has been marked as a duplicate of bug 2301551 ***

