Bug 2032247

Summary: [RFE] assign specific slots in the pci passthrough of GPU physical PCI devices
Product: Red Hat OpenStack Reporter: Chorong Park <chopark>
Component: openstack-nova Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 16.1 (Train) CC: alifshit, dasmith, eglynn, jhakimra, kchamart, osp-dfg-compute, sbaker, sbauza, sgordon, smooney, vromanso, yocha
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-04 16:20:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chorong Park 2021-12-14 09:03:25 UTC
Description of problem:

When a physical PCI device is assigned to an instance with the current PCI passthrough method, devices are distinguished only by vendor_id and product_id.
If a host has two GPUs with the same vendor_id and product_id, and the GPU in the first slot must be assigned to an instance, the selection is effectively random, so the PCI passthrough assignment has to be repeated until the first slot happens to be chosen.
Because of this, it takes a lot of time and the instance cannot be provided immediately.
Therefore, even for identical PCI devices, a function is needed to designate and assign specific slots in the PCI passthrough of physical GPU devices.

Version-Release number of selected component (if applicable):

Red Hat OpenStack Platform 16.1


How reproducible:



Steps to Reproduce:
1.
2.
3.

Actual results:

It is impossible to assign specific slots in the PCI passthrough of physical GPU devices.


Expected results:

Each PCI passthrough device can be uniquely identified by name.

It is possible to assign specific slots in the PCI passthrough of physical GPU devices.


Additional info:


There is no name tag, and I cannot create an alias that references an address for specific slots in the PCI passthrough of physical GPU devices.

Comment 1 Artom Lifshitz 2021-12-15 16:32:05 UTC
Our documentation doesn't make it super clear, but it is possible to use the `address` specifier in the PCI passthrough configuration [1]. The upstream documentation [2] looks clearer in that regard. Does that fulfill the customer's use case?

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-pci-passthrough_pci-passthrough#ref_guidelines-for-configuring-novapcipassthrough_pci-passthrough
[2] https://docs.openstack.org/nova/train/admin/pci-passthrough.html#configure-nova-compute-compute
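For illustration only, a minimal sketch of what the `address` specifier could look like in the `[pci]` section of nova.conf on a compute node. The vendor ID, product ID, PCI address, and alias name `gpu` are placeholder assumptions, not values taken from this bug, and the helper functions are hypothetical. Note that, as far as the Train documentation describes, the alias matches on vendor_id and product_id only, so the address restricts which host devices are exposed rather than letting an instance pick a slot.

```python
import json

# Placeholder values (assumptions for illustration, not from this bug report).
VENDOR_ID = "10de"
PRODUCT_ID = "1db4"
ADDRESS = "0000:01:00.0"
ALIAS_NAME = "gpu"


def whitelist_entry(address, vendor_id, product_id):
    """Build a passthrough_whitelist entry limited to one PCI address."""
    return {"vendor_id": vendor_id, "product_id": product_id, "address": address}


def alias_entry(name, vendor_id, product_id, device_type="type-PCI"):
    """Build an alias entry; it carries no address, only vendor/product IDs."""
    return {
        "name": name,
        "vendor_id": vendor_id,
        "product_id": product_id,
        "device_type": device_type,
    }


def render_pci_section(whitelist, alias):
    """Render the two options as text for the [pci] section of nova.conf."""
    return "\n".join(
        [
            "[pci]",
            "passthrough_whitelist = " + json.dumps(whitelist),
            "alias = " + json.dumps(alias),
        ]
    )


section = render_pci_section(
    whitelist_entry(ADDRESS, VENDOR_ID, PRODUCT_ID),
    alias_entry(ALIAS_NAME, VENDOR_ID, PRODUCT_ID),
)
print(section)
```

With a whitelist like this, only the device at the listed address is available for passthrough on that host, so any instance requesting the alias there can only receive that device.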

Comment 4 smooney 2022-01-04 16:20:28 UTC
This is not something that we intend to support in nova now or in the future.
The ability for a guest to select, or even know about the existence of, a GPU at a specific host hardware address is
a violation of the cloud abstraction provided by OpenStack.

This feature and other similar requests have been rejected several times upstream.

https://github.com/openstack/nova/blob/master/doc/source/contributor/project-scope.rst#driver-parity

"""
Our goal for the Nova API is to provide a consistent abstraction to access on demand compute resources. We are not aiming to expose all features of all hypervisors. Where the details of the underlying hypervisor leak through our APIs, we have failed in this goal, and we must work towards better abstractions that are more interoperable.
"""

Allowing the guest to request a device at a specific host address would leak the details of the hypervisor through the APIs.

Comment 5 Artom Lifshitz 2022-01-04 16:37:26 UTC
Everything Sean said is true, but I want to dig a bit deeper into the use case. Can you elaborate as to why the specific GPU device matters? In other words, what does the GPU in slot 0000:01:00.0 have that the one in slot 0000:02:00.0 doesn't have?