Bug 1576404 - Instance creation with physical function is failing in PCI Passthrough setup
Summary: Instance creation with physical function is failing in PCI Passthrough setup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-09 11:38 UTC by Gyanendra Kumar
Modified: 2023-03-21 18:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-25 05:18:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-9003 0 None None None 2022-08-09 10:49:34 UTC

Description Gyanendra Kumar 2018-05-09 11:38:20 UTC
Description of problem: Instance creation with a physical function (PF) fails in a PCI passthrough setup.

Version-Release number of selected component (if applicable):

RHOSP-10

How reproducible:


Steps to Reproduce:

Created a PCI-PT port with the option "--binding:vnic_type direct-physical", then created an instance with this port. Instance creation fails with the error below:

~~~

| fault                                | {"message": "Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed..", "code": 500, "details": "  File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1783, in _do_build_and_run_instance |
~~~

Actual results:

Instance creation fails with the error shown above.

Expected results:

The instance should spawn with PF access.


Additional info:

Comment 5 Artom Lifshitz 2018-05-11 00:18:06 UTC
The original error from comment #1 is a legit failure in the sense that it looks like they're asking for an instance NUMA topology and a PCI device, and the host can't fulfil that request.

IIUC they have then changed their PCI passthrough configuration in nova.conf and have re-tried, leading to the error in comment #4. From what I can tell, that error isn't in the sosreports. I can see references to port ID 16c52d5c-abad-4160-b5a2-2f3feec2b08f, but I can see no errors.

Would it be possible to "finalise", so to speak, the error that we're debugging, and once that's done attach sosreports that include it to this bz?

Cheers!

Comment 6 Jaison Raju 2018-05-16 11:04:38 UTC
Hi Artom,

The initial issue is the final issue.
It occurs because the NIC is reported as 'dev_type: type-PCI'.
It is still seen after the package update.

Regards,
Jaison R
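
For anyone checking how the kernel itself sees the device mentioned in comment 6, here is a minimal sketch, assuming a Linux host with sysfs. It relies on two kernel facts: an SR-IOV virtual function has a physfn link to its parent, and an SR-IOV-capable physical function exposes sriov_totalvfs. The function name classify_pci_device and the sysfs_root parameter are hypothetical, and the type-PF / type-VF / type-PCI labels are borrowed from this bug. Nova derives its own dev_type from what libvirt reports, so the two views may not agree.

```python
import os


def classify_pci_device(address, sysfs_root="/sys/bus/pci/devices"):
    """Roughly classify a PCI device from its sysfs entry (Linux only).

    This is an illustration, not nova's own logic.
    """
    dev_dir = os.path.join(sysfs_root, address)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(dev_dir)
    # A virtual function carries a 'physfn' link back to its parent PF.
    if os.path.lexists(os.path.join(dev_dir, "physfn")):
        return "type-VF"
    # An SR-IOV-capable physical function exposes 'sriov_totalvfs'.
    if os.path.exists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "type-PF"
    # Anything else is treated here as a plain PCI device.
    return "type-PCI"
```

Running classify_pci_device("0000:08:00.0") on the compute host would show whether the kernel reports the whitelisted address as SR-IOV capable.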

Comment 12 Artom Lifshitz 2018-05-18 20:56:57 UTC
I still believe that the failure is a legitimate error message, indicating that the compute host cannot fulfil the instance's requested NUMA topology and PCI devices. Would it be possible to have debug-level logs from nova-api and nova-scheduler as well? With those, I'd have a better idea of what flavor, PCI devices, and NUMA topology the instance was booted with.

Thanks!

PS: On compute-16 at least, device_type is present in pci_passthrough_whitelist and no pci_alias is present:

pci_passthrough_whitelist={"vendor_id":"1137","physical_network":"phys_pcie1_0","product_id":"0043","device_type":"type-PF","address":"0000:08:00.0"}


device_type is only used in pci_alias, not pci_passthrough_whitelist [1]. This may explain why the PCI alias requested in the flavor (if there is one) isn't available on the compute host.

[1] https://docs.openstack.org/newton/config-reference/compute/config-options.html#id29
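
The check described in comment 12 can be sketched in a few lines of Python. This is only an illustration: the helper name misplaced_keys and its alias_only parameter are hypothetical, and the set of keys treated as alias-only (just device_type) is taken from the comment above rather than from nova's code. The whitelist entry is the one quoted from compute-16.

```python
import json

# Whitelist entry quoted in this bug (compute-16).
WHITELIST = (
    '{"vendor_id":"1137","physical_network":"phys_pcie1_0",'
    '"product_id":"0043","device_type":"type-PF","address":"0000:08:00.0"}'
)


def misplaced_keys(entry_json, alias_only=("device_type",)):
    """Return keys in a whitelist entry that, per comment 12, belong in pci_alias.

    The alias_only default is an assumption drawn from this bug, not from nova.
    """
    entry = json.loads(entry_json)
    return sorted(key for key in entry if key in alias_only)


print(misplaced_keys(WHITELIST))  # prints ['device_type']
```

An empty list would mean the entry contains none of the keys flagged here; it does not prove the entry is otherwise valid.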

