Description of problem:
Instance creation with a physical function (PF) is failing in a PCI passthrough setup.

Version-Release number of selected component (if applicable):
RHOSP-10

How reproducible:

Steps to Reproduce:
1. Create a PCI-PT port with the option "--binding:vnic_type direct-physical".
2. Create an instance with this port.

Actual results:
Instance creation fails with the error below:

~~~
| fault | {"message": "Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology; Claim pci failed..", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 1783, in _do_build_and_run_instance |
~~~

Expected results:
Should be able to spawn an instance with PF access.

Additional info:
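For reference, the reproduction steps above can be sketched with the Newton-era neutron/nova CLIs. The network, flavor, and image names here (pf-net, m1.large, rhel7) are hypothetical placeholders, not taken from this setup:

```shell
# Create a port requesting physical-function (PF) passthrough.
# 'pf-net' is assumed to be a network on the relevant physical_network.
neutron port-create pf-net --binding:vnic_type direct-physical --name pf-port

# Boot an instance attached to that port (substitute the real port UUID).
nova boot --flavor m1.large --image rhel7 --nic port-id=<pf-port-uuid> pf-vm

# On failure, the fault message appears in the server details:
nova show pf-vm | grep fault
```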
The original error from comment #1 is a legitimate failure, in the sense that they appear to be asking for an instance NUMA topology plus a PCI device, and the host can't fulfil that request. IIUC they have since changed their PCI passthrough configuration in nova.conf and retried, leading to the error in comment #4. From what I can tell, that error isn't in the sosreports: I can see references to port ID 16c52d5c-abad-4160-b5a2-2f3feec2b08f, but no errors.

Would it be possible to "finalise", so to speak, the error that we're debugging, and once that's done attach sosreports that include it to this bz? Cheers!
Hi Artom,

The initial issue is the final issue. The issue is seen because the NIC device is reported as 'dev_type: type-PCI'. The issue is still seen after the package update.

Regards,
Jaison R
I still believe that the failure is a legitimate error message, indicating that the compute host cannot fulfil the instance's requested NUMA topology and PCI devices. Would it be possible to have debug-level logs from nova-api and nova-scheduler as well? With those, I'd have a better idea of what flavor, PCI devices, and NUMA topology the instance was booted with. Thanks!

PS: On compute-16 at least, device_type is present in pci_passthrough_whitelist and no pci_alias is present:

~~~
pci_passthrough_whitelist={"vendor_id":"1137","physical_network":"phys_pcie1_0","product_id":"0043","device_type":"type-PF","address":"0000:08:00.0"}
~~~

device_type is only used in pci_alias, not in pci_passthrough_whitelist [1]. This may explain why the PCI alias requested in the flavor (if there is one) isn't available on the compute host.

[1] https://docs.openstack.org/newton/config-reference/compute/config-options.html#id29
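To illustrate the point above, a minimal sketch of how that configuration could be split, assuming the device is meant to be requestable via an alias (the alias name "pf_cisco" below is hypothetical, and only applies if the instance requests the device through a flavor alias rather than through the direct-physical port):

```
# /etc/nova/nova.conf on the compute host: whitelist the device only.
# device_type is not a recognised pci_passthrough_whitelist key on Newton.
pci_passthrough_whitelist={"vendor_id":"1137","product_id":"0043","address":"0000:08:00.0","physical_network":"phys_pcie1_0"}

# pci_alias is where device_type belongs; "pf_cisco" is a placeholder name.
pci_alias={"vendor_id":"1137","product_id":"0043","device_type":"type-PF","name":"pf_cisco"}
```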