Bug 1729485
| Summary: | [OSP16.2] "Invalid PCI devices Whitelist config error" configuring passthrough_whitelist with new 40Gb NICs due domain in PCI address is greater than FFFF | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Luis Arizmendi <larizmen> |
| Component: | openstack-nova | Assignee: | Stephen Finucane <stephenfin> |
| Status: | CLOSED ERRATA | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 13.0 (Queens) | CC: | alifshit, dasmith, eglynn, jhakimra, kchamart, sbauza, sgordon, stephenfin, vromanso |
| Target Milestone: | z2 | Keywords: | Patch, Triaged |
| Target Release: | 16.2 (Train on RHEL 8.4) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-nova-20.6.2-2.20210616124810.b095ed1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-23 22:10:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1561658, 2054703 | | |
| Bug Blocks: | 1961220 | | |
Description
Luis Arizmendi
2019-07-12 11:54:44 UTC
Since the previous test didn't work, I had to change the passthrough_whitelist to use the vendor_id. Then it started working, but with this method I cannot specify a single interface, only all interfaces with the same vendor_id:

[root@computehci-2 ~]# grep -r '.*' /sys/class/net/*/device/vendor | grep enP65536p3s0f0
/sys/class/net/enP65536p3s0f0/device/vendor:0x8086

[root@computehci-0 ~]# vi /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
..
..
#passthrough_whitelist={"devname":"enP65536p3s0f0","physical_network":"sriov"}
passthrough_whitelist={"vendor_id":"8086","physical_network":"sriov"}
..

[root@computehci-0 ~]# docker restart nova_compute

Actually, only the scheduling is working. The VM cannot start, probably because it is choosing a wrong interface rather than the one I want, since I get this error:

"Interface type hostdev is currently supported on SR-IOV Virtual Functions only"

Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1561658

devname should be avoided in general as it is unreliable, and we are currently in the process of deprecating it upstream: https://review.opendev.org/#/c/670585/2

The libvirt issue seems valid, and I suspect the nova limitation that was introduced as part of https://github.com/openstack/nova/commit/eca4286e955861e8e1547a8aabf2c4b5c4aad075 was chosen to be 2 bytes instead of 4 due to libvirt.

I think it is reasonable for nova to allow 32-bit domains; however, if the libvirt limitation still exists, it will just fail later when the VM tries to boot.

passthrough_whitelist={"vendor_id":"8086","physical_network":"sriov"} is dangerous, as it would allow any Intel PCI device on the platform to be used, not just NICs. You should have set the vendor_id and product_id. Can you try using the vendor_id and product_id again?

You can find those with lspci -vvn. It will print as vendor_id:product_id in the output.

Can you confirm two things for me? First, if you manually try to boot with the desired NIC using libvirt, can libvirt process the request and create a VM? Second, can you check whether the NIC is attached to the second or third/fourth socket on the host?

If you have a multi-socket host, the domain will be non-zero on all sockets other than the first. If you have more than 2 sockets on this host, it could result in the domain being outside the 16-bit range. As another workaround, you could try moving the NIC to a different slot.

(In reply to smooney from comment #4)
> devname should be avoided in general as it is unreliable, and we are
> currently in the process of deprecating it upstream:
> https://review.opendev.org/#/c/670585/2
> The libvirt issue seems valid, and I suspect the nova limitation that was
> introduced as part of
> https://github.com/openstack/nova/commit/eca4286e955861e8e1547a8aabf2c4b5c4aad075
> was chosen to be 2 bytes instead of 4 due to libvirt.
>
> I think it is reasonable for nova to allow 32-bit domains; however, if the
> libvirt limitation still exists, it will just fail later when the VM tries to
> boot.
>
> passthrough_whitelist={"vendor_id":"8086","physical_network":"sriov"} is
> dangerous, as it would allow any Intel PCI device on the platform to be used,
> not just NICs.
> You should have set the vendor_id and product_id.
> Can you try using the vendor_id and product_id again?
>
> You can find those with lspci -vvn.
>
> It will print as vendor_id:product_id in the output.
>
>
> Can you confirm two things for me?
> First, if you manually try to boot with the desired NIC using libvirt, can
> libvirt process the request and create a VM?
> Second, can you check whether the NIC is attached to the second or
> third/fourth socket on the host?
>
> If you have a multi-socket host, the domain will be non-zero on all sockets
> other than the first.
> If you have more than 2 sockets on this host, it could result in the domain
> being outside the 16-bit range.
> As another workaround, you could try moving the NIC to a different slot.

Libvirt is not able to create the VM. Regarding the slots, we asked Intel about that, because lspci shows that not all devices have the domain starting with 1. So it's probably what you said: moving the NIC to another slot "solves" the issue here. But I was wondering whether that libvirt limitation could eventually be solved, or whether that's not on the roadmap.

This is currently blocked because libvirt/kvm doesn't support 32-bit PCI domains, but if that is fixed we shouldn't block this config in Nova. We probably also shouldn't block this for other hypervisors.

Just as a comment: while this bug is being solved, if you can do it in your environment, you can turn off the Intel Volume Management Device service in the BIOS, which will remove the domain ID number 1000. In that case, all PCI addresses start with 0000 and you don't hit the issue [1] with the interface name including capital letters either.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1729439

As noted in comment 6, this is a limitation of libvirt. With the introduction of [1], nova will now ignore PCI devices with 32-bit domains. It was never possible to specify a PCI address with a 32-bit domain, and this limitation persists.

[1] https://github.com/openstack/nova/commit/8c9d6fc8f073cde78b79ae259c9915216f5d59b0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001
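
To make the 16-bit domain limit discussed in this thread concrete, here is a minimal Python sketch that checks whether the domain part of a PCI address fits in 0xFFFF. It is not nova's or libvirt's actual validation code: the helper name domain_fits_16bit and the regular expression are hypothetical. The sample address 10000:03:00.0 is also an assumption, inferred from the interface name enP65536p3s0f0 on the understanding that the P65536 part is the PCI domain in decimal (65536 = 0x10000).

```python
import re

# Largest PCI domain that fits in 16 bits, the limit discussed in this bug.
MAX_DOMAIN_16BIT = 0xFFFF

# Hypothetical pattern for "domain:bus:slot.function" with hex fields.
# This is an illustration, not the expression nova or libvirt uses.
_PCI_ADDR_RE = re.compile(
    r"^(?P<domain>[0-9a-fA-F]+):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<func>[0-7])$"
)


def domain_fits_16bit(pci_address: str) -> bool:
    """Return True if the PCI domain of pci_address is <= 0xFFFF."""
    match = _PCI_ADDR_RE.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    return int(match.group("domain"), 16) <= MAX_DOMAIN_16BIT


# A device in domain 0000 passes the check.
print(domain_fits_16bit("0000:03:00.0"))   # True
# Assumed address for enP65536p3s0f0: domain 0x10000 exceeds 0xFFFF.
print(domain_fits_16bit("10000:03:00.0"))  # False
```

A check like this only tells you whether an address falls inside the 16-bit range; as noted above, a device outside that range is ignored by nova and cannot be attached through libvirt.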