This bug was initially created as a copy of Bug #1729485

Description of problem:
-------------------
The Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ has a PCI address with domain 10000, which is greater than the configured maximum in nova. You get this error:

PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF)

Version-Release number of selected component (if applicable):
-------------------
RHOSP13

How reproducible:
-------------------
Always reproducible

Steps to Reproduce:
-------------------
1. Use Intel Corporation Ethernet Controller XL710 interfaces and configure them to be used for SR-IOV.
2. Deployment will fail because of bug https://bugzilla.redhat.com/show_bug.cgi?id=1729439, but if you work around it and continue, you will find that SR-IOV instances cannot be deployed.

Actual results:
-------------------
Nova cannot use these interfaces for SR-IOV.

Expected results:
-------------------
SR-IOV instances working with those interfaces.

Additional info:
-------------------
In nova-scheduler you can see that PciPassthroughFilter returns 0 possible hosts. Then, on the compute host, checking the nova-compute logs you can see:

[root@computehci-0 ~]# less /var/log/containers/nova/nova-compute.log
...
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager [req-d4e5eb11-d0f0-4ce1-ad63-0f020027fc58 - - - - -] Error updating resources for node computehci-0.rhosp.local.: PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF).
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager Traceback (most recent call last):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7426, in update_available_resource_for_node
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 689, in update_available_resource
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 713, in _update_available_resource
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 571, in _init_compute_node
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._setup_pci_tracker(context, cn, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 600, in _setup_pci_tracker
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     dev_json)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/manager.py", line 120, in update_devices_from_hypervisor_resources
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     if self.dev_filter.device_assignable(dev):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/whitelist.py", line 91, in device_assignable
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     if spec.match(dev):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 274, in match
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     address_obj = WhitelistPciAddress(address_str, pf)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 195, in __init__
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._init_address_fields(pci_addr)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 216, in _init_address_fields
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self.pci_address_spec = PhysicalPciAddress(pci_addr)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 87, in __init__
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._set_pci_dev_info('domain', MAX_DOMAIN, '%04x')
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 66, in _set_pci_dev_info
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     {'property': prop, 'attr': a, 'max': maxval})
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF).
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager

In nova.conf this is the configuration:

[root@computehci-0 ~]# vi /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf
...
passthrough_whitelist={"devname":"enP65536p3s0f0","physical_network":"sriov"}
...

I tried to replace devname with the PCI address, so I obtained the address with this command:

[root@computehci-0 ~]# sudo lshw -c network -businfo | grep enP65536p3s0f0
pci@0000:03:00.0  enP65536p3s0f0  network  Ethernet interface

So I configured nova.conf this way:

passthrough_whitelist={"address":"0000:03:00.0","physical_network":"sriov"}

It didn't work, so I double-checked the PCI address with another command:

[root@computehci-2 ~]# lspci | grep 710
10000:03:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
10000:03:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

There you can see the domain 10000. I configured that address in nova.conf:

passthrough_whitelist={"address":"10000:03:00.0","physical_network":"sriov"}

Then I got the same error, so devname maps to the PCI address correctly; the problem seems to be in devspec.py [1]. There you can see the function that raises the error, together with the MAX_DOMAIN variable set to:

MAX_DOMAIN = 0xFFFF
In order to work around it, I changed the variable (to MAX_DOMAIN = 0xFFFFF) on my compute nodes, in the file /usr/lib/python2.7/site-packages/nova/pci/devspec.py inside the nova_compute container:

[root@computehci-2 ~]# docker commit -m="Fix max domain in devspec.py" nova_compute new_nova_compute
sha256:270fc8b566113975d355eece8c053889ee1e8d0b38683436fd2eff7a8665aba1
[root@computehci-2 ~]# docker tag new_nova_compute 172.30.0.2:8787/rhosp13/openstack-nova-compute
[root@computehci-2 ~]# docker push 172.30.0.2:8787/rhosp13/openstack-nova-compute
The push refers to a repository [172.30.0.2:8787/rhosp13/openstack-nova-compute]
cefa3bc66d6f: Layer already exists
ee8c602c858a: Layer already exists
cf648748c4fe: Layer already exists
c76ca73178da: Layer already exists
fb15b60ae932: Layer already exists
050c734bd286: Layer already exists
13.0-87.1560797438: digest: sha256:8e8392b25325d9b98d4b06899b25165bd6e636c49994c4e976475f27468c6806 size: 1587
3a4748b9f150: Pushed
cefa3bc66d6f: Layer already exists
ee8c602c858a: Layer already exists
cf648748c4fe: Layer already exists
c76ca73178da: Layer already exists
fb15b60ae932: Layer already exists
050c734bd286: Layer already exists
latest: digest: sha256:a9c56e2332c140ce1ef19c276342f7168e34ba628863e9073278a1510a57a289 size: 1797
[root@computehci-2 ~]# docker restart nova_compute

After restarting the services the error disappeared, but I am still not able to make SR-IOV work (PciPassthroughFilter is still returning 0 valid hosts).

[1] https://github.com/openstack/nova/blob/stable/queens/nova/pci/devspec.py
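As a side note, a quick way to confirm which address (including the domain) the kernel actually assigns to the interface, independent of lshw and lspci, is to resolve the device symlink in sysfs. This is a generic check, not part of the original report; the interface name is the one used above:

import os

# /sys/class/net/<iface>/device is a symlink to the PCI device node,
# whose directory name is the full "domain:bus:slot.function" address,
# so the domain is visible even when a tool truncates or remaps it.
iface = 'enP65536p3s0f0'
dev = os.path.realpath('/sys/class/net/%s/device' % iface)
print(os.path.basename(dev))  # expected here: 10000:03:00.0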
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762