Bug 1961220 - [OSP16.1] "Invalid PCI devices Whitelist config error" configuring passthrough_whitelist with new 40Gb NICs due domain in PCI address is greater than FFFF
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Stephen Finucane
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1729485
Blocks:
 
Reported: 2021-05-17 14:42 UTC by Stephen Finucane
Modified: 2023-03-21 19:43 UTC (History)

Fixed In Version: openstack-nova-20.4.1-1.20210528141851.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-09 20:19:32 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-3924 0 None None None 2021-11-18 11:32:27 UTC
Red Hat Product Errata RHBA-2021:3762 0 None None None 2021-12-09 20:19:55 UTC

Description Stephen Finucane 2021-05-17 14:42:45 UTC
This bug was initially created as a copy of Bug #1729485

Description of problem:
-------------------
Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ NICs have a PCI address with domain 10000, which is greater than the maximum domain that nova accepts. Nova raises this error:

PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF)



Version-Release number of selected component (if applicable): RHOSP13


How reproducible:
-------------------
Always reproducible


Steps to Reproduce:
-------------------
1. Use Intel Corporation Ethernet Controller XL710 interfaces and configure them for SR-IOV.
2. Deployment will fail because of bug https://bugzilla.redhat.com/show_bug.cgi?id=1729439, but if you work around it and continue, you will find that SR-IOV instances cannot be deployed.


Actual results:
-------------------
Nova cannot use these interfaces for SR-IOV.


Expected results:
-------------------
SR-IOV instances work with those interfaces.



Additional info:
-------------------

In nova-scheduler you can see that PciPassthroughFilter returns 0 possible hosts. On the compute host, the nova-compute logs show:

[root@computehci-0 ~]# less /var/log/containers/nova/nova-compute.log
...
...
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager [req-d4e5eb11-d0f0-4ce1-ad63-0f020027fc58 - - - - -] Error updating resources for node computehci-0.rhosp.local.: PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF).
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager Traceback (most recent call last):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7426, in update_available_resource_for_node
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 689, in update_available_resource
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     return f(*args, **kwargs)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 713, in _update_available_resource
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 571, in _init_compute_node
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._setup_pci_tracker(context, cn, resources)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 600, in _setup_pci_tracker
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     dev_json)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/manager.py", line 120, in update_devices_from_hypervisor_resources
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     if self.dev_filter.device_assignable(dev):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/whitelist.py", line 91, in device_assignable
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     if spec.match(dev):
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 274, in match
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     address_obj = WhitelistPciAddress(address_str, pf)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 195, in __init__
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._init_address_fields(pci_addr)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 216, in _init_address_fields
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self.pci_address_spec = PhysicalPciAddress(pci_addr)
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 87, in __init__
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     self._set_pci_dev_info('domain', MAX_DOMAIN, '%04x')
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/pci/devspec.py", line 66, in _set_pci_dev_info
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager     {'property': prop, 'attr': a, 'max': maxval})
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager PciConfigInvalidWhitelist: Invalid PCI devices Whitelist config: property domain (10000) is greater than the maximum allowable value (FFFF).
2019-07-12 12:42:25.009 1 ERROR nova.compute.manager 
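
The traceback ends in a range check on the domain field of the PCI address. The following is a minimal sketch of such a check, not nova's actual code: the function name check_domain and the local exception class are hypothetical stand-ins, and only the MAX_DOMAIN value and the message wording are taken from this report.

```python
MAX_DOMAIN = 0xFFFF  # value quoted in this report


class PciConfigInvalidWhitelist(Exception):
    """Illustrative stand-in for nova's exception of the same name."""


def check_domain(domain_hex, maxval=MAX_DOMAIN):
    """Parse a hexadecimal PCI domain string and enforce an upper bound.

    Hypothetical helper: it only models the comparison implied by the
    error message, not the real code in devspec.py.
    """
    value = int(domain_hex, 16)
    if value > maxval:
        raise PciConfigInvalidWhitelist(
            "Invalid PCI devices Whitelist config: property domain (%s) is "
            "greater than the maximum allowable value (%X)"
            % (domain_hex, maxval)
        )
    return value
```

With this sketch, check_domain("0000") passes, while check_domain("10000") raises an exception whose text matches the log line above.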



In nova.conf this is the configuration:

[root@computehci-0 ~]# vi /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf 

...
passthrough_whitelist={"devname":"enP65536p3s0f0","physical_network":"sriov"}
...


I tried using the address instead of devname, so I got the PCI address with this command:


[root@computehci-0 ~]# sudo lshw -c network -businfo | grep enP65536p3s0f0
pci@0000:03:00.0  enP65536p3s0f0  network        Ethernet interface

So I configured nova.conf in this way:

passthrough_whitelist={"address":"0000:03:00.0","physical_network":"sriov"}


That didn't work, so I double-checked the PCI address with another command:


[root@computehci-2 ~]# lspci | grep 710
10000:03:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
10000:03:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
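
The domain reported by lspci is consistent with the interface name. The sketch below assumes the interface follows the systemd-style predictable naming layout en[P<domain>]p<bus>s<slot>[f<function>] with all numbers in decimal; the helper devname_to_pci_address is hypothetical and only illustrates the conversion.

```python
import re

# Assumed name layout: en[P<domain>]p<bus>s<slot>[f<function>], decimal numbers.
_NAME_RE = re.compile(r"^en(?:P(\d+))?p(\d+)s(\d+)(?:f(\d+))?$")


def devname_to_pci_address(name):
    """Return the lspci-style address implied by an interface name, or None."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    # Missing optional parts (domain, function) default to 0.
    domain, bus, slot, function = (int(g) if g else 0 for g in match.groups())
    # lspci prints each field in hexadecimal.
    return "%04x:%02x:%02x.%x" % (domain, bus, slot, function)


print(devname_to_pci_address("enP65536p3s0f0"))  # 10000:03:00.0
```

Under that assumption, the P65536 in the name is the same domain as the hexadecimal 10000 printed by lspci, which does not agree with the 0000 domain shown by lshw.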


The lspci output shows domain 10000. I configured that address in nova.conf:

passthrough_whitelist={"address":"10000:03:00.0","physical_network":"sriov"}


I got the same error, so devname maps to the PCI device correctly and the problem seems to be in devspec.py [1]. There you can see the function that raises the error, along with the MAX_DOMAIN variable, which is set to:

MAX_DOMAIN = 0xFFFF
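
For reference, the arithmetic behind the failure can be checked directly. This is only an illustration of the numbers quoted in this report: the domain 10000 is read as hexadecimal, and 0xFFFFF is the larger limit tried in the workaround.

```python
domain = int("10000", 16)      # domain reported by lspci, read as hexadecimal

print(domain)                  # 65536
print(domain.bit_length())     # 17, one bit more than a 16-bit field holds
print((0xFFFF).bit_length())   # 16
print(domain > 0xFFFF)         # True: rejected by the 16-bit limit
print(domain <= 0xFFFFF)       # True: accepted by a 20-bit limit
```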


To work around it, I changed the variable (to MAX_DOMAIN = 0xFFFFF) on my compute nodes, in the file /usr/lib/python2.7/site-packages/nova/pci/devspec.py in the nova_compute container.

[root@computehci-2 ~]# docker commit -m="Fix max domain in devspec.py" nova_compute new_nova_compute
sha256:270fc8b566113975d355eece8c053889ee1e8d0b38683436fd2eff7a8665aba1

[root@computehci-2 ~]# docker tag new_nova_compute 172.30.0.2:8787/rhosp13/openstack-nova-compute

[root@computehci-2 ~]# docker push 172.30.0.2:8787/rhosp13/openstack-nova-compute
The push refers to a repository [172.30.0.2:8787/rhosp13/openstack-nova-compute]
cefa3bc66d6f: Layer already exists 
ee8c602c858a: Layer already exists 
cf648748c4fe: Layer already exists 
c76ca73178da: Layer already exists 
fb15b60ae932: Layer already exists 
050c734bd286: Layer already exists 
13.0-87.1560797438: digest: sha256:8e8392b25325d9b98d4b06899b25165bd6e636c49994c4e976475f27468c6806 size: 1587
3a4748b9f150: Pushed 
cefa3bc66d6f: Layer already exists 
ee8c602c858a: Layer already exists 
cf648748c4fe: Layer already exists 
c76ca73178da: Layer already exists 
fb15b60ae932: Layer already exists 
050c734bd286: Layer already exists 
latest: digest: sha256:a9c56e2332c140ce1ef19c276342f7168e34ba628863e9073278a1510a57a289 size: 1797

[root@computehci-2 ~]# docker restart nova_compute



After restarting the service the error disappeared, but I still cannot make SR-IOV work (PciPassthroughFilter still returns 0 valid hosts).



[1] https://github.com/openstack/nova/blob/stable/queens/nova/pci/devspec.py

Comment 20 errata-xmlrpc 2021-12-09 20:19:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

