Description of problem:
update_available_resource fails because the libvirt XML for some PCI devices is not well-formed [1]. The cause is NICs whose VPD data produces the following field in their dumpxml output:

<vendor_field index='Z'>6<1</vendor_field>

The unescaped "<" inside the field text makes the XML invalid. This looks like a bug in libvirt, so I am reporting it to engineering to get help ASAP: many more customers could be affected because we have many upgrades in progress.

[1]
2024-01-19 10:06:52.927 2 DEBUG nova.compute.resource_tracker [req-49fd3ccf-12b6-405d-8c9d-985ae1184e27 - - - - -] Auditing locally available compute resources for compute.example.com (node: compute.example.com) update_available_resource /usr/lib/python3.9/site-packages/nova/compute/resource_tracker.py:880
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager [req-49fd3ccf-12b6-405d-8c9d-985ae1184e27 - - - - -] Error updating resources for node compute.example.com.: File "<string>", line 40
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager Traceback (most recent call last):
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 10008, in _update_available_resource_for_node
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/compute/resource_tracker.py", line 884, in update_available_resource
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 9163, in get_available_resource
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7816, in _get_pci_passthrough_devices
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     pci_info = [
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7817, in <listcomp>
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     self._host._get_pcidev_info(name, dev, net_devs, vdpa_devs)
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/host.py", line 1319, in _get_pcidev_info
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     cfgdev.parse_str(xmlstr)
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/config.py", line 73, in parse_str
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager     self.parse_dom(etree.fromstring(xmlstr))
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 1899, in lxml.etree._parseMemoryDocument
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 1780, in lxml.etree._parseDoc
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 1085, in lxml.etree._BaseParser._parseUnicodeDoc
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager   File "<string>", line 40
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager lxml.etree.XMLSyntaxError: StartTag: invalid element name, line 40, column 35
2024-01-19 10:06:54.733 2 ERROR nova.compute.manager

Version-Release number of selected component (if applicable):
RHOSP 17.1
rhosp-rhel9/openstack-nova-libvirt 17.1 sha256:bf5310f5839bd8648fc9a26d17e2d57230ca25baa27bd86c9315688102d4ad95 207e3ea7ed6e 7 weeks ago 1.54 GB

How reproducible:
This problem is hardware-dependent and reproduces without extra steps whenever nova-compute is started on the problematic hardware.

Actual results:
Libvirt XML dumps are not well-formed, and Nova's update_available_resource is blocked.

Expected results:
Libvirt XML dumps are valid (bug in libvirt). If an XML dump is invalid, Nova reports the error but its operations are not blocked (room for improvement on the Nova side).

Additional info:
Bug #2259636 was reported to request improved logging for problematic XML dumps.
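To illustrate both halves of the expected results, here is a minimal sketch. It is not Nova's or libvirt's actual code: it uses the standard-library xml.etree.ElementTree instead of lxml, and the helper parse_device_xmls and the device names pci_bad and pci_good are hypothetical. It shows that the reported fragment is rejected by an XML parser, that the same text parses once "<" is escaped, and how a caller could log and skip a malformed dump instead of aborting the whole resource update.

```python
import logging
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

LOG = logging.getLogger(__name__)

# Fragment quoted in this report: the raw "<" in the text is not well-formed XML.
BAD_FIELD = "<vendor_field index='Z'>6<1</vendor_field>"
# Assumption: a valid dump would carry the same text with "<" escaped as "&lt;".
GOOD_FIELD = "<vendor_field index='Z'>%s</vendor_field>" % escape("6<1")


def parse_device_xmls(xml_by_name):
    """Hypothetical helper: parse each dump and skip the malformed ones.

    Returns (parsed, skipped): a dict of device name -> Element, and the
    list of device names whose XML could not be parsed.
    """
    parsed, skipped = {}, []
    for name, xmlstr in xml_by_name.items():
        try:
            parsed[name] = ET.fromstring(xmlstr)
        except ET.ParseError as exc:
            # Report the problem, but keep processing the remaining devices.
            LOG.warning("Skipping device %s: invalid XML (%s)", name, exc)
            skipped.append(name)
    return parsed, skipped


parsed, skipped = parse_device_xmls({
    "pci_bad": BAD_FIELD,    # hypothetical device name
    "pci_good": GOOD_FIELD,  # hypothetical device name
})
```

With this approach one broken device no longer blocks the audit of the others. Whether skipping a device is acceptable for PCI passthrough tracking is a design decision for Nova, not something this sketch settles.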
I've adjusted the bug title to a "QE Tracker" as we're dependent on the libvirt bug here. Here's the publicly accessible libvirt issue: https://issues.redhat.com/browse/RHEL-22314 (libvirt failing to parse PCI device VPD (Vital Product Data) for some hardware)
This was backported all the way to RHEL 9.0 in January (https://issues.redhat.com/browse/RHEL-22398) and was also released in 9.2 in March, on 2024/03/05 (https://issues.redhat.com/browse/RHEL-22399). The fixed-in build was libvirt-9.0.0-10.4.el9_2.

The current latest 17.1.2 tag is 17.1.2-5.1712881171, and it contains libvirt-daemon-9.0.0-10.5.el9_2.x86_64, so I am closing this as current release: the fix was shipped to the CDN by the automatic container rebuilds when the RHEL release was pushed to the CDN. It is also available in 17.1.2-5.1709836652 and 17.1.2-5.1709628728.