Bug 1724999
Summary: | Device role tagging doesn't work for SRIOV PF | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Franck Baudin <fbaudin> | ||||
Component: | openstack-nova | Assignee: | René Ribaud <rribaud> | ||||
Status: | CLOSED MIGRATED | QA Contact: | OSP DFG:Compute <osp-dfg-compute> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 13.0 (Queens) | CC: | alifshit, cfontain, dasmith, egallen, eglynn, jhakimra, kchamart, lyarwood, rribaud, sbauza, sgordon, sputhenp, supadhya, vromanso | ||||
Target Milestone: | --- | Keywords: | Patch, Triaged, ZStream | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2025-01-17 15:34:12 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Franck Baudin
2019-06-28 08:48:45 UTC
Artom, I can't quite parse

(In reply to Franck Baudin from comment #0)
> Description of problem:
>
> nova device role tagging works well for SRIOV PF or OVS-ML2 but doesn't work
> with SRIOV PF.

Franck, I think there's a typo in there, as the above description does not make sense.

And there's not much logging info besides the command-line output you've posted. And I can't tell the reason based on the info you've provided. Probably we need to see the Nova API logs.

That said, "device tagging" makes me think of Artom, let's ask him if something jumps out at him.

Artom?

[...]

Yeah, we need sosreports with the nova-compute logs at debug level as well as the XML of the instance in question.

(In reply to Kashyap Chamarthy from comment #1)
> Artom, I can't quite parse (In reply to Franck Baudin from comment #0)
> > Description of problem:
> >
> > nova device role tagging works well for SRIOV PF or OVS-ML2 but doesn't work
> > with SRIOV PF.
>
> Franck, I think there's a typo in there, as the above description does not make
> sense.

Yes, sorry for that: nova device role tagging works well for SRIOV VF or OVS-ML2 but doesn't work with SRIOV PF.

>
> And there's not much logging info besides the command-line output you've
> posted. And I can't tell the reason based on the info you've provided.
> Probably we need to see the Nova API logs.
>
> That said, "device tagging" makes me think of Artom, let's ask him if
> something jumps out at him.
>
> Artom?
>
> [...]

(In reply to Artom Lifshitz from comment #2)
> Yeah, we need sosreports with the nova-compute logs at debug level as well
> as the XML of the instance in question.

My lab is unfortunately used for other purposes at the moment (TripleO & Neutron debug), can I reach you to provide you an access next time it's deployed? Or if you have a lab with an SR-IOV PF interfaces, I can replicate the issue on your lab, this is a matter of 10 mins. Thanks!
(In reply to Franck Baudin from comment #4)
> (In reply to Artom Lifshitz from comment #2)
> > Yeah, we need sosreports with the nova-compute logs at debug level as well
> > as the XML of the instance in question.
>
> My lab is unfortunately used for other purposes at the moment (TripleO &
> Neutron debug), can I reach you to provide you an access next time it's
> deployed?

That works.

> Or if you have a lab with an SR-IOV PF interfaces, I can replicate
> the issue on your lab, this is a matter of 10 mins. Thanks!

I unfortunately do not. Cheers!

Created attachment 1589536 [details]
compute node sos report, including nova logs
Looks like this has to do with how Nova calculates PCI addresses for the PFs.

Here's the relevant code:
https://github.com/openstack/nova/blob/stable/queens/nova/virt/libvirt/driver.py#L8649-L8654

Here are the relevant log lines:

2019-07-11 17:31:49.117 1 DEBUG nova.virt.libvirt.driver [req-9107df9c-5e7a-492d-a962-1e4cc237735a 587e061a01d74ec6aace10089b7c443c 0dde3f263d0249849aae536e406d1355 - default default] Not exposing metadata for not found PCI device 0000:86:00.0 _build_hostdev_metadata /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8638
2019-07-11 17:31:49.118 1 DEBUG nova.virt.libvirt.driver [req-9107df9c-5e7a-492d-a962-1e4cc237735a 587e061a01d74ec6aace10089b7c443c 0dde3f263d0249849aae536e406d1355 - default default] Not exposing metadata for not found PCI device 0000:86:00.1 _build_hostdev_metadata /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8638

So Nova expects the PFs to be at PCI addresses 0000:86:00.0 and 0000:86:00.1. However, looking at lspci output on the host, here's what the PFs look like:

86:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
86:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

So I suspect the difference in the leading 0000 is why Nova thinks the PFs "don't exist" and therefore doesn't generate metadata for them.

Further analysis makes me think that the problematic code is actually https://github.com/openstack/nova/blob/stable/queens/nova/pci/utils.py#L145

The path that _get_sysfs_netdev_path() returns would be `/sys/bus/pci/devices/0000\:86\:00.1/physfn/net`. However, that doesn't exist on the compute host. So os.listdir raises an exception, which gets caught and re-raised as PciDeviceNotFoundById, which causes us to skip that interface's metadata.

This is where my knowledge of SRIOV breaks down a bit. Why would /sys/bus/pci/devices/0000\:86\:00.1/physfn/net not exist? Is _get_sysfs_netdev_path() making wrong assumptions?
Or is it device-dependent?

I have a patch upstream for this, but I'm currently concentrating on the NUMA live migration RFE, so I'm not actively working on this. Moving back to ASSIGNED.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
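For readers following the sysfs analysis in this thread, here is a minimal Python sketch of the lookup pattern it describes: build a `physfn/net` or plain `net` path from a PCI address, list it, and turn any failure into a "not found" error. This is not Nova's code. The names `netdev_path`, `ifname_for`, `DeviceNotFound` and the `sysfs_root` parameter are hypothetical and exist only for illustration. On Linux the `physfn` link normally appears only under a VF's sysfs directory, so a lookup through `physfn` for a PF address would be expected to fail. That is one plausible reading of the logs above, not a confirmed diagnosis.

```python
import os


class DeviceNotFound(Exception):
    """Hypothetical stand-in for Nova's PciDeviceNotFoundById."""


def netdev_path(pci_addr, via_physfn, sysfs_root="/sys/bus/pci/devices"):
    """Build the sysfs directory expected to hold the netdev name.

    The path is only constructed here; its existence is not checked.
    """
    if via_physfn:
        # A VF links to its parent PF through the "physfn" entry.
        return os.path.join(sysfs_root, pci_addr, "physfn", "net")
    # Look directly under the device's own directory.
    return os.path.join(sysfs_root, pci_addr, "net")


def ifname_for(pci_addr, via_physfn, sysfs_root="/sys/bus/pci/devices"):
    """Return the first interface name found, or raise DeviceNotFound."""
    path = netdev_path(pci_addr, via_physfn, sysfs_root)
    try:
        return os.listdir(path)[0]
    except (OSError, IndexError):
        # Missing directory or empty listing: report the device as not found.
        raise DeviceNotFound(pci_addr)
```

On a compute host, comparing `ifname_for("0000:86:00.1", via_physfn=True)` with `ifname_for("0000:86:00.1", via_physfn=False)` would show whether the device at that address is reachable only through its own `net` directory.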