Bug 1724999 - Device role tagging doesn't work for SRIOV PF
Summary: Device role tagging doesn't work for SRIOV PF
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Artom Lifshitz
QA Contact: nova-maint
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-28 08:48 UTC by Franck Baudin
Modified: 2021-04-19 10:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
compute node sos report, including nova logs (13.87 MB, application/x-xz)
2019-07-11 14:27 UTC, Franck Baudin


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1836389 0 None None None 2019-07-12 16:07:07 UTC
OpenStack gerrit 670593 0 'None' NEW Device tags: don't pass pf_interface=True to get_mac_by_pci_address 2020-11-09 13:19:44 UTC

Description Franck Baudin 2019-06-28 08:48:45 UTC
Description of problem:

nova device role tagging works well for SRIOV PF or OVS-ML2 but doesn't work with SRIOV PF.


Version-Release number of selected component (if applicable): RHOSP13.z7


How reproducible: 100%


Steps to Reproduce:

nova boot TRex --flavor vnfc --image testpmd --nic net-id=8fe3eb35-4eb4-4a9a-9eaf-b97708fef451,tag=mgmt --config-drive True --key-name undercloud --nic port-id=6dd3b82f-ce2f-44dd-acd0-62b922a7281a,tag=east 

openstack port show 6dd3b82f-ce2f-44dd-acd0-62b922a7281a
+-----------------------+-------------------------------------------------------------------------------+
| Field                 | Value                                                                         |
+-----------------------+-------------------------------------------------------------------------------+
| admin_state_up        | UP                                                                            |
| allowed_address_pairs |                                                                               |
| binding_host_id       | computeovsdpdk-0.localdomain                                                  |
| binding_profile       | pci_slot='0000:86:00.0', pci_vendor_info='8086:1572', physical_network='east' |
| binding_vif_details   | port_filter='False', vlan='0'                                                 |
| binding_vif_type      | hostdev_physical                                                              |
| binding_vnic_type     | direct-physical                                                               |
| created_at            | 2019-06-28T07:38:32Z                                                          |
| data_plane_status     | None                                                                          |
| description           |                                                                               |
| device_id             | 373fb960-27f1-4226-a561-b9c1ccdf48af                                          |
| device_owner          | compute:nova                                                                  |
| dns_assignment        | None                                                                          |
| dns_name              | None                                                                          |
| extra_dhcp_opts       |                                                                               |
| fixed_ips             | ip_address='10.230.0.8', subnet_id='b691b899-91f8-44e7-b6cd-be6fea57bec8'     |
| id                    | 6dd3b82f-ce2f-44dd-acd0-62b922a7281a                                          |
| ip_address            | None                                                                          |
| mac_address           | 3c:fd:fe:b0:0c:0c                                                             |
| name                  | east_0                                                                        |
| network_id            | 23fce24d-88a8-4212-8300-78f3d63681c2                                          |
| option_name           | None                                                                          |
| option_value          | None                                                                          |
| port_security_enabled | True                                                                          |
| project_id            | d06ebdf349f84b3e91a98700d726c070                                              |
| qos_policy_id         | None                                                                          |
| revision_number       | 15                                                                            |
| security_group_ids    | f99c932c-62dc-4739-a976-9ed767089aa7                                          |
| status                | ACTIVE                                                                        |
| subnet_id             | None                                                                          |
| tags                  | east                                                                          |
| trunk_details         | None                                                                          |
| updated_at            | 2019-06-28T08:40:34Z                                                          |
+-----------------------+-------------------------------------------------------------------------------+

In the VM, the metadata for the SRIOV PF device is missing: only the "mgmt" device is listed, and there is no entry tagged "east".

[root@trex ~]# mount /dev/cdrom /mnt/
mount: /dev/sr0 is write-protected, mounting read-only
[root@trex ~]# cd /mnt/openstack/latest/
[root@trex latest]# jq . meta_data.json 
{
  "admin_pass": "VBtGjJBSW6AF",
  "random_seed": "7ApEOT60qJf2GrHVEI2LNIvN0c622CYRiou5XGJiCrSpAs8i10aMbf18ABo7ZMFCchZ6mxPaRcDKFE4s17Q9y+9wj9ktkMZwgIYm8ZAIr6AUWuWSHEnDPE026TtZQVYB1CShCf8U6xY0EJhxaD+PywHO7NjyDbWswtJwiqECjrlmdmmnw3LLDmUczgwsUfGAPhk8Ju+z8Aj3zGkXn9G8XPN6JGF9PxAS3Zk4iSS8q1Bue+I+pTT/d5ZCL/CNNfUC0QvrrHaMM8FaR26LH6xhVfh+2qKWTVNLO0E+hnjIf3ztnkQ4zICfiDaPhwXANIVooS+ccj82aUh1s4No+RVflRDQQHjE050/DQHU1dIAcKtcTOQc+EvNmNAyWmuwnIxbo7In84F9zn4x8H9WOiSJGpBWDH6XkMXaKtCNy3W7UjcUny4NxxB16Gg3sme4rVMr2UVColEEDga3qPYp6LcsR7zpRv1cG1x8wwyKH7wuMXTjwCXOFaS/y8gGiYYsCD2QjccQHEeWfHqPBTv76QGAZYiH0t9B9uXaAvml+H9jWzIyRRdSnEQeWiWdCdOxyzXkui4e2hy2cXsbO4vKx3rg2xPFm0weiJMeJ1n08+hDTeWVT2N8c3Y0vMRS59Tkk/nmbx8GRl1kKQfSZVbfY55dS7rWjySdB4PcugmuqfA6VqY=",
  "uuid": "373fb960-27f1-4226-a561-b9c1ccdf48af",
  "availability_zone": "nova",
  "keys": [
    {
      "data": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS1atbBB05pfaM7lOs0VUBJPNrlx8xzzlOTBH49v//p2aVWOTN9F3nzCbM1uLthW0wuzQAO5FQy3QLj+Xfn6+k8CaQnm2vB7gwK8uxb7dglHzKywOAPjHYQl/AQA9jrbmbY2bIRZHKV0dJ06gv4tf0BfvCE4gaLFIAvfSzX9G8muGWqAyoEKwEYS8Es7kWf/Dq2LonUpsahAP7WB5f5GNPQGKhH30SnaNqaGhB75QSbSUPA6Mdt01fdwkd4TKjaM/P/ty7GnMbx9trQdgkvym08alYJC3WpAdEnaCJSDuidUSiDgEfHxkeOQxvIBXnxZyWsg95QZRXEQjcZpmmqQGD stack@undercloud13.redhat.local\n",
      "type": "ssh",
      "name": "undercloud"
    }
  ],
  "hostname": "trex",
  "launch_index": 0,
  "devices": [
    {
      "bus": "pci",
      "mac": "fa:16:3e:21:8a:d7",
      "tags": [
        "mgmt"
      ],
      "type": "nic",
      "address": "0000:00:03.0"
    }
  ],
  "public_keys": {
    "undercloud": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS1atbBB05pfaM7lOs0VUBJPNrlx8xzzlOTBH49v//p2aVWOTN9F3nzCbM1uLthW0wuzQAO5FQy3QLj+Xfn6+k8CaQnm2vB7gwK8uxb7dglHzKywOAPjHYQl/AQA9jrbmbY2bIRZHKV0dJ06gv4tf0BfvCE4gaLFIAvfSzX9G8muGWqAyoEKwEYS8Es7kWf/Dq2LonUpsahAP7WB5f5GNPQGKhH30SnaNqaGhB75QSbSUPA6Mdt01fdwkd4TKjaM/P/ty7GnMbx9trQdgkvym08alYJC3WpAdEnaCJSDuidUSiDgEfHxkeOQxvIBXnxZyWsg95QZRXEQjcZpmmqQGD stack@undercloud13.redhat.local\n"
  },
  "project_id": "d06ebdf349f84b3e91a98700d726c070",
  "name": "TRex"
}


Expected results:
Get the metadata corresponding to SRIOV PF ports
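
For anyone reproducing this, the check above can be scripted instead of read by eye. The following is a minimal sketch, assuming the config drive is mounted at /mnt as in the steps above; the function names are made up for illustration and are not part of any OpenStack tool.

```python
import json


def devices_by_tag(meta_data):
    """Map each tag to the (mac, address) pairs of the devices carrying it."""
    result = {}
    for dev in meta_data.get("devices", []):
        for tag in dev.get("tags", []):
            result.setdefault(tag, []).append((dev.get("mac"), dev.get("address")))
    return result


def load_devices_by_tag(path="/mnt/openstack/latest/meta_data.json"):
    """Read meta_data.json from the mounted config drive (path is an assumption)."""
    with open(path) as f:
        return devices_by_tag(json.load(f))
```

With the meta_data.json shown above, the result contains a "mgmt" key but no "east" key, which is the symptom reported here.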

Comment 1 Kashyap Chamarthy 2019-07-05 12:56:13 UTC
Artom, I can't quite parse (In reply to Franck Baudin from comment #0)
> Description of problem:
> 
> nova device role tagging works well for SRIOV PF or OVS-ML2 but doesn't work
> with SRIOV PF.

Franck, I think there's a typo in there, as the above description does not make sense.

And there's not much logging info besides the command-line output you've posted.  And I can't tell the reason based on the info you've provided.  Probably we need to see the Nova API logs.

That said, "device tagging" makes me think of Artom, let's ask him if something jumps out at him.

Artom?

[...]

Comment 2 Artom Lifshitz 2019-07-05 13:09:34 UTC
Yeah, we need sosreports with the nova-compute logs at debug level as well as the XML of the instance in question.

Comment 3 Franck Baudin 2019-07-05 13:21:23 UTC
(In reply to Kashyap Chamarthy from comment #1)
> Artom, I can't quite parse (In reply to Franck Baudin from comment #0)
> > Description of problem:
> > 
> > nova device role tagging works well for SRIOV PF or OVS-ML2 but doesn't work
> > with SRIOV PF.
> 
> Frank, I think there's typo in there, as the above description does not make
> sense.

yes, sorry for that: nova device role tagging works well for SRIOV VF or OVS-ML2 but doesn't work with SRIOV PF.

> 
> And there's not much logging info besides the command-line output you've
> posted.  And I can't tell the reason based on the info you've provided. 
> Probably we need to see the Nova API logs.
> 
> That said, "device tagging" makes me think of Artom, let's ask him if
> something jumps out at him.
> 
> Artom?
> 
> [...]

Comment 4 Franck Baudin 2019-07-05 13:26:30 UTC
(In reply to Artom Lifshitz from comment #2)
> Yeah, we need sosreports with the nova-compute logs at debug level as well
> as the XML of the instance in question.

My lab is unfortunately being used for other purposes at the moment (TripleO & Neutron debug). Can I reach out to give you access the next time it's deployed? Or, if you have a lab with SR-IOV PF interfaces, I can reproduce the issue there; it takes about 10 minutes. Thanks!

Comment 5 Artom Lifshitz 2019-07-05 13:31:41 UTC
(In reply to Franck Baudin from comment #4)
> (In reply to Artom Lifshitz from comment #2)
> > Yeah, we need sosreports with the nova-compute logs at debug level as well
> > as the XML of the instance in question.
> 
> My lab is unfortunately used for other purposes at the moment (TripleO &
> Neutron debug), can I reach you to provide you an access next time it's
> deployed?

That works.

> Or if you have a lab with an SR-IOV PF interfaces, I can replicate
> the issue on your lab, this is a matter of 10 mins. Thanks!

I unfortunately do not.

Cheers!

Comment 6 Franck Baudin 2019-07-11 14:27:54 UTC
Created attachment 1589536 [details]
compute node sos report, including nova logs

Comment 9 Artom Lifshitz 2019-07-11 17:51:29 UTC
Looks like this has to do with how Nova calculates PCI addresses for the PFs:

Here's the relevant code: https://github.com/openstack/nova/blob/stable/queens/nova/virt/libvirt/driver.py#L8649-L8654

Here are the relevant log lines:

2019-07-11 17:31:49.117 1 DEBUG nova.virt.libvirt.driver [req-9107df9c-5e7a-492d-a962-1e4cc237735a 587e061a01d74ec6aace10089b7c443c 0dde3f263d0249849aae536e406d1355 - default default] Not exposing metadata for not found PCI device 0000:86:00.0 _build_hostdev_metadata /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8638

2019-07-11 17:31:49.118 1 DEBUG nova.virt.libvirt.driver [req-9107df9c-5e7a-492d-a962-1e4cc237735a 587e061a01d74ec6aace10089b7c443c 0dde3f263d0249849aae536e406d1355 - default default] Not exposing metadata for not found PCI device 0000:86:00.1 _build_hostdev_metadata /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8638

So Nova expects the PFs to be at PCI addresses 0000:86:00.0 and 0000:86:00.1.

However, looking at lspci output on the host, here's what the PFs look like:

86:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
86:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

So I suspect the leading 00s difference is why Nova thinks the PFs "don't exist" and therefore doesn't generate metadata for them.
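
One way to test this suspicion is to normalize both forms and compare them. By default lspci omits the PCI domain (it prints it with -D), so "86:00.0" and "0000:86:00.0" may simply be two spellings of the same address. The sketch below is illustrative only: the helper name is hypothetical, and it assumes a missing domain means domain 0000.

```python
import re

# [domain:]bus:slot.function, all hexadecimal; the domain part is optional.
_PCI_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def normalize_pci_address(addr):
    """Return the full domain:bus:slot.function form of a PCI address.

    Assumption: an address without a domain lives in domain 0000.
    """
    match = _PCI_RE.match(addr.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % addr)
    domain, bus, slot, func = match.groups()
    return "%s:%s:%s.%s" % ((domain or "0000").lower(), bus.lower(), slot.lower(), func)


def same_pci_device(a, b):
    """True if both strings name the same PCI function after normalization."""
    return normalize_pci_address(a) == normalize_pci_address(b)
```

If same_pci_device("86:00.0", "0000:86:00.0") holds, the difference in leading zeros is only a display difference and the cause has to be elsewhere.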

Comment 10 Artom Lifshitz 2019-07-12 00:24:04 UTC
Further analysis makes me think that the problematic code is actually https://github.com/openstack/nova/blob/stable/queens/nova/pci/utils.py#L145

The path that _get_sysfs_netdev_path() returns would be `/sys/bus/pci/devices/0000\:86\:00.1/physfn/net` - however that doesn't exist on the compute host. So os.listdir raises an exception, this gets caught and reraised as PciDeviceNotFoundById, which causes us to skip that interface's metadata.

This is where my knowledge of SRIOV breaks down a bit. Why would /sys/bus/pci/devices/0000\:86\:00.1/physfn/net not exist? Is _get_sysfs_netdev_path() making wrong assumptions? Or is it device-dependent?

Comment 12 Artom Lifshitz 2019-09-04 15:10:00 UTC
I have a patch upstream for this, but I'm currently concentrating on the NUMA live migration RFE, so I'm not actively working on this. Moving back to ASSIGNED.

