Bug 1839097 - [Docs] PCI-Passthrough Failure, PCI-Passthrough does not seem to be segment aware when using routed provider networks
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Assignee: Irina
QA Contact: RHOS Documentation Team
Whiteboard: docs-accepted
Depends On: 1732835 1928217
Blocks: 1878201
 
Reported: 2020-05-22 13:27 UTC by broskos
Modified: 2021-02-12 16:28 UTC
CC: 20 users

Last Closed: 2021-02-12 11:42:06 UTC


Attachments
edge2-compute-0.tgz (291.43 KB, application/gzip), 2020-05-29 11:30 UTC, broskos
central-controller-2.tgz (3.89 MB, application/gzip), 2020-05-29 11:31 UTC, broskos
central-controller-1.tgz (3.50 MB, application/gzip), 2020-05-29 11:32 UTC, broskos
central-controller-0.tgz (3.53 MB, application/gzip), 2020-05-29 11:33 UTC, broskos
pci_devices.txt (291.35 KB, text/plain), 2020-05-29 16:55 UTC, broskos


Links
OpenStack gerrit 733703: Routed network scheduling spec (MERGED, last updated 2021-02-17 14:13:59 UTC)

Description broskos 2020-05-22 13:27:27 UTC
Description of problem:
Instances fail the PciPassthroughFilter when using SR-IOV with routed provider networks. This is a simulated DCN environment, but all nodes are in the same data center, so this also applies to spine-leaf deployments.

Version-Release number of selected component (if applicable):
16.0

How reproducible:
100%

Steps to Reproduce:
1. Define roles with bridge mappings appropriate to spine-leaf for use with routed provider networks. Follow the traditional naming convention for configuring PCI passthrough for SR-IOV on the role.
2. Deploy a VM to a segment other than the first/default segment.

Actual results:

The scheduler fails the request at the PciPassthroughFilter.

Expected results:

The scheduler succeeds and the VM is configured with SR-IOV direct interfaces on the proper segment, with the proper PCI passthrough device.

Additional info:

Comment 1 broskos 2020-05-22 13:28:20 UTC
For each SR-IOV NIC:
NIC port 0 connects to switch 0 (the last character in the NIC name is 0)
NIC port 1 connects to switch 1 (the last character in the NIC name is 1)

My typical host setup looks something like this, with unique physnet names at each segment:

CENTRAL
  ComputeParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NeutronPhysicalDevMappings:
      - sriov1:ens3f0
      - sriov2:ens3f1
      - sriov1:ens7f0
      - sriov2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"


EDGE1
  ComputeEdge1Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge1:ens3f0
      - sriov2-edge1:ens3f1
      - sriov1-edge1:ens7f0
      - sriov2-edge1:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1-edge1"
      - devname: "ens3f1"
        physical_network: "sriov2-edge1"
      - devname: "ens7f0"
        physical_network: "sriov1-edge1"
      - devname: "ens7f1"
        physical_network: "sriov2-edge1"

EDGE2
  ComputeEdge2Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge2:ens3f0
      - sriov2-edge2:ens3f1
      - sriov1-edge2:ens7f0
      - sriov2-edge2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1-edge2"
      - devname: "ens3f1"
        physical_network: "sriov2-edge2"
      - devname: "ens7f0"
        physical_network: "sriov1-edge2"
      - devname: "ens7f1"
        physical_network: "sriov2-edge2"

NETWORKS
Now I create two networks: midhaul1 targets port 0 on each card, and midhaul2 targets port 1. Nova and the scheduler pick the NUMA node and assign the correct card; I can then create a virtual port on a VF from each physical port.

openstack network create --provider-physical-network sriov1 --provider-network-type vlan --provider-segment 205 midhaul1-net
uuid=$(openstack network segment list --network midhaul1-net -f value -c ID)
openstack network segment set --name midhaul1-central $uuid
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul1-subnet
openstack network segment create --network midhaul1-net --physical-network sriov1-edge1 --network-type vlan --segment 1205  midhaul1-edge1
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul1-edge1-subnet
openstack network segment create --network midhaul1-net --physical-network sriov1-edge2 --network-type vlan --segment 2205 midhaul1-edge2
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul1-edge2-subnet
openstack network create --provider-physical-network sriov2 --provider-network-type vlan --provider-segment 205 midhaul2-net
uuid=$(openstack network segment list --network midhaul2-net -f value -c ID)
openstack network segment set --name midhaul2-central $uuid
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul2-subnet
openstack network segment create --network midhaul2-net --physical-network sriov2-edge1 --network-type vlan --segment 1205 midhaul2-edge1
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul2-edge1-subnet
openstack network segment create --network midhaul2-net --physical-network sriov2-edge2 --network-type vlan --segment 2205 midhaul2-edge2
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul2-edge2-subnet
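
Before booting instances, it is worth verifying that each segment and subnet landed on the intended physnet. A quick check with standard client commands, using the segment names from the commands above:

# List the segments and subnets that were just created, then confirm the
# physnet of an individual segment.
openstack network segment list --network midhaul1-net
openstack subnet list --network midhaul1-net
openstack network segment show midhaul1-edge1 -c physical_network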

PROBLEM
The above configuration results in PCI passthrough failures at the nova scheduler for instances that target edge1 and edge2; instances targeted at central are created as expected.

OBSERVATION
It seems to me that PCI passthrough is not aligned with the segments; Nova appears to use only the name of the physnet on segment 0 to request a passthrough device.
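
To see which physnet a given port's segment actually maps to, you can walk from the subnet to its segment (a sketch using the names above):

# Resolve the subnet's segment, then read that segment's physnet.
SEG=$(openstack subnet show midhaul1-edge1-subnet -f value -c segment_id)
openstack network segment show $SEG -f value -c physical_network
# This returns sriov1-edge1, yet scheduling fails, consistent with the PCI
# request carrying segment 0's physnet (sriov1) instead.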

POTENTIAL WORKAROUND
Change the configuration to keep the Neutron physnet names the same, but modify the NovaPCIPassthrough physical_network values to match the segment 0 names (CENTRAL):


CENTRAL
  ComputeParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NeutronPhysicalDevMappings:
      - sriov1:ens3f0
      - sriov2:ens3f1
      - sriov1:ens7f0
      - sriov2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"


EDGE1
  ComputeEdge1Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge1:ens3f0
      - sriov2-edge1:ens3f1
      - sriov1-edge1:ens7f0
      - sriov2-edge1:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

EDGE2
  ComputeEdge2Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge2:ens3f0
      - sriov2-edge2:ens3f1
      - sriov1-edge2:ens7f0
      - sriov2-edge2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

WORKAROUND OBSERVATION
Things work as expected in this configuration: instances pass all filters and get direct VF attachments to the proper networks.

Is this expected? I would think that the nova PCI passthrough functions associated with SR-IOV would need to be segment aware.

Comment 4 broskos 2020-05-29 11:13:57 UTC
For the purpose of this bug, let's focus on one routed provider network with 2 segments:

network: 
backhaul1-net 9992a643-c868-4937-a568-608cb62c2d03
  segment:
  backhaul1-central deb28058-aff8-45a7-9d84-c2cd15e0af5b
    subnet: 
    backhaul1-subnet f3e2d171-e593-4a19-8f00-4454cd9de0bf
  segment:
  backhaul1-edge2 3851f693-d686-4116-bace-995cc0b70601
    subnet:
    backhaul1-edge2-subnet ff053250-9eb7-46b5-899d-3a9c74a526c1


Now create one port on each subnet and attach it to a VM during server create:
# CENTRAL
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-central)

echo $PORT
fd70d8c4-61bd-4ed8-b838-190e9c1cdf2b

openstack server create --flavor m1.small-dedicated \
--image rhel-81 \
--port $PORT \
--config-drive True \
--availability-zone central \
--key-name undercloud-key \
--user-data ~/admin-user-data.txt \
test-central-backhaul1

# EDGE2
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-edge2)

echo $PORT
3b374ed0-a608-4f49-a303-f4a915d663e5

openstack server create --flavor m1.small-dedicated \
--image rhel-81 \
--port $PORT \
--config-drive True \
--availability-zone edge2 \
--key-name undercloud-key \
--user-data ~/admin-user-data.txt \
test-edge2-backhaul1

# central succeeds, edge2 fails:
openstack server list | grep test
| be6df878-464f-4304-9687-82aaf86ff70d | test-edge2-backhaul1   | ERROR  |                              | rhel-81 | |
| d458a7dc-63fd-49ee-807e-1ce0734e24bb | test-central-backhaul1 | ACTIVE | backhaul1-net=192.168.202.53 | rhel-81 | |

# Let's grep through all the logs on the controllers and find out why it failed.

ansible Controller -i /usr/bin/tripleo-ansible-inventory -b -m shell -a 'grep -r be6df878-464f-4304-9687-82aaf86ff70d /var/log/containers'| sed 's/\\n/\n/g'

# Here is the filtered result
/var/log/containers/nova/nova-scheduler.log:2020-05-29 11:08:17.158 24 INFO nova.filters [req-21201d26-7012-4745-9184-387a92e43357 6675c887a80e4df38bf9caf34195c93d a564931e992b45dca0fbc60df3586c22 - default default] Filtering removed all hosts for the request with instance ID 'be6df878-464f-4304-9687-82aaf86ff70d'. Filter results: ['RetryFilter: (start: 10, end: 10)', 'AvailabilityZoneFilter: (start: 10, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'PciPassthroughFilter: (start: 1, end: 0)']
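
# To see which physnet tags the scheduler had available for the edge compute,
# dump Nova's per-host PCI pools; each pool carries the physical_network tag
# that PciPassthroughFilter matches against. A sketch: the galera container
# name and DB access are deployment-specific.
sudo podman exec galera-bundle-podman-0 mysql nova -N -e \
    "SELECT pci_stats FROM compute_nodes WHERE hypervisor_hostname LIKE 'edge2-compute-0%';" \
    | python3 -m json.tool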

# I'll work on collecting the logs and attaching them.

Comment 5 broskos 2020-05-29 11:30:11 UTC
Created attachment 1693330 [details]
edge2-compute-0.tgz

Comment 6 broskos 2020-05-29 11:31:36 UTC
Created attachment 1693331 [details]
central-controller-2.tgz

Comment 7 broskos 2020-05-29 11:32:32 UTC
Created attachment 1693332 [details]
central-controller-1.tgz

Comment 8 broskos 2020-05-29 11:33:35 UTC
Created attachment 1693333 [details]
central-controller-0.tgz

Logs attached: one tarball for each controller, plus one for the edge node where the instance should have launched.

Comment 10 broskos 2020-05-29 16:55:03 UTC
Created attachment 1693470 [details]
pci_devices.txt

Here is the output from the nova DB pci_devices table.
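
For reference, a dump like this can be produced on a controller with something along these lines (the galera container name and DB access vary by deployment):

# Dump the PCI devices Nova is tracking, including their status and type.
sudo podman exec galera-bundle-podman-0 mysql nova \
    -e "SELECT address, status, dev_type, label FROM pci_devices;" \
    > pci_devices.txt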

Comment 18 Artom Lifshitz 2020-06-19 14:04:45 UTC
Versions: 13 (routed networks may not be a thing here, but we still need to document the unsupported extension), 15, 16
SME: Sean

Comment 25 Irina 2021-02-12 11:42:06 UTC
Note added to "Configuring PCI passthrough" section in the "Configuring the Compute (nova) service for instance creation" guide - available on the Customer Portal:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/configuring_the_compute_nova_service_for_instance_creation/configuring-pci-passthrough

