Description of problem:
Instances are failing the PciPassthroughFilter when using SR-IOV with routed provider networks. This is a simulated DCN environment, but all nodes are in the same datacenter, so this also applies to spine-leaf deployments.

Version-Release number of selected component (if applicable): 16.0

How reproducible: 100%

Steps to Reproduce:
1. Define roles with bridge mappings appropriate to spine-leaf for use with routed provider networks.
2. Follow the traditional naming convention for configuring PCI passthrough for SR-IOV on each role.
3. Deploy a VM to a segment other than the first/default segment.

Actual results:
The scheduler fails the request at the PCI passthrough filter.

Expected results:
The scheduler succeeds and the VM is configured with SR-IOV direct interfaces on the proper segment with the proper PCI passthrough devices.

Additional info:
For each SR-IOV NIC:
  NIC port 0 connects to switch 0 (the last character in the NIC name is 0)
  NIC port 1 connects to switch 1 (the last character in the NIC name is 1)

My typical host setup would look something like this, with unique physnet names at each segment:

CENTRAL

  ComputeParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NeutronPhysicalDevMappings:
      - sriov1:ens3f0
      - sriov2:ens3f1
      - sriov1:ens7f0
      - sriov2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

EDGE1

  ComputeEdge1Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge1:ens3f0
      - sriov2-edge1:ens3f1
      - sriov1-edge1:ens7f0
      - sriov2-edge1:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1-edge1"
      - devname: "ens3f1"
        physical_network: "sriov2-edge1"
      - devname: "ens7f0"
        physical_network: "sriov1-edge1"
      - devname: "ens7f1"
        physical_network: "sriov2-edge1"

EDGE2

  ComputeEdge2Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge2:ens3f0
      - sriov2-edge2:ens3f1
      - sriov1-edge2:ens7f0
      - sriov2-edge2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1-edge2"
      - devname: "ens3f1"
        physical_network: "sriov2-edge2"
      - devname: "ens7f0"
        physical_network: "sriov1-edge2"
      - devname: "ens7f1"
        physical_network: "sriov2-edge2"

NETWORKS

Now I create 2 networks: midhaul1 targets port 0 on each card, midhaul2 targets port 1 on each card. Nova and the scheduler pick the NUMA node and assign the correct card, and then I can create a virtual port on a VF from each physical port.
openstack network create --provider-physical-network sriov1 --provider-network-type vlan --provider-segment 205 midhaul1-net
uuid=$(openstack network segment list --network midhaul1-net -f value -c ID)
openstack network segment set --name midhaul1-central $uuid
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul1-subnet

openstack network segment create --network midhaul1-net --physical-network sriov1-edge1 --network-type vlan --segment 1205 midhaul1-edge1
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul1-edge1-subnet

openstack network segment create --network midhaul1-net --physical-network sriov1-edge2 --network-type vlan --segment 2205 midhaul1-edge2
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul1-edge2-subnet

openstack network create --provider-physical-network sriov2 --provider-network-type vlan --provider-segment 205 midhaul2-net
uuid=$(openstack network segment list --network midhaul2-net -f value -c ID)
openstack network segment set --name midhaul2-central $uuid
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul2-subnet

openstack network segment create --network midhaul2-net --physical-network sriov1-edge2 --network-type vlan --segment 1205 midhaul2-edge1
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul2-edge1-subnet

openstack network segment create --network midhaul2-net --physical-network sriov2-edge2 --network-type vlan --segment 2205 midhaul2-edge2
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul2-edge2-subnet

PROBLEM

The above configuration results in PCI passthrough failures at the Nova scheduler for instances that target edge1 and edge2; instances targeted to central are created as expected.

OBSERVATION

It seems to me that PCI passthrough is not aligned with the segments; it only appears to use the physnet name of segment 0 when requesting a passthrough device.
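To make that observation easier to inspect, here is a hedged check (reusing the network and segment names created above; exact output will vary per deployment) that compares what the network object itself reports with the per-segment physnets that the edge computes are whitelisted for:

  # How the network object reports provider info (with multiple segments the
  # provider:* fields may be collapsed into a segments list)
  openstack network show midhaul1-net
  # Per-segment physnets, i.e. what each edge compute is actually wired to
  openstack network segment list --network midhaul1-net
  openstack network segment show midhaul1-edge1 -c physical_network -c network_type -c segmentation_id

If the network object only ever surfaces the segment-0 physnet (sriov1) while the segments carry sriov1-edge1 / sriov1-edge2, that would be consistent with the scheduling behaviour described above.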
POTENTIAL WORKAROUND

Change the configuration to keep the Neutron physnet names the same, but modify the PCI passthrough physical_network names to match the segment 0 names (CENTRAL):

CENTRAL

  ComputeParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NeutronPhysicalDevMappings:
      - sriov1:ens3f0
      - sriov2:ens3f1
      - sriov1:ens7f0
      - sriov2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

EDGE1

  ComputeEdge1Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge1:ens3f0
      - sriov2-edge1:ens3f1
      - sriov1-edge1:ens7f0
      - sriov2-edge1:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

EDGE2

  ComputeEdge2Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
    IsolCpusList: 2-35,38-71
    NovaComputeCpuDedicatedSet: 2-35,38-71
    NovaComputeCpuSharedSet: 0,1,36,37
    NovaEnableRbdBackend: false
    NeutronPhysicalDevMappings:
      - sriov1-edge2:ens3f0
      - sriov2-edge2:ens3f1
      - sriov1-edge2:ens7f0
      - sriov2-edge2:ens7f1
    NeutronSriovNumVFs:
      - ens3f0:8
      - ens3f1:8
      - ens7f0:8
      - ens7f1:8
    NovaPCIPassthrough:
      - devname: "ens3f0"
        physical_network: "sriov1"
      - devname: "ens3f1"
        physical_network: "sriov2"
      - devname: "ens7f0"
        physical_network: "sriov1"
      - devname: "ens7f1"
        physical_network: "sriov2"

WORKAROUND OBSERVATION

Things seem to work as expected in this configuration, with instances passing all filters and getting direct VF attachments to the proper networks. Is this expected? I would think that the Nova PCI passthrough functions associated with SR-IOV would need to be segment aware.
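One way to sanity-check the workaround state on a deployed edge compute is to confirm that the Neutron SR-IOV agent still carries the segment-local physnet names while the Nova PCI whitelist carries the segment-0 names. A rough sketch (the puppet-generated config-data paths are the usual OSP 16 locations, but they may differ per deployment):

  # Neutron SR-IOV agent device mappings (expected: sriov1-edge1:ens3f0, ...)
  sudo grep -r physical_device_mappings /var/lib/config-data/puppet-generated/ 2>/dev/null
  # Nova PCI whitelist entries (expected under the workaround: "physical_network": "sriov1", ...)
  sudo grep -r passthrough_whitelist /var/lib/config-data/puppet-generated/nova_libvirt/ 2>/dev/null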
For the purpose of this bug, let's focus on one routed provider network with 2 segments:

network: backhaul1-net 9992a643-c868-4937-a568-608cb62c2d03
  segment: backhaul1-central deb28058-aff8-45a7-9d84-c2cd15e0af5b
    subnet: backhaul1-subnet f3e2d171-e593-4a19-8f00-4454cd9de0bf
  segment: backhaul1-edge2 3851f693-d686-4116-bace-995cc0b70601
    subnet: backhaul1-edge2-subnet ff053250-9eb7-46b5-899d-3a9c74a526c1

Now create one port on each subnet and attach it to a VM during server create:

# CENTRAL
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-central)
echo $PORT
fd70d8c4-61bd-4ed8-b838-190e9c1cdf2b

openstack server create --flavor m1.small-dedicated \
  --image rhel-81 \
  --port $PORT \
  --config-drive True \
  --availability-zone central \
  --key-name undercloud-key \
  --user-data ~/admin-user-data.txt \
  test-central-backhaul1

# EDGE2
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-edge2)
echo $PORT
3b374ed0-a608-4f49-a303-f4a915d663e5

openstack server create --flavor m1.small-dedicated \
  --image rhel-81 \
  --port $PORT \
  --config-drive True \
  --availability-zone edge2 \
  --key-name undercloud-key \
  --user-data ~/admin-user-data.txt \
  test-edge2-backhaul1

# central succeeds, edge2 fails:
openstack server list | grep test
| be6df878-464f-4304-9687-82aaf86ff70d | test-edge2-backhaul1   | ERROR  |                              | rhel-81 | |
| d458a7dc-63fd-49ee-807e-1ce0734e24bb | test-central-backhaul1 | ACTIVE | backhaul1-net=192.168.202.53 | rhel-81 | |

# Let's grep through all the logs on the controllers and find out why it failed.
ansible Controller -i /usr/bin/tripleo-ansible-inventory -b -m shell -a 'grep -r be6df878-464f-4304-9687-82aaf86ff70d /var/log/containers' | sed 's/\\n/\n/g'

# Here is the filtered result:
/var/log/containers/nova/nova-scheduler.log:2020-05-29 11:08:17.158 24 INFO nova.filters [req-21201d26-7012-4745-9184-387a92e43357 6675c887a80e4df38bf9caf34195c93d a564931e992b45dca0fbc60df3586c22 - default default] Filtering removed all hosts for the request with instance ID 'be6df878-464f-4304-9687-82aaf86ff70d'. Filter results: ['RetryFilter: (start: 10, end: 10)', 'AvailabilityZoneFilter: (start: 10, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'PciPassthroughFilter: (start: 1, end: 0)']

I'll work on collecting the logs and attaching them.
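Two additional checks that may help with triage (the port UUID is the one from the failed edge2 request above; the log grep is only a suggestion of where to look, not an exact message):

  # Binding state of the port the failed server create asked for
  openstack port show 3b374ed0-a608-4f49-a303-f4a915d663e5 -c binding_vif_type -c binding_profile -c binding_host_id
  # On edge2-compute-0: the resource tracker periodically logs the PCI pools it
  # exposes, including the physical_network tag each pool was whitelisted with
  sudo grep -i physical_network /var/log/containers/nova/nova-compute.log | tail -20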
Created attachment 1693330 [details] edge2-compute-0.tgz
Created attachment 1693331 [details] central-controller-2.tgz
Created attachment 1693332 [details] central-controller-1.tgz
Created attachment 1693333 [details] central-controller-0.tgz

Logs attached: one archive for each controller, plus one for the edge node where the instance should have launched.
Created attachment 1693470 [details] pci_devices.txt

Here is the output from the pci_devices table in the Nova database.
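For reference, a minimal sketch of how a dump like the attached one can be pulled (assuming mysql client access to the nova cell database from a controller; in a containerized deployment this typically means running it inside or against the galera container):

  mysql nova -e "SELECT compute_node_id, address, dev_type, status, parent_addr FROM pci_devices WHERE deleted = 0;"
  # The physical_network tags that the PciPassthroughFilter matches against are
  # typically visible in the per-compute PCI stats rather than in pci_devices itself
  mysql nova -e "SELECT hypervisor_hostname, pci_stats FROM compute_nodes WHERE deleted = 0;"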
Versions: 13 (routed networks may not be a thing here, but we still need to document the unsupported extension), 15, 16
SME: Sean
A note has been added to the "Configuring PCI passthrough" section of the "Configuring the Compute (nova) service for instance creation" guide, available on the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/configuring_the_compute_nova_service_for_instance_creation/configuring-pci-passthrough