Bug 1839097
| Summary: | [Docs] PCI-Passthrough Failure, PCI-Passthrough does not seem to be segment aware when using routed provider networks | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | broskos |
| Component: | documentation | Assignee: | Irina <igallagh> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16.0 (Train) | CC: | amuller, atragler, bcafarel, bdobreli, broskos, chrisw, dasmith, ealcaniz, eglynn, fbaudin, fiezzi, igallagh, jhakimra, kchamart, ralonsoh, sbauza, scohen, sgordon, smooney, vromanso |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | docs-accepted | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-12 11:42:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1732835, 1928217 | | |
| Bug Blocks: | 1878201 | | |
| Attachments: | | | |
Description
broskos
2020-05-22 13:27:27 UTC
For each SR-IOV Nic:
Nic port 0 connects to switch 0 (last char in the nic name is 0)
Nic port 1 connects to switch 1 (last char in the nic name is 1)
My typical host setup would look something like this with unique physnet names at each segment:
CENTRAL
ComputeParameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NeutronPhysicalDevMappings:
    - sriov1:ens3f0
    - sriov2:ens3f1
    - sriov1:ens7f0
    - sriov2:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1"
    - devname: "ens3f1"
      physical_network: "sriov2"
    - devname: "ens7f0"
      physical_network: "sriov1"
    - devname: "ens7f1"
      physical_network: "sriov2"
EDGE1
ComputeEdge1Parameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NovaEnableRbdBackend: false
  NeutronPhysicalDevMappings:
    - sriov1-edge1:ens3f0
    - sriov2-edge1:ens3f1
    - sriov1-edge1:ens7f0
    - sriov2-edge1:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1-edge1"
    - devname: "ens3f1"
      physical_network: "sriov2-edge1"
    - devname: "ens7f0"
      physical_network: "sriov1-edge1"
    - devname: "ens7f1"
      physical_network: "sriov2-edge1"
EDGE2
ComputeEdge2Parameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NovaEnableRbdBackend: false
  NeutronPhysicalDevMappings:
    - sriov1-edge2:ens3f0
    - sriov2-edge2:ens3f1
    - sriov1-edge2:ens7f0
    - sriov2-edge2:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1-edge2"
    - devname: "ens3f1"
      physical_network: "sriov2-edge2"
    - devname: "ens7f0"
      physical_network: "sriov1-edge2"
    - devname: "ens7f1"
      physical_network: "sriov2-edge2"
NETWORKS
Now I create two networks: midhaul1 targets port 0 on each card and midhaul2 targets port 1 on each card. Nova and the scheduler pick the NUMA node and assign the correct card, so I can create a virtual port on a VF from each physical port.
openstack network create --provider-physical-network sriov1 --provider-network-type vlan --provider-segment 205 midhaul1-net
uuid=$(openstack network segment list --network midhaul1-net -f value -c ID)
openstack network segment set --name midhaul1-central $uuid
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul1-subnet
openstack network segment create --network midhaul1-net --physical-network sriov1-edge1 --network-type vlan --segment 1205 midhaul1-edge1
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul1-edge1-subnet
openstack network segment create --network midhaul1-net --physical-network sriov1-edge2 --network-type vlan --segment 2205 midhaul1-edge2
openstack subnet create --network midhaul1-net --no-dhcp --network-segment midhaul1-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul1-edge2-subnet
openstack network create --provider-physical-network sriov2 --provider-network-type vlan --provider-segment 205 midhaul2-net
uuid=$(openstack network segment list --network midhaul2-net -f value -c ID)
openstack network segment set --name midhaul2-central $uuid
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-central --subnet-range 192.168.205.0/26 --gateway 192.168.205.62 midhaul2-subnet
openstack network segment create --network midhaul2-net --physical-network sriov2-edge1 --network-type vlan --segment 1205 midhaul2-edge1
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge1 --subnet-range 192.168.205.64/26 --gateway 192.168.205.126 midhaul2-edge1-subnet
openstack network segment create --network midhaul2-net --physical-network sriov2-edge2 --network-type vlan --segment 2205 midhaul2-edge2
openstack subnet create --network midhaul2-net --no-dhcp --network-segment midhaul2-edge2 --subnet-range 192.168.205.128/26 --gateway 192.168.205.190 midhaul2-edge2-subnet
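Each segment above gets its own /26 subnet, with the gateway placed at the last usable address of the range. The following is a minimal sketch for double-checking that arithmetic before running the commands. The names `PLAN`, `last_usable` and `check_plan` are hypothetical helpers introduced here for illustration; only the ranges and gateways come from the commands above.

```python
import ipaddress
from itertools import combinations

# Subnet range and gateway per segment, copied from the commands above.
# PLAN, last_usable and check_plan are illustrative names, not OpenStack APIs.
PLAN = {
    "central": ("192.168.205.0/26", "192.168.205.62"),
    "edge1": ("192.168.205.64/26", "192.168.205.126"),
    "edge2": ("192.168.205.128/26", "192.168.205.190"),
}


def last_usable(cidr):
    """Return the highest host address in an IPv4 subnet as a string."""
    net = ipaddress.ip_network(cidr)
    return str(net.broadcast_address - 1)


def check_plan(plan):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in plan.items()}
    # Every gateway must be a usable host address inside its own subnet.
    for name, (_, gateway) in plan.items():
        gw = ipaddress.ip_address(gateway)
        net = nets[name]
        if gw not in net or gw in (net.network_address, net.broadcast_address):
            problems.append(f"{name}: gateway {gw} is not a usable host in {net}")
    # Subnets on different segments of the same network must not overlap.
    for a, b in combinations(nets, 2):
        if nets[a].overlaps(nets[b]):
            problems.append(f"{a} and {b} overlap")
    return problems


if __name__ == "__main__":
    print(check_plan(PLAN))
```

For the ranges used here the check reports no problems, and each gateway is the last usable address of its /26.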
PROBLEM
The above configuration results in PCI-Passthrough failures at the nova scheduler for instances that target edge1 and edge2; instances targeted at central are created as expected.
OBSERVATION
It seems to me that pci-passthrough is not aligned with the segments; it only seems to use the name of the physnet in segment 0 to request a passthrough device.
POTENTIAL WORKAROUND
Change the configuration to keep the physnet names in NeutronPhysicalDevMappings the same, but modify the physical_network names in NovaPCIPassthrough on the edge roles to match the segment 0 names (CENTRAL).
CENTRAL
ComputeParameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NeutronPhysicalDevMappings:
    - sriov1:ens3f0
    - sriov2:ens3f1
    - sriov1:ens7f0
    - sriov2:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1"
    - devname: "ens3f1"
      physical_network: "sriov2"
    - devname: "ens7f0"
      physical_network: "sriov1"
    - devname: "ens7f1"
      physical_network: "sriov2"
EDGE1
ComputeEdge1Parameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NovaEnableRbdBackend: false
  NeutronPhysicalDevMappings:
    - sriov1-edge1:ens3f0
    - sriov2-edge1:ens3f1
    - sriov1-edge1:ens7f0
    - sriov2-edge1:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1"
    - devname: "ens3f1"
      physical_network: "sriov2"
    - devname: "ens7f0"
      physical_network: "sriov1"
    - devname: "ens7f1"
      physical_network: "sriov2"
EDGE2
ComputeEdge2Parameters:
  KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=128 intel_iommu=on iommu=pt"
  IsolCpusList: 2-35,38-71
  NovaComputeCpuDedicatedSet: 2-35,38-71
  NovaComputeCpuSharedSet: 0,1,36,37
  NovaEnableRbdBackend: false
  NeutronPhysicalDevMappings:
    - sriov1-edge2:ens3f0
    - sriov2-edge2:ens3f1
    - sriov1-edge2:ens7f0
    - sriov2-edge2:ens7f1
  NeutronSriovNumVFs:
    - ens3f0:8
    - ens3f1:8
    - ens7f0:8
    - ens7f1:8
  NovaPCIPassthrough:
    - devname: "ens3f0"
      physical_network: "sriov1"
    - devname: "ens3f1"
      physical_network: "sriov2"
    - devname: "ens7f0"
      physical_network: "sriov1"
    - devname: "ens7f1"
      physical_network: "sriov2"
WORKAROUND OBSERVATION
Things seem to work as expected in this configuration, with instances passing all filters and getting direct VF attachments to the proper networks.
Is this expected? I would think that the nova pci-passthrough functions associated with SR-IOV would need to be segment aware.
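To make the hypothesis from OBSERVATION concrete, here is a toy model of it. This is not nova code and makes no claim about how nova is implemented. It simply assumes that the PCI request carries the physnet of the first segment only, and that a host passes when one of its NovaPCIPassthrough physical_network tags matches that name. All function and variable names are hypothetical.

```python
# Toy model only: this is NOT nova code. It mimics the behaviour suspected in
# OBSERVATION, where the PCI request uses the physnet of segment 0.

# Segments of midhaul1-net in creation order, taken from the commands above.
SEGMENTS = [
    {"name": "midhaul1-central", "physnet": "sriov1"},
    {"name": "midhaul1-edge1", "physnet": "sriov1-edge1"},
    {"name": "midhaul1-edge2", "physnet": "sriov1-edge2"},
]

# physical_network tags from NovaPCIPassthrough per site, original config.
ORIGINAL_TAGS = {
    "central": {"sriov1", "sriov2"},
    "edge1": {"sriov1-edge1", "sriov2-edge1"},
    "edge2": {"sriov1-edge2", "sriov2-edge2"},
}

# Workaround config: every site reuses the segment 0 (CENTRAL) names.
WORKAROUND_TAGS = {site: {"sriov1", "sriov2"} for site in ORIGINAL_TAGS}


def requested_physnet(segments):
    """Assumed behaviour: only the first segment's physnet is requested."""
    return segments[0]["physnet"]


def sites_passing(segments, tags_by_site):
    """Return the sites whose PCI tags contain the requested physnet."""
    wanted = requested_physnet(segments)
    return sorted(site for site, tags in tags_by_site.items() if wanted in tags)


if __name__ == "__main__":
    print("original:  ", sites_passing(SEGMENTS, ORIGINAL_TAGS))
    print("workaround:", sites_passing(SEGMENTS, WORKAROUND_TAGS))
```

Under this assumption only central passes with the original tags, while all three sites pass with the workaround tags, which matches what was observed.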
For the purpose of this bug, let's focus on one routed provider network with 2 segments:
network:
backhaul1-net 9992a643-c868-4937-a568-608cb62c2d03
segment:
backhaul1-central deb28058-aff8-45a7-9d84-c2cd15e0af5b
subnet:
backhaul1-subnet f3e2d171-e593-4a19-8f00-4454cd9de0bf
segment:
backhaul1-edge2 3851f693-d686-4116-bace-995cc0b70601
subnet:
backhaul1-edge2-subnet ff053250-9eb7-46b5-899d-3a9c74a526c1
Now create one port on each subnet and attach it to a VM during server create:
# CENTRAL
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-central)
echo $PORT
fd70d8c4-61bd-4ed8-b838-190e9c1cdf2b
openstack server create --flavor m1.small-dedicated \
--image rhel-81 \
--port $PORT \
--config-drive True \
--availability-zone central \
--key-name undercloud-key \
--user-data ~/admin-user-data.txt \
test-central-backhaul1
# EDGE2
PORT=$(openstack port create --network backhaul1-net --vnic-type direct -f value -c id test-backhaul1-edge2)
echo $PORT
3b374ed0-a608-4f49-a303-f4a915d663e5
openstack server create --flavor m1.small-dedicated \
--image rhel-81 \
--port $PORT \
--config-drive True \
--availability-zone edge2 \
--key-name undercloud-key \
--user-data ~/admin-user-data.txt \
test-edge2-backhaul1
# central succeeds, edge2 fails:
openstack server list |grep test
| be6df878-464f-4304-9687-82aaf86ff70d | test-edge2-backhaul1 | ERROR | | rhel-81 | |
| d458a7dc-63fd-49ee-807e-1ce0734e24bb | test-central-backhaul1 | ACTIVE | backhaul1-net=192.168.202.53 | rhel-81 | |
# Let's grep through all the logs on the controllers and find out why it failed.
ansible Controller -i /usr/bin/tripleo-ansible-inventory -b -m shell -a 'grep -r be6df878-464f-4304-9687-82aaf86ff70d /var/log/containers'| sed 's/\\n/\n/g'
# Here is the filtered result
/var/log/containers/nova/nova-scheduler.log:2020-05-29 11:08:17.158 24 INFO nova.filters [req-21201d26-7012-4745-9184-387a92e43357 6675c887a80e4df38bf9caf34195c93d a564931e992b45dca0fbc60df3586c22 - default default] Filtering removed all hosts for the request with instance ID 'be6df878-464f-4304-9687-82aaf86ff70d'. Filter results: ['RetryFilter: (start: 10, end: 10)', 'AvailabilityZoneFilter: (start: 10, end: 1)', 'ComputeFilter: (start: 1, end: 1)', 'AggregateInstanceExtraSpecsFilter: (start: 1, end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)', 'ImagePropertiesFilter: (start: 1, end: 1)', 'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', 'ServerGroupAffinityFilter: (start: 1, end: 1)', 'PciPassthroughFilter: (start: 1, end: 0)']
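The "Filter results" list in that log line shows how many hosts entered and left each filter. As a small sketch, the following parses the list and reports which filters dropped hosts and which one left none. The regular expression assumes the exact `'Name: (start: N, end: M)'` layout seen in this log line, and the helper names are made up for this example.

```python
import re

# The "Filter results" part of the nova-scheduler.log line above.
LOG = (
    "Filter results: ['RetryFilter: (start: 10, end: 10)', "
    "'AvailabilityZoneFilter: (start: 10, end: 1)', "
    "'ComputeFilter: (start: 1, end: 1)', "
    "'AggregateInstanceExtraSpecsFilter: (start: 1, end: 1)', "
    "'ComputeCapabilitiesFilter: (start: 1, end: 1)', "
    "'ImagePropertiesFilter: (start: 1, end: 1)', "
    "'ServerGroupAntiAffinityFilter: (start: 1, end: 1)', "
    "'ServerGroupAffinityFilter: (start: 1, end: 1)', "
    "'PciPassthroughFilter: (start: 1, end: 0)']"
)

# Assumes the 'Name: (start: N, end: M)' layout seen in this log line.
FILTER_RE = re.compile(r"'(\w+): \(start: (\d+), end: (\d+)\)'")


def parse_filters(line):
    """Return (filter name, hosts in, hosts out) tuples in log order."""
    return [(name, int(start), int(end)) for name, start, end in FILTER_RE.findall(line)]


def removed_hosts(line):
    """Return only the filters that rejected at least one host."""
    return [entry for entry in parse_filters(line) if entry[2] < entry[1]]


def first_empty(line):
    """Return the first filter that left zero hosts, or None."""
    for name, _, end in parse_filters(line):
        if end == 0:
            return name
    return None


if __name__ == "__main__":
    print(removed_hosts(LOG))
    print(first_empty(LOG))
```

For this request, AvailabilityZoneFilter narrows 10 hosts down to the single edge2 host, and PciPassthroughFilter then rejects that last host.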
# I'll work on collecting the logs and attach them
Created attachment 1693330 [details]
edge2-compute-0.tgz
Created attachment 1693331 [details]
central-controller-2.tgz
Created attachment 1693332 [details]
central-controller-1.tgz
Created attachment 1693333 [details]
central-controller-0.tgz
Logs attached: one .tgz for each controller, plus one .tgz for the edge node where the instance should have launched.
Created attachment 1693470 [details]
pci_devices.txt
Here is the output from the nova db, pci_devices table
Versions: 13 (routed networks may not be a thing here, but we still need to document the unsupported extension), 15, 16
SME: Sean
Note added to the "Configuring PCI passthrough" section in the "Configuring the Compute (nova) service for instance creation" guide, available on the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/configuring_the_compute_nova_service_for_instance_creation/configuring-pci-passthrough