Bug 1852110
| Summary: | nova host-evacuation returns erroneous pci addresses and an error: Unable to correlate PCI slot | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Vallee Delisle <dvd> |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED ERRATA | QA Contact: | James Parker <jparker> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | dasmith, ebarrera, eglynn, jhakimra, jparker, jschluet, jzaher, kchamart, kurathod, lyarwood, mircea.vutcovici, nnavarat, nova-maint, sbauza, sgordon, smooney, vromanso |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-nova-23.2.1-0.20220317231948.9609ae0.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1767797 | Environment: | |
| Last Closed: | 2022-09-21 12:10:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2084239 | | |
Description
David Vallee Delisle
2020-06-29 18:37:18 UTC
I'm not sure what the state was before the host evacuation, but we now have multiple ports using the same pci_addresses [1][2].

They manually updated the binding_profile of one of the instances with free pci_addresses. The instance launched and everything was working, except that the ports were stuck in BUILD status [3] and the pci_devices table isn't in sync with the port binding [4].

They updated the binding host for these ports, which flipped to ACTIVE immediately, and everything works for this instance.

[1]
~~~
$ mysql -t -h 172.18.0.97 -u root --password=q1w2e3 -D ovs_neutron -e "select a.id,a.status,a.admin_state_up,b.host,b.profile,vif_details,a.network_id from ports a left join ml2_port_bindings b on a.id = b.port_id where a.device_id = '8c3dc669-2032-4d46-9011-7d6bb14fc1c8' order by profile desc;"
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
| id | status | admin_state_up | host | profile | vif_details | network_id |
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
| 1afbe85e-2d9d-4dd3-8529-5f3ae90e2061 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:02.1", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | e6693c26-791e-4d84-aa4f-c2661ce92d13 |
| 384d021d-ee74-4494-901a-f3cfd7bc5e55 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:02.1", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3700"} | 42cc718c-18a4-421b-a89a-3d590a3461fd |
| 05519895-011b-407c-80ec-6c3ce5740db9 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:02.1", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | e6693c26-791e-4d84-aa4f-c2661ce92d13 |
| 2e68fa9e-b912-4161-b3d1-3c63f23c0de4 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:01.3", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | e6693c26-791e-4d84-aa4f-c2661ce92d13 |
| a5741ad6-1585-4346-88ac-7cf0d9b45045 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:01.3", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3701"} | 1e2f9790-7c47-4f35-a493-5be2af81725d |
| 2cf9de30-f198-4d68-9057-c5abc12013d2 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:01.0", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | 14dbac04-af0c-44f8-a32e-7608d4dd9ab4 |
| e92d6dde-b57f-411d-819d-56791018e990 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:01.0", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3702"} | fbafa52a-9b4e-477e-9f03-093cbd2079f3 |
| 23e37c39-9ae7-481f-9bcf-7c725f7ae3b4 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:00.3", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | 14dbac04-af0c-44f8-a32e-7608d4dd9ab4 |
| 42a05ae8-31dc-4bf3-bc14-3110c3c9e6ba | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:00.3", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3700"} | 445cb50a-61e5-407c-a78a-fc8b6b62330e |
| 42d5c179-e964-41ef-baf5-2a3571d0a875 | DOWN | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:00.3", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2000"} | 14dbac04-af0c-44f8-a32e-7608d4dd9ab4 |
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
~~~

[2]
~~~
$ mysql -N -s -h 172.18.0.97 -u root --password=q1w2e3 -D nova_api -e "select spec from request_specs where instance_uuid = '8c3dc669-2032-4d46-9011-7d6bb14fc1c8'" | sed 's/\\\\/\\/g' | jq -C '."nova_object.data".pci_requests."nova_object.data".requests[] | select(."nova_object.name"=="InstancePCIRequest") | ."nova_object.data".spec[].physical_network'
"provider2"
"provider4"
"provider2"
"provider4"
"provider2"
"provider4"
"provider2"
"provider4"
"provider2"
"provider4"
~~~

[3]
~~~
$ mysql -t -h 172.18.0.97 -u root --password=q1w2e3 -D ovs_neutron -e "select a.id,a.status,a.admin_state_up,b.host,b.profile,vif_details,a.network_id from ports a left join ml2_port_bindings b on a.id = b.port_id where a.device_id = 'e1c2cd11-7acb-4147-9ec9-ce266620cf38' order by profile desc;" | sed 's/\.oss.timbrasil.com.br/.customer.com/g'
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
| id | status | admin_state_up | host | profile | vif_details | network_id |
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
| 8c18b2aa-a64f-4bfe-bc12-d75d37039f0a | BUILD | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:02.1", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2001"} | f9934f0a-1434-4d91-aca0-54f3647e8c33 |
| 1c3ac913-820f-4272-8c2f-b68a90226d1b | BUILD | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:02.0", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3703"} | 63da6a54-a50a-4ead-9fcb-683928110aa2 |
| e2099bd9-83f9-4a0a-95e7-90b4559f06e2 | BUILD | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:01.7", "physical_network": "provider2", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "2001"} | f9934f0a-1434-4d91-aca0-54f3647e8c33 |
| d998f358-8a5e-4e4d-bdfa-ce6a3fc98971 | BUILD | 1 | lab01csrkhw012.customer.com | {"pci_slot": "0000:3b:00.7", "physical_network": "provider4", "trusted": "true", "pci_vendor_info": "15b3:1016"} | {"port_filter": false, "vlan": "3703"} | 5763db35-5728-4549-839f-4ffd4563ad95 |
+--------------------------------------+--------+----------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------+--------------------------------------+
~~~

[4]
~~~
select * from pci_devices where instance_uuid = 'e1c2cd11-7acb-4147-9ec9-ce266620cf38';
+---------------------+---------------------+------------+---------+------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------------------------------+-----------+--------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr | uuid |
+---------------------+---------------------+------------+---------+------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------------------------------+-----------+--------------+--------------------------------------+
| 2020-04-01 05:12:32 | 2020-06-26 21:09:51 | NULL | 0 | 1486 | 49 | 0000:3b:00.4 | 1016 | 15b3 | type-VF | pci_0000_3b_00_4 | label_15b3_1016 | allocated | {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"} | e1c2cd11-7acb-4147-9ec9-ce266620cf38 | cae21ab4-a126-48a4-be8d-d7556d274986 | 0 | 0000:3b:00.0 | 81568c2b-50ff-4d23-a9bd-e3732d569baf |
| 2020-04-01 05:12:32 | 2020-06-26 21:09:51 | NULL | 0 | 1504 | 49 | 0000:3b:01.2 | 1016 | 15b3 | type-VF | pci_0000_3b_01_2 | label_15b3_1016 | allocated | {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"} | e1c2cd11-7acb-4147-9ec9-ce266620cf38 | 38a20c4e-568b-483a-8eaa-d91fc10d0365 | 0 | 0000:3b:00.1 | e340903f-2162-49ab-a15b-e99b25db44b5 |
| 2020-04-01 05:12:32 | 2020-06-26 21:09:51 | NULL | 0 | 1507 | 49 | 0000:3b:01.3 | 1016 | 15b3 | type-VF | pci_0000_3b_01_3 | label_15b3_1016 | allocated | {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"} | e1c2cd11-7acb-4147-9ec9-ce266620cf38 | 0eab77ea-5ba7-4b29-9e02-1d56d1716397 | 0 | 0000:3b:00.1 | 7c8c5699-f4eb-4fdd-887c-f115475fa75c |
| 2020-04-01 05:12:32 | 2020-06-26 21:09:51 | NULL | 0 | 1510 | 49 | 0000:3b:01.4 | 1016 | 15b3 | type-VF | pci_0000_3b_01_4 | label_15b3_1016 | allocated | {"capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\"]}"} | e1c2cd11-7acb-4147-9ec9-ce266620cf38 | f26fc071-3e8a-4438-9e58-d4dbbd01568e | 0 | 0000:3b:00.1 | 2cdfb40f-d794-4347-8af9-e6e9fe6f163a |
+---------------------+---------------------+------------+---------+------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+--------------------------------------+-----------+--------------+--------------------------------------+
~~~

I think this has a similar root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1767797 (When unshelving an SR-IOV instance, the binding profile isn't reclaimed or rescheduled, and this might cause PCI-PT conflicts)

I'm not particularly surprised. I know that this was broken for cold migrate in the past, and host-evacuate is just doing cold migration, so 13 probably does not have the fix.
It looks quite similar to https://bugs.launchpad.net/nova/+bug/1658070, which was fixed in Pike/OSP 12 (https://review.opendev.org/#/c/466143/), or to https://github.com/openstack/nova/commit/b930336854bffec1bb81b6d67079a4df59e0af19, which was developed in Queens/OSP 13 to resolve https://bugs.launchpad.net/nova/+bug/1703629 (Evacuation fails for instances with PCI devices due to missing migration), or to https://bugs.launchpad.net/nova/+bug/1630698 (nova evacuate of instances with sriov ports fails due to use of source device).

All of the above should already be fixed in 13, but I guess I'll add this to the list of ways that SR-IOV is broken. It's possible that https://bugs.launchpad.net/nova/+bug/1860555 is related, and I do think we fixed a different issue a few releases ago that may not have been backported to Queens, but right now I'm not aware of a specific patch that addresses it.

This is still in our backlog, but it has also been reported upstream and a reproducer functional test has been created, so I updated the BZ with both links.

Let's stick to the newly opened BZ 2002243 for this; everything from comment #5 belongs there. Please stop updating here. It's needlessly confusing, because the two situations are not identical.

*** Bug 2044754 has been marked as a duplicate of this bug. ***

This BZ is almost 2 years old, and is set as urgent/high. Can you provide a status update?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
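For anyone checking another instance for the same symptom, the two inconsistencies from the description (several ports bound to one pci_slot, and binding profiles that disagree with nova's pci_devices rows) can be tested offline once the query results are in hand. The following is a minimal sketch, not part of nova or neutron: the function names are hypothetical, and it assumes the `id` and `profile` columns from queries [1] and [3] and the `address` column from query [4] have already been fetched into Python. The sample profiles are trimmed to the `pci_slot` key, and only three of the ten rows from [1] are included.

```python
import json
from collections import defaultdict


def pci_slot_of(profile_json):
    """Extract pci_slot from a binding profile JSON string (None if absent)."""
    profile = json.loads(profile_json) if profile_json else {}
    return profile.get("pci_slot")


def find_shared_pci_slots(port_bindings):
    """Return {pci_slot: [port_id, ...]} for slots claimed by more than one port.

    port_bindings is an iterable of (port_id, profile_json) pairs.
    """
    ports_by_slot = defaultdict(list)
    for port_id, profile_json in port_bindings:
        slot = pci_slot_of(profile_json)
        if slot:
            ports_by_slot[slot].append(port_id)
    return {s: p for s, p in ports_by_slot.items() if len(p) > 1}


def diff_bindings_and_allocations(port_bindings, allocated_addresses):
    """Compare bound pci_slot values with pci_devices addresses.

    Returns (bound_only, allocated_only) as sorted lists.
    """
    bound = {pci_slot_of(p) for _, p in port_bindings} - {None}
    allocated = set(allocated_addresses)
    return sorted(bound - allocated), sorted(allocated - bound)


# Three rows from [1] (instance 8c3dc669-...), profiles trimmed to pci_slot.
BINDINGS_1 = [
    ("1afbe85e-2d9d-4dd3-8529-5f3ae90e2061", '{"pci_slot": "0000:3b:02.1"}'),
    ("384d021d-ee74-4494-901a-f3cfd7bc5e55", '{"pci_slot": "0000:3b:02.1"}'),
    ("2e68fa9e-b912-4161-b3d1-3c63f23c0de4", '{"pci_slot": "0000:3b:01.3"}'),
]

# Rows from [3] and addresses from [4] (instance e1c2cd11-...).
BINDINGS_3 = [
    ("8c18b2aa-a64f-4bfe-bc12-d75d37039f0a", '{"pci_slot": "0000:3b:02.1"}'),
    ("1c3ac913-820f-4272-8c2f-b68a90226d1b", '{"pci_slot": "0000:3b:02.0"}'),
    ("e2099bd9-83f9-4a0a-95e7-90b4559f06e2", '{"pci_slot": "0000:3b:01.7"}'),
    ("d998f358-8a5e-4e4d-bdfa-ce6a3fc98971", '{"pci_slot": "0000:3b:00.7"}'),
]
ALLOCATED_4 = ["0000:3b:00.4", "0000:3b:01.2", "0000:3b:01.3", "0000:3b:01.4"]

if __name__ == "__main__":
    print(find_shared_pci_slots(BINDINGS_1))
    print(diff_bindings_and_allocations(BINDINGS_3, ALLOCATED_4))
```

With the data from [3] and [4], the two sets do not overlap at all: none of the four bound slots appears among the four addresses nova has marked as allocated to the instance.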