Version:
$ openshift-install version
4.9

Platform: OpenStack IPI

What happened?
In our CI, some clusters occasionally fail to be destroyed entirely: some ports are left over, so not all resources are removed (networks, subnets, security groups, etc.).

What did you expect to happen?
All resources of a cluster should be destroyed.

How to reproduce it (as minimally and precisely as possible)?
Unknown. The failure seems random, so it is probably difficult to reproduce.

Anything else we need to know?
Problem observed here: https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-cleanup-vexxhost/1443219121062809600/build-log.txt
This is due to https://github.com/openshift/release/pull/20896 - MAPO doesn't seem to create the Neutron ports with tags, so the tag-based destroy cannot find and delete them. This is a valid bug for MAPO.
Removing the Triaged keyword because:
* the target release value is missing
* the QE automation assessment (flag qe_test_coverage) is missing
There are probably a bunch of ways this can happen. As it happens I fixed one of them in upstream CAPO recently, and we'll get that when we move to MAPO: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1063. This should manifest in the logs as some OpenStack failure during server creation.

Questions:
* Is there any agent in play other than CAPO that might have created ports, e.g. Kuryr?
* Regardless of which agent created them, should the installer reasonably be manually deleting ports? I think yes.
The installer currently deletes ports by tag. There are 3 potential problems with this that I can think of:
* Misconfiguration may leave untagged or incorrectly tagged ports
* Port tagging is not atomic with port creation, so an OpenStack error can leave us with a created but untagged port
* The user or some other agent may create a port which is not tagged as expected

How about this as a proposed robustification in the installer:
* If we are deleting a network we should also delete all the ports in that network whether or not we think we created them.
* If we cannot delete a resource due to a 409 and it is possible to determine which conflicting resources are preventing the delete, we should log them.
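To make the proposal concrete, here is a minimal in-memory sketch of the two strategies. It does not talk to a real cloud and is not the installer's code: FakeNeutron, Conflict, destroy_by_tag and destroy_network_and_ports are hypothetical names invented for illustration, and the fake assumes a 409 can report which ports block the delete, which a real cloud may not do.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("destroy-sketch")


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 on network delete."""

    def __init__(self, network_id, blocking_port_ids):
        super().__init__(f"409: network {network_id} is in use")
        self.network_id = network_id
        self.blocking_port_ids = blocking_port_ids


@dataclass
class Port:
    id: str
    network_id: str
    tags: set = field(default_factory=set)


class FakeNeutron:
    """Tiny in-memory model of networks and ports (assumption, not an API)."""

    def __init__(self):
        self.networks = set()
        self.ports = {}

    def create_port(self, port_id, network_id, tags=()):
        self.ports[port_id] = Port(port_id, network_id, set(tags))

    def list_ports(self, network_id=None, tag=None):
        # Returns a new list, so callers may delete while iterating over it.
        return [
            p for p in self.ports.values()
            if (network_id is None or p.network_id == network_id)
            and (tag is None or tag in p.tags)
        ]

    def delete_port(self, port_id):
        del self.ports[port_id]

    def delete_network(self, network_id):
        blocking = [p.id for p in self.list_ports(network_id=network_id)]
        if blocking:
            raise Conflict(network_id, blocking)
        self.networks.discard(network_id)


def destroy_by_tag(cloud, network_id, cluster_tag):
    """Tag-only cleanup: untagged ports survive and block the network."""
    for port in cloud.list_ports(tag=cluster_tag):
        cloud.delete_port(port.id)
    try:
        cloud.delete_network(network_id)
    except Conflict as err:
        # Second bullet of the proposal: log what prevents the delete.
        log.warning("cannot delete network %s, blocked by ports: %s",
                    err.network_id, ", ".join(err.blocking_port_ids))
        return False
    return True


def destroy_network_and_ports(cloud, network_id):
    """First bullet of the proposal: delete every port on the network."""
    for port in cloud.list_ports(network_id=network_id):
        cloud.delete_port(port.id)
    cloud.delete_network(network_id)
    return True
```

With one tagged and one untagged port on the same network, destroy_by_tag removes only the tagged port, logs the blocking port and returns False, while destroy_network_and_ports removes both ports and then the network.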
*** Bug 2107296 has been marked as a duplicate of this bug. ***
Verified in 4.9.0-0.nightly-2022-07-19-151050 with Kuryr on top of OSP 16.2.2.

Verification steps:

1. Create a new project and deployment

$ oc new-project demo
$ oc create deployment --image quay.io/kuryr/demo demo
$ openstack network list | grep demo
| 5cf2304c-23bb-4168-b9db-be3291679930 | ns/demo-net | 5b49f64b-1d65-4fe2-b96d-da814d9fa6b8 |
$ openstack subnet list | grep demo
| 5b49f64b-1d65-4fe2-b96d-da814d9fa6b8 | ns/demo-subnet | 5cf2304c-23bb-4168-b9db-be3291679930 | 10.128.136.0/23 |
$ openstack port list | grep 5b49f64b
| 4bd86a2f-84f3-42a5-bff8-f1bd65f7aa4e | | fa:16:3e:d4:04:11 | ip_address='10.128.137.102', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| 8f6584f4-0276-4626-b535-15d6acc72506 | | fa:16:3e:d3:2f:b0 | ip_address='10.128.136.95', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| b0f100c4-2399-465a-944d-2ad19ee1b010 | | fa:16:3e:18:ef:84 | ip_address='10.128.137.212', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| e5422f1a-a9af-4d53-a967-711b53b3e57d | | fa:16:3e:10:07:0e | ip_address='10.128.136.1', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |

2. Create new ports in the same demo network

$ openstack port create --network ns/demo-net port1
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                   | Value |
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP |
| allowed_address_pairs   | |
| binding_host_id         | None |
| binding_profile         | None |
| binding_vif_details     | None |
| binding_vif_type        | None |
| binding_vnic_type       | normal |
| created_at              | 2022-07-21T09:23:51Z |
| data_plane_status       | None |
| description             | |
| device_id               | |
| device_owner            | |
| dns_assignment          | fqdn='host-10-128-136-173.shiftstack.com.', hostname='host-10-128-136-173', ip_address='10.128.136.173' |
| dns_domain              | |
| dns_name                | |
| extra_dhcp_opts         | |
| fixed_ips               | ip_address='10.128.136.173', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' |
| id                      | b73db095-4160-48f1-9a09-4534b9163635 |
| location                | cloud='', project.domain_id=, project.domain_name='Default', project.id='4786302a4aa34762991dff413d5af74d', project.name='shiftstack', region_name='regionOne', zone= |
| mac_address             | fa:16:3e:81:0b:50 |
| name                    | port1 |
| network_id              | 5cf2304c-23bb-4168-b9db-be3291679930 |
| port_security_enabled   | True |
| project_id              | 4786302a4aa34762991dff413d5af74d |
| propagate_uplink_status | None |
| qos_policy_id           | None |
| resource_request        | None |
| revision_number         | 1 |
| security_group_ids      | 33dcd7be-484c-431c-a222-da907d373cb9 |
| status                  | DOWN |
| tags                    | |
| trunk_details           | None |
| updated_at              | 2022-07-21T09:23:51Z |
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack port create --network ns/demo-net port2
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                   | Value |
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP |
| allowed_address_pairs   | |
| binding_host_id         | None |
| binding_profile         | None |
| binding_vif_details     | None |
| binding_vif_type        | None |
| binding_vnic_type       | normal |
| created_at              | 2022-07-21T09:24:18Z |
| data_plane_status       | None |
| description             | |
| device_id               | |
| device_owner            | |
| dns_assignment          | fqdn='host-10-128-137-42.shiftstack.com.', hostname='host-10-128-137-42', ip_address='10.128.137.42' |
| dns_domain              | |
| dns_name                | |
| extra_dhcp_opts         | |
| fixed_ips               | ip_address='10.128.137.42', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' |
| id                      | 7587ae3f-bc96-4110-b2cd-2420b37d7835 |
| location                | cloud='', project.domain_id=, project.domain_name='Default', project.id='4786302a4aa34762991dff413d5af74d', project.name='shiftstack', region_name='regionOne', zone= |
| mac_address             | fa:16:3e:f2:ad:37 |
| name                    | port2 |
| network_id              | 5cf2304c-23bb-4168-b9db-be3291679930 |
| port_security_enabled   | True |
| project_id              | 4786302a4aa34762991dff413d5af74d |
| propagate_uplink_status | None |
| qos_policy_id           | None |
| resource_request        | None |
| revision_number         | 1 |
| security_group_ids      | 33dcd7be-484c-431c-a222-da907d373cb9 |
| status                  | DOWN |
| tags                    | |
| trunk_details           | None |
| updated_at              | 2022-07-21T09:24:18Z |
+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack port list | grep 5b49f64b
| 4bd86a2f-84f3-42a5-bff8-f1bd65f7aa4e | | fa:16:3e:d4:04:11 | ip_address='10.128.137.102', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| 7587ae3f-bc96-4110-b2cd-2420b37d7835 | port2 | fa:16:3e:f2:ad:37 | ip_address='10.128.137.42', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | DOWN |
| 8f6584f4-0276-4626-b535-15d6acc72506 | | fa:16:3e:d3:2f:b0 | ip_address='10.128.136.95', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| b0f100c4-2399-465a-944d-2ad19ee1b010 | | fa:16:3e:18:ef:84 | ip_address='10.128.137.212', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |
| b73db095-4160-48f1-9a09-4534b9163635 | port1 | fa:16:3e:81:0b:50 | ip_address='10.128.136.173', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | DOWN |
| e5422f1a-a9af-4d53-a967-711b53b3e57d | | fa:16:3e:10:07:0e | ip_address='10.128.136.1', subnet_id='5b49f64b-1d65-4fe2-b96d-da814d9fa6b8' | ACTIVE |

3. Destroy the cluster

$ openshift-install destroy cluster --dir=ostest
[...]
INFO Time elapsed: 8m13s

4. Check the resources are removed, including port1 and port2

$ openstack port list | grep 5b49f64b
$
$ openstack network list | grep demo
$
$ openstack subnet list | grep demo
$
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.9.45 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5879