Bug 1971518 - Cluster deletion misses trunk ports and loop over until timeout
Summary: Cluster deletion misses trunk ports and loop over until timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Martin André
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-14 09:05 UTC by Martin André
Modified: 2021-10-18 17:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Trunks are missing the tag that identifies them as belonging to the cluster.
Consequence: The destroy command misses the untagged trunks, which prevent other resources from being deleted, so it gets stuck in a loop until it hits the timeout.
Fix: Delete trunks for which the tagged port is a parent.
Result: The destroy command no longer relies only on trunk tags to decide whether a trunk should be deleted, and can destroy clusters whose trunks are untagged.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:33:59 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5000 | 0 | None | open | Bug 1971518: Try deleting associated trunk after port delete failure | 2021-06-15 12:49:39 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:34:23 UTC

Description Martin André 2021-06-14 09:05:25 UTC
The automated cleanup job running against MOC highlighted an issue with cluster deletion.

The installer fails to identify the trunk ports:

    level=debug msg=Exiting deleting openstack trunks
    level=debug msg=goroutine deleteTrunks complete

Neutron then refuses to delete the port because it is the parent port of a trunk:

    level=debug msg=Deleting Port "f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f" failed with error: Expected HTTP response code [] when accessing [DELETE https://kaizen.massopen.cloud:13696/v2.0/ports/f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f], but got 409 instead
    level=debug msg={"NeutronError": {"message": "Port f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f is currently a parent port for trunk 8c6fb4fa-e011-4326-8679-4716de9f9dd8.", "type": "PortInUseAsTrunkParent", "detail": ""}}
    level=debug msg=Exiting deleting openstack ports
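
The 409 response above names the trunk that blocks the port deletion. As a rough illustration only (not necessarily how the installer handles it), the sketch below parses that error body and extracts the trunk ID. The helper name trunkBlockingPort and the regular expression are assumptions made for this example; only the JSON shape and the PortInUseAsTrunkParent type come from the log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// neutronError mirrors the JSON body shown in the log excerpt.
type neutronError struct {
	NeutronError struct {
		Message string `json:"message"`
		Type    string `json:"type"`
		Detail  string `json:"detail"`
	} `json:"NeutronError"`
}

// trunkIDPattern captures the trunk UUID from the error message.
// Assumption: the message wording matches the one seen in the log.
var trunkIDPattern = regexp.MustCompile(`parent port for trunk ([0-9a-f-]{36})`)

// trunkBlockingPort is a hypothetical helper. It returns the ID of the trunk
// that prevents a port from being deleted, or false for any other body.
func trunkBlockingPort(body []byte) (string, bool) {
	var e neutronError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", false
	}
	if e.NeutronError.Type != "PortInUseAsTrunkParent" {
		return "", false
	}
	m := trunkIDPattern.FindStringSubmatch(e.NeutronError.Message)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Error body copied from the log excerpt.
	body := []byte(`{"NeutronError": {"message": "Port f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f is currently a parent port for trunk 8c6fb4fa-e011-4326-8679-4716de9f9dd8.", "type": "PortInUseAsTrunkParent", "detail": ""}}`)
	if id, ok := trunkBlockingPort(body); ok {
		fmt.Println("blocking trunk:", id)
	}
}
```

Matching on the error type before reading the message keeps unrelated 409 conflicts from being mistaken for a trunk problem.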

Full logs at:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-cleanup-moc/1404287722297757696/artifacts/cleanup-moc/shiftstack-cleanup/build-log.txt

Comment 1 Martin André 2021-06-14 09:14:21 UTC
The trunk is missing the tag that identifies it as belonging to the cluster:

moc-ci ❯ openstack network trunk show 6fpj4mqm-868b3-kg4xc-master-trunk-0
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| admin_state_up  | UP                                   |
| created_at      | 2021-05-25T06:05:38Z                 |
| description     |                                      |
| id              | 8c6fb4fa-e011-4326-8679-4716de9f9dd8 |
| name            | 6fpj4mqm-868b3-kg4xc-master-trunk-0  |
| port_id         | f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f |
| project_id      | 593227d1d5d04cba8847d5b6b742e0a7     |
| revision_number | 0                                    |
| status          | DOWN                                 |
| sub_ports       |                                      |
| tags            | []                                   |
| tenant_id       | 593227d1d5d04cba8847d5b6b742e0a7     |
| updated_at      | 2021-05-25T06:05:38Z                 |
+-----------------+--------------------------------------+

Not sure why this happened, but perhaps we should consider destroying trunks whose parent ports belong to the cluster being destroyed, even when the trunks are missing a tag?

Comment 3 Martin André 2021-06-14 13:46:05 UTC
It appears this cluster was created from a UPI job, which is missing the port tag:

https://github.com/openshift/installer/blob/e7fea15/upi/openstack/control-plane.yaml#L39

Comment 7 Udi Shkalim 2021-06-29 12:30:43 UTC
Verified on UPI:
[cloud-user@installer-host ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-28-221420   True        False         102m    Cluster version is 4.9.0-0.nightly-2021-06-28-221420

(shiftstack) [cloud-user@installer-host ~]$ openstack network trunk show ostest-qcfxf-worker-trunk-0
+-----------------+--------------------------------------------------------------------------------------------------+
| Field           | Value                                                                                            |
+-----------------+--------------------------------------------------------------------------------------------------+
| admin_state_up  | UP                                                                                               |
| created_at      | 2021-06-29T09:55:26Z                                                                             |
| description     |                                                                                                  |
| id              | 01739a93-c3d0-4b54-8154-1e2f2c8d6f09                                                             |
| name            | ostest-qcfxf-worker-trunk-0                                                                      |
| port_id         | e644887a-e75e-439c-8622-b8be4ce7d8f8                                                             |
| project_id      | 3210dadc4c0e41f1bf8dacd64753ee33                                                                 |
| revision_number | 25                                                                                               |
| status          | ACTIVE                                                                                           |
| sub_ports       | port_id='353d59b3-3504-449f-8771-cf4197be80a7', segmentation_id='368', segmentation_type='vlan'  |
|                 | port_id='c5ecd08f-0d41-457a-941d-373123803753', segmentation_id='434', segmentation_type='vlan'  |
|                 | port_id='b9c0ad3a-e276-4638-b87e-4da3fcecfeaa', segmentation_id='576', segmentation_type='vlan'  |
|                 | port_id='c3a04072-8f92-4230-ae60-9f5ddeefe7c7', segmentation_id='807', segmentation_type='vlan'  |
|                 | port_id='751342b1-5888-40ff-beba-231a1dc92b48', segmentation_id='836', segmentation_type='vlan'  |
|                 | port_id='43640b43-7874-4238-9463-fe476277136c', segmentation_id='897', segmentation_type='vlan'  |
|                 | port_id='86db44a6-0cb9-430c-8417-db64ce2047b5', segmentation_id='898', segmentation_type='vlan'  |
|                 | port_id='3f87c680-3dc9-40f1-bb7c-15820e091262', segmentation_id='1007', segmentation_type='vlan' |
|                 | port_id='d74bcf0f-df44-4335-b9e6-f7db62264608', segmentation_id='1011', segmentation_type='vlan' |
|                 | port_id='1510c2a4-41e8-40af-bc33-933b8b8cf281', segmentation_id='1035', segmentation_type='vlan' |
|                 | port_id='e5913a38-3d7e-408c-8c15-4f348896fc85', segmentation_id='1078', segmentation_type='vlan' |
|                 | port_id='194a6100-8fed-4cfe-bf0f-38327b82ee44', segmentation_id='1152', segmentation_type='vlan' |
|                 | port_id='f7105151-0782-4fda-ac03-8457104dbe68', segmentation_id='1177', segmentation_type='vlan' |
|                 | port_id='4b96e97c-9007-4000-a395-04f800234ed3', segmentation_id='1399', segmentation_type='vlan' |
|                 | port_id='662e5746-5068-4af9-88d2-f87bc726ea85', segmentation_id='1541', segmentation_type='vlan' |
|                 | port_id='6551ee9c-66ea-4723-884c-16247e116c0f', segmentation_id='1716', segmentation_type='vlan' |
|                 | port_id='3dbfb7fd-cb0e-478b-8123-a50153abd647', segmentation_id='1845', segmentation_type='vlan' |
|                 | port_id='00ee1a41-b425-4549-8722-a32bd68ae53b', segmentation_id='1904', segmentation_type='vlan' |
|                 | port_id='2409002b-488c-4f2a-9e51-4170d9272ae7', segmentation_id='1986', segmentation_type='vlan' |
|                 | port_id='ea745a9f-bbb2-4574-9e70-19bed7f8c141', segmentation_id='2020', segmentation_type='vlan' |
|                 | port_id='54eb0574-31bb-48cc-a74f-11b9ad455905', segmentation_id='2181', segmentation_type='vlan' |
|                 | port_id='45f530cb-b058-4b02-bd4b-a7806dc07394', segmentation_id='2209', segmentation_type='vlan' |
|                 | port_id='e2050b5e-ce2f-40aa-be87-2f4889e69bda', segmentation_id='2352', segmentation_type='vlan' |
|                 | port_id='7e03d2d5-cee3-4d0f-b6f0-9d748fcae1bb', segmentation_id='2801', segmentation_type='vlan' |
|                 | port_id='979cd48b-bc70-4bd3-b791-af0eb1c4b9b0', segmentation_id='2827', segmentation_type='vlan' |
|                 | port_id='624c2e21-7b73-47c1-ace9-c63ace693233', segmentation_id='2858', segmentation_type='vlan' |
|                 | port_id='5f7f6210-4c17-4f56-92ae-48999bd76186', segmentation_id='2898', segmentation_type='vlan' |
|                 | port_id='9049af21-953d-46b1-823f-3c21d0c93867', segmentation_id='2922', segmentation_type='vlan' |
|                 | port_id='72404cbb-664a-4312-a613-e7646d677aab', segmentation_id='2939', segmentation_type='vlan' |
|                 | port_id='ae883d36-9e2f-489e-adeb-9faa2c25f84c', segmentation_id='2961', segmentation_type='vlan' |
|                 | port_id='7ac68b95-7656-41a8-9221-ec32234c1c04', segmentation_id='3109', segmentation_type='vlan' |
|                 | port_id='fc8a1779-5141-44aa-a35c-94f2f2c6f8ec', segmentation_id='3277', segmentation_type='vlan' |
|                 | port_id='1755441f-1343-405e-9542-a0a9c70adb88', segmentation_id='3407', segmentation_type='vlan' |
|                 | port_id='3c20dbdd-83f6-4d11-9e2e-3787b0ad792f', segmentation_id='3539', segmentation_type='vlan' |
|                 | port_id='5ec685fa-a048-4944-9ec1-b6a39188fe2d', segmentation_id='3583', segmentation_type='vlan' |
|                 | port_id='43b675ed-10a7-41a0-832c-27505612de46', segmentation_id='3611', segmentation_type='vlan' |
| tags            | []                                                                                               |
| tenant_id       | 3210dadc4c0e41f1bf8dacd64753ee33                                                                 |
| updated_at      | 2021-06-29T11:48:22Z                                                                             |
+-----------------+--------------------------------------------------------------------------------------------------+

(shiftstack) [cloud-user@installer-host ~]$ openshift-install --log-level debug destroy cluster --dir ostest/
DEBUG OpenShift Installer 4.9.0-0.nightly-2021-06-28-221420
.
.
.
INFO Time elapsed: 14m40s

Comment 9 Michael Burke 2021-09-30 21:59:26 UTC
Martin --

Can you take a look at my proposed release note for this BZ? I saw your doc text and made a few changes to match our style. I want to make sure I didn't change the meaning. Thank you in advance.

Michael


* Previously, the OpenStack network trunks did not contain a tag to identify that they belong to the cluster. As a consequence, cluster deletion missed the trunk ports and got stuck in a loop until the timeout. Cluster deletion now deletes trunks for which the tagged port is a parent.

Comment 10 Martin André 2021-10-01 05:44:51 UTC
(In reply to Michael Burke from comment #9)
> Martin --
> 
> Can you take a look at my proposed release note for this BZ? I saw your doc
> text and made a few changes to match our style. I want to make sure I didn't
> change the meaning. Thank you in advance.
> 
> Michael
> 
> 
> * Previously, the OpenStack network trunks did not contain a tag to
> identify that they belong to the cluster. As a consequence, cluster deletion
> missed the trunk ports and got stuck in a loop until the timeout. Cluster
> deletion now deletes trunks for which the tagged port is a parent.

"In certain conditions" is more correct than "Previously" because we didn't change how trunks are tagged, but we now allow the installer to delete untagged trunks that clearly belong to the cluster.

How about the following?

In certain conditions, OpenStack network trunks do not contain a tag to identify that they belong to the cluster. As a consequence, cluster deletion previously missed the trunk ports and got stuck in a loop until the timeout. Cluster deletion now deletes trunks for which the tagged port is a parent.

Comment 12 errata-xmlrpc 2021-10-18 17:33:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

