Bug 1971518

Summary: Cluster deletion misses trunk ports and loops until timeout
Product: OpenShift Container Platform
Reporter: Martin André <m.andre>
Component: Installer
Assignee: Martin André <m.andre>
Installer sub component: OpenShift on OpenStack
QA Contact: Udi Shkalim <ushkalim>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: urgent
CC: mburke, rlobillo, ushkalim
Version: 4.8
Keywords: Triaged
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Trunks were missing the tag that identifies them as belonging to the cluster.
Consequence: The destroy command got stuck in a loop until it hit the timeout, because it missed the untagged trunks and they prevented other resources from being deleted.
Fix: Delete trunks for which the tagged port is a parent.
Result: The destroy command no longer relies only on trunk tags to decide whether a trunk should be deleted, and can destroy clusters that have untagged trunks.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-18 17:33:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin André 2021-06-14 09:05:25 UTC
The automated cleanup job running against MOC highlighted an issue with cluster deletion.

Installer fails to identify trunk ports:

    level=debug msg=Exiting deleting openstack trunks
    level=debug msg=goroutine deleteTrunks complete

Neutron then refuses to delete the port because it is the parent port of a trunk:

    level=debug msg=Deleting Port "f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f" failed with error: Expected HTTP response code [] when accessing [DELETE https://kaizen.massopen.cloud:13696/v2.0/ports/f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f], but got 409 instead
    level=debug msg={"NeutronError": {"message": "Port f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f is currently a parent port for trunk 8c6fb4fa-e011-4326-8679-4716de9f9dd8.", "type": "PortInUseAsTrunkParent", "detail": ""}}
    level=debug msg=Exiting deleting openstack ports
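
For anyone triaging similar logs, the 409 body above can be decoded to find which trunk is blocking the port deletion. The following is a minimal sketch, not the installer's code: the `neutronError` type, the `blockingTrunk` helper, and the regular expression are hypothetical names introduced here, and the JSON shape is assumed to match the log line above. Extracting the trunk ID from the free-text message is fragile and only meant for diagnostics.

```go
package main

import (
	"encoding/json"
	"regexp"
)

// neutronError mirrors the error body seen in the log above.
// Hypothetical type: field names are taken from that single log line.
type neutronError struct {
	NeutronError struct {
		Message string `json:"message"`
		Type    string `json:"type"`
		Detail  string `json:"detail"`
	} `json:"NeutronError"`
}

// trunkIDPattern captures the UUID that follows the word "trunk" in the message.
var trunkIDPattern = regexp.MustCompile(`trunk ([0-9a-f-]{36})`)

// blockingTrunk returns the ID of the trunk named in a
// PortInUseAsTrunkParent error body. It returns false for any other
// error type, for invalid JSON, or when no trunk ID is found.
func blockingTrunk(body []byte) (string, bool) {
	var e neutronError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", false
	}
	if e.NeutronError.Type != "PortInUseAsTrunkParent" {
		return "", false
	}
	m := trunkIDPattern.FindStringSubmatch(e.NeutronError.Message)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```

Passing the JSON from the second log line to `blockingTrunk` yields the trunk ID quoted in the message, which can then be inspected with `openstack network trunk show`.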

Full logs at:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-cleanup-moc/1404287722297757696/artifacts/cleanup-moc/shiftstack-cleanup/build-log.txt

Comment 1 Martin André 2021-06-14 09:14:21 UTC
The trunk is missing the tag that identifies it as belonging to the cluster:

moc-ci ❯ openstack network trunk show 6fpj4mqm-868b3-kg4xc-master-trunk-0
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| admin_state_up  | UP                                   |
| created_at      | 2021-05-25T06:05:38Z                 |
| description     |                                      |
| id              | 8c6fb4fa-e011-4326-8679-4716de9f9dd8 |
| name            | 6fpj4mqm-868b3-kg4xc-master-trunk-0  |
| port_id         | f7fbec6a-14cb-4e3d-8e17-74e9a225dd9f |
| project_id      | 593227d1d5d04cba8847d5b6b742e0a7     |
| revision_number | 0                                    |
| status          | DOWN                                 |
| sub_ports       |                                      |
| tags            | []                                   |
| tenant_id       | 593227d1d5d04cba8847d5b6b742e0a7     |
| updated_at      | 2021-05-25T06:05:38Z                 |
+-----------------+--------------------------------------+

Not sure why this happened, but perhaps we should consider destroying trunks whose parent port belongs to the cluster we are destroying, even when the trunk itself is missing a tag?
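
The selection rule proposed above could be sketched as follows. This is an illustration under stated assumptions, not the installer's implementation: `Port`, `Trunk`, `hasTag`, and `trunksToDelete` are hypothetical names, the structs model only the fields visible in the `openstack network trunk show` output, and the cluster tag is passed in as an opaque string.

```go
package main

// Port and Trunk are simplified, hypothetical stand-ins for the Neutron
// resources; only the fields needed for the selection are modelled.
type Port struct {
	ID   string
	Tags []string
}

type Trunk struct {
	ID     string
	PortID string // ID of the trunk's parent port
	Tags   []string
}

// hasTag reports whether want appears in tags.
func hasTag(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

// trunksToDelete returns every trunk that carries the cluster tag, plus
// every trunk (tagged or not) whose parent port carries the cluster tag.
func trunksToDelete(ports []Port, trunks []Trunk, clusterTag string) []Trunk {
	// Index the IDs of ports that belong to the cluster.
	clusterPorts := make(map[string]bool)
	for _, p := range ports {
		if hasTag(p.Tags, clusterTag) {
			clusterPorts[p.ID] = true
		}
	}

	var selected []Trunk
	for _, t := range trunks {
		if hasTag(t.Tags, clusterTag) || clusterPorts[t.PortID] {
			selected = append(selected, t)
		}
	}
	return selected
}
```

With this rule, an untagged trunk such as the one shown above would still be selected, because its `port_id` points at a port that carries the cluster tag. Deleting the selected trunks before the ports avoids the `PortInUseAsTrunkParent` conflict.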

Comment 3 Martin André 2021-06-14 13:46:05 UTC
It appears this cluster was created from a UPI job, which is missing the port tag:

https://github.com/openshift/installer/blob/e7fea15/upi/openstack/control-plane.yaml#L39

Comment 7 Udi Shkalim 2021-06-29 12:30:43 UTC
Verified on UPI:
[cloud-user@installer-host ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-28-221420   True        False         102m    Cluster version is 4.9.0-0.nightly-2021-06-28-221420

(shiftstack) [cloud-user@installer-host ~]$ openstack network trunk show ostest-qcfxf-worker-trunk-0
+-----------------+--------------------------------------------------------------------------------------------------+
| Field           | Value                                                                                            |
+-----------------+--------------------------------------------------------------------------------------------------+
| admin_state_up  | UP                                                                                               |
| created_at      | 2021-06-29T09:55:26Z                                                                             |
| description     |                                                                                                  |
| id              | 01739a93-c3d0-4b54-8154-1e2f2c8d6f09                                                             |
| name            | ostest-qcfxf-worker-trunk-0                                                                      |
| port_id         | e644887a-e75e-439c-8622-b8be4ce7d8f8                                                             |
| project_id      | 3210dadc4c0e41f1bf8dacd64753ee33                                                                 |
| revision_number | 25                                                                                               |
| status          | ACTIVE                                                                                           |
| sub_ports       | port_id='353d59b3-3504-449f-8771-cf4197be80a7', segmentation_id='368', segmentation_type='vlan'  |
|                 | port_id='c5ecd08f-0d41-457a-941d-373123803753', segmentation_id='434', segmentation_type='vlan'  |
|                 | port_id='b9c0ad3a-e276-4638-b87e-4da3fcecfeaa', segmentation_id='576', segmentation_type='vlan'  |
|                 | port_id='c3a04072-8f92-4230-ae60-9f5ddeefe7c7', segmentation_id='807', segmentation_type='vlan'  |
|                 | port_id='751342b1-5888-40ff-beba-231a1dc92b48', segmentation_id='836', segmentation_type='vlan'  |
|                 | port_id='43640b43-7874-4238-9463-fe476277136c', segmentation_id='897', segmentation_type='vlan'  |
|                 | port_id='86db44a6-0cb9-430c-8417-db64ce2047b5', segmentation_id='898', segmentation_type='vlan'  |
|                 | port_id='3f87c680-3dc9-40f1-bb7c-15820e091262', segmentation_id='1007', segmentation_type='vlan' |
|                 | port_id='d74bcf0f-df44-4335-b9e6-f7db62264608', segmentation_id='1011', segmentation_type='vlan' |
|                 | port_id='1510c2a4-41e8-40af-bc33-933b8b8cf281', segmentation_id='1035', segmentation_type='vlan' |
|                 | port_id='e5913a38-3d7e-408c-8c15-4f348896fc85', segmentation_id='1078', segmentation_type='vlan' |
|                 | port_id='194a6100-8fed-4cfe-bf0f-38327b82ee44', segmentation_id='1152', segmentation_type='vlan' |
|                 | port_id='f7105151-0782-4fda-ac03-8457104dbe68', segmentation_id='1177', segmentation_type='vlan' |
|                 | port_id='4b96e97c-9007-4000-a395-04f800234ed3', segmentation_id='1399', segmentation_type='vlan' |
|                 | port_id='662e5746-5068-4af9-88d2-f87bc726ea85', segmentation_id='1541', segmentation_type='vlan' |
|                 | port_id='6551ee9c-66ea-4723-884c-16247e116c0f', segmentation_id='1716', segmentation_type='vlan' |
|                 | port_id='3dbfb7fd-cb0e-478b-8123-a50153abd647', segmentation_id='1845', segmentation_type='vlan' |
|                 | port_id='00ee1a41-b425-4549-8722-a32bd68ae53b', segmentation_id='1904', segmentation_type='vlan' |
|                 | port_id='2409002b-488c-4f2a-9e51-4170d9272ae7', segmentation_id='1986', segmentation_type='vlan' |
|                 | port_id='ea745a9f-bbb2-4574-9e70-19bed7f8c141', segmentation_id='2020', segmentation_type='vlan' |
|                 | port_id='54eb0574-31bb-48cc-a74f-11b9ad455905', segmentation_id='2181', segmentation_type='vlan' |
|                 | port_id='45f530cb-b058-4b02-bd4b-a7806dc07394', segmentation_id='2209', segmentation_type='vlan' |
|                 | port_id='e2050b5e-ce2f-40aa-be87-2f4889e69bda', segmentation_id='2352', segmentation_type='vlan' |
|                 | port_id='7e03d2d5-cee3-4d0f-b6f0-9d748fcae1bb', segmentation_id='2801', segmentation_type='vlan' |
|                 | port_id='979cd48b-bc70-4bd3-b791-af0eb1c4b9b0', segmentation_id='2827', segmentation_type='vlan' |
|                 | port_id='624c2e21-7b73-47c1-ace9-c63ace693233', segmentation_id='2858', segmentation_type='vlan' |
|                 | port_id='5f7f6210-4c17-4f56-92ae-48999bd76186', segmentation_id='2898', segmentation_type='vlan' |
|                 | port_id='9049af21-953d-46b1-823f-3c21d0c93867', segmentation_id='2922', segmentation_type='vlan' |
|                 | port_id='72404cbb-664a-4312-a613-e7646d677aab', segmentation_id='2939', segmentation_type='vlan' |
|                 | port_id='ae883d36-9e2f-489e-adeb-9faa2c25f84c', segmentation_id='2961', segmentation_type='vlan' |
|                 | port_id='7ac68b95-7656-41a8-9221-ec32234c1c04', segmentation_id='3109', segmentation_type='vlan' |
|                 | port_id='fc8a1779-5141-44aa-a35c-94f2f2c6f8ec', segmentation_id='3277', segmentation_type='vlan' |
|                 | port_id='1755441f-1343-405e-9542-a0a9c70adb88', segmentation_id='3407', segmentation_type='vlan' |
|                 | port_id='3c20dbdd-83f6-4d11-9e2e-3787b0ad792f', segmentation_id='3539', segmentation_type='vlan' |
|                 | port_id='5ec685fa-a048-4944-9ec1-b6a39188fe2d', segmentation_id='3583', segmentation_type='vlan' |
|                 | port_id='43b675ed-10a7-41a0-832c-27505612de46', segmentation_id='3611', segmentation_type='vlan' |
| tags            | []                                                                                               |
| tenant_id       | 3210dadc4c0e41f1bf8dacd64753ee33                                                                 |
| updated_at      | 2021-06-29T11:48:22Z                                                                             |
+-----------------+--------------------------------------------------------------------------------------------------+

(shiftstack) [cloud-user@installer-host ~]$ openshift-install --log-level debug destroy cluster --dir ostest/
DEBUG OpenShift Installer 4.9.0-0.nightly-2021-06-28-221420
.
.
.
INFO Time elapsed: 14m40s

Comment 9 Michael Burke 2021-09-30 21:59:26 UTC
Martin --

Can you take a look at my proposed release note for this BZ? I saw your doc text and made a few changes to match our style. I want to make sure I didn't change the meaning. Thank you in advance.

Michael


* Previously, the OpenStack network trunks did not contain a tag to identify it belongs to the cluster. As a consequence, cluster deletion misses the trunk ports and gets stuck in a loop until the timeout. The cluster deletion now deletes trunks for which the tagged port is a parent.

Comment 10 Martin André 2021-10-01 05:44:51 UTC
(In reply to Michael Burke from comment #9)

"In certain conditions" is more correct than "Previously" because we didn't change how trunks are tagged, but now allow the installer to delete untagged trunks that clearly belong to the cluster.

How about the following?

In certain conditions, OpenStack network trunks do not contain a tag identifying them as belonging to the cluster. As a consequence, cluster deletion previously missed the trunk ports and got stuck in a loop until the timeout. Cluster deletion now deletes trunks for which the tagged port is a parent.

Comment 12 errata-xmlrpc 2021-10-18 17:33:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759