Bug 1875511 - openshift-install destroy cluster fails to delete a network in GCP
Summary: openshift-install destroy cluster fails to delete a network in GCP
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: aos-install
QA Contact: To Hung Sze
URL:
Whiteboard:
Duplicates: 1801968 1906172
Depends On:
Blocks:
Reported: 2020-09-03 16:13 UTC by Petr Muller
Modified: 2020-12-11 17:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-02 19:05:53 UTC
Target Upstream Version:
Embargoed:



Description Petr Muller 2020-09-03 16:13:46 UTC
Description of problem:

The DPTP ipi-deprovisioner tool that runs openshift-install destroy cluster [1] gets stuck on deleting a network, accompanied by the following messages:

level=debug msg="Networks: failed to delete network ci-op-sq9x1it6-0df6f-kdt74-network with error: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource 'projects/openshift-gce-devel-ci/global/networks/ci-op-sq9x1it6-0df6f-kdt74-network' is already being used by 'projects/openshift-gce-devel-ci/global/firewalls/k8s-a091b5cea9ce44d1589ce122fe0b62bb-http-hc'"

[1] https://github.com/openshift/ci-tools/blob/f977bb476cfacf74b8ecea1df1178a13cfa7a3e3/cmd/ipi-deprovision/ipi-deprovision.sh#L29-L60

Example occurrence: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ipi-deprovision/1301547877104881664#1:build-log.txt%3A444
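
For triage, the blocking resource can be pulled out of such a log line. The following is a minimal sketch, assuming the message is worded exactly like the line quoted above; the function name `parse_blocking_resource` is hypothetical and not part of openshift-install or ci-tools.

```python
import re

# Pattern derived only from the log line quoted in this report; other GCP
# error messages may be worded differently and would not match.
IN_USE_RE = re.compile(
    r"RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource '(?P<network>[^']+)' "
    r"is already being used by '(?P<blocker>[^']+)'"
)


def parse_blocking_resource(log_line):
    """Return (network_name, blocker_kind, blocker_name), or None if no match."""
    match = IN_USE_RE.search(log_line)
    if match is None:
        return None
    # Resource paths look like projects/<project>/global/<kind>/<name>.
    network_name = match.group("network").split("/")[-1]
    blocker_parts = match.group("blocker").split("/")
    return network_name, blocker_parts[-2], blocker_parts[-1]
```

Applied to the message above, this should identify a resource of kind `firewalls` as the one holding the network.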


How reproducible:
Roughly 1-2 times per week, our CI hits this failure and it needs manual intervention.

Comment 1 Abhinav Dahiya 2020-09-08 16:53:23 UTC
The health checks are created with random names, and the only way the installer can associate them with a cluster is to look up which LB -> which machines -> which cluster. So if the machines are gone, there is no way for us to re-associate them.
Secondly, the de-provision script runs on the same cluster multiple times, alongside previously _deleted / left around_ clusters, which makes this problem more apparent. There is no good way to circumvent this unless we involve upstream to tag them appropriately.
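
The lookup chain described above can be illustrated with a toy model. This is a sketch only, not installer code: the dictionaries stand in for cloud API lookups, and every name in it is hypothetical.

```python
def owning_cluster(health_check, lb_for_check, machines_for_lb, cluster_for_machine):
    """Walk health check -> LB -> machines -> cluster.

    Returns the cluster name, or None when any link in the chain is missing.
    """
    lb = lb_for_check.get(health_check)
    if lb is None:
        return None
    for machine in machines_for_lb.get(lb, []):
        cluster = cluster_for_machine.get(machine)
        if cluster is not None:
            return cluster
    # No surviving machine behind the LB: the health check cannot be attributed.
    return None


# Hypothetical example data.
lb_for_check = {"k8s-example-http-hc": "lb-1"}
cluster_for_machine = {"worker-0": "ci-op-example"}

# While the machine still exists, the chain resolves to its cluster.
found = owning_cluster(
    "k8s-example-http-hc", lb_for_check, {"lb-1": ["worker-0"]}, cluster_for_machine
)

# Once the machines are gone, the same health check is orphaned.
orphaned = owning_cluster(
    "k8s-example-http-hc", lb_for_check, {"lb-1": []}, cluster_for_machine
)
```

In this model, `found` resolves to the cluster while `orphaned` is `None`, which mirrors the situation where a leftover firewall rule keeps the network from being deleted.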

Will need a lot more work and planning, moving to 4.7

Comment 4 Abhinav Dahiya 2020-10-12 17:47:28 UTC
*** Bug 1801968 has been marked as a duplicate of this bug. ***

Comment 5 To Hung Sze 2020-10-14 14:36:15 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1801968 was closed as duplicate of this.

Comment 7 Abhinav Dahiya 2020-11-02 18:27:13 UTC
https://issues.redhat.com/browse/CORS-1573 should be good enough to also include this fix.

Comment 8 Brenton Leanhardt 2020-11-02 19:05:53 UTC
Thanks.  We'll track the work for this in Jira.

Comment 9 Matthew Staebler 2020-12-11 17:46:07 UTC
*** Bug 1906172 has been marked as a duplicate of this bug. ***

