Bug 1875511

Summary:	openshift-install destroy cluster fails to delete a network in GCP
Product:	OpenShift Container Platform	Reporter:	Petr Muller <pmuller>
Component:	Installer	Assignee:	aos-install
Installer sub component:	openshift-installer	QA Contact:	To Hung Sze <tsze>
Status:	CLOSED DEFERRED	Docs Contact:
Severity:	medium
Priority:	low	CC:	aaleman, adahiya, bleanhar, tsze, wking, yanyang
Version:	4.5	Keywords:	UpcomingSprint
Target Milestone:	---
Target Release:	4.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-11-02 19:05:53 UTC	Type:	Bug

Description Petr Muller 2020-09-03 16:13:46 UTC

Description of problem:

The DPTP ipi-deprovisioner tool, which runs openshift-install destroy cluster [1], gets stuck deleting a network and repeatedly logs the following message:

level=debug msg="Networks: failed to delete network ci-op-sq9x1it6-0df6f-kdt74-network with error: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource 'projects/openshift-gce-devel-ci/global/networks/ci-op-sq9x1it6-0df6f-kdt74-network' is already being used by 'projects/openshift-gce-devel-ci/global/firewalls/k8s-a091b5cea9ce44d1589ce122fe0b62bb-http-hc'"

[1] https://github.com/openshift/ci-tools/blob/f977bb476cfacf74b8ecea1df1178a13cfa7a3e3/cmd/ipi-deprovision/ipi-deprovision.sh#L29-L60

Example occurrence: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ipi-deprovision/1301547877104881664#1:build-log.txt%3A444
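
For triage, the error above names both the network and the resource that blocks its deletion. A minimal Python sketch, assuming the message format stays as quoted in this report, could pull both out of a log line. The names find_blocker and is_k8s_firewall are hypothetical helpers, and the "k8s-" prefix check is only a heuristic based on this single example, not a documented naming guarantee.

```python
import re

# Matches the "resource in use" message as quoted in this report.
# Assumption: the wording and quoting of the message are stable.
IN_USE_RE = re.compile(
    r"RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource '(?P<network>[^']+)' "
    r"is already being used by '(?P<blocker>[^']+)'"
)


def find_blocker(log_line):
    """Return (network_path, blocking_resource_path), or None if no match."""
    match = IN_USE_RE.search(log_line)
    if match is None:
        return None
    return match.group("network"), match.group("blocker")


def is_k8s_firewall(resource_path):
    """Heuristic: a firewall rule whose name starts with 'k8s-'."""
    parts = resource_path.split("/")
    return (
        len(parts) >= 2
        and parts[-2] == "firewalls"
        and parts[-1].startswith("k8s-")
    )
```

Run over a deprovision log, this would at least tell an operator which leftover firewall rule needs manual cleanup before the network can be deleted.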


How reproducible:
Roughly 1-2 times per week, our CI hits this failure, and it needs manual intervention each time.

Comment 1 Abhinav Dahiya 2020-09-08 16:53:23 UTC

The health checks are created with random names, and the only way the installer can associate them with a cluster is to look up which LB -> which machines -> which cluster. So if the machines are gone, there is no way for us to re-associate them.
Secondly, the de-provision script is running on the same cluster multiple times with previously _deleted / left around_ clusters, which makes this problem more apparent. There is no good way to circumvent this unless we involve upstream to tag them appropriately.

This will need a lot more work and planning; moving to 4.7.
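
The lookup chain described in this comment can be sketched roughly as follows. This is an illustrative model over plain in-memory data, not the installer's actual code and not a GCP API call: the function owner_of_health_check, the dictionary shapes, and all resource names are hypothetical. It only shows why the association is lost once the machines have been deleted.

```python
def owner_of_health_check(hc_name, load_balancers, instance_cluster):
    """Follow health check -> LB -> machines -> cluster.

    load_balancers: list of dicts with "health_checks" and "instances" lists
                    (hypothetical shape).
    instance_cluster: maps the name of a machine that still exists to its
                      cluster ID.
    Returns the cluster ID, or None when the chain cannot be followed.
    """
    for lb in load_balancers:
        if hc_name not in lb["health_checks"]:
            continue
        for instance in lb["instances"]:
            cluster = instance_cluster.get(instance)
            if cluster is not None:
                return cluster
    # Either no LB references the health check, or every machine behind
    # the LB is already gone, so nothing ties the name to a cluster.
    return None


# Hypothetical data: one LB with a randomly named health check.
lbs = [{"health_checks": ["k8s-abc123-http-hc"], "instances": ["worker-0"]}]

# While the machine exists, the owner can be resolved.
print(owner_of_health_check("k8s-abc123-http-hc", lbs, {"worker-0": "cluster-a"}))

# After the machines are deleted, the same lookup finds nothing.
print(owner_of_health_check("k8s-abc123-http-hc", lbs, {}))
```

In this model, the second call is the situation the deprovisioner ends up in: the health check and its firewall rule still exist, but nothing links them to the cluster being destroyed.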

Comment 4 Abhinav Dahiya 2020-10-12 17:47:28 UTC

*** Bug 1801968 has been marked as a duplicate of this bug. ***

Comment 5 To Hung Sze 2020-10-14 14:36:15 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1801968 was closed as a duplicate of this bug.

Comment 7 Abhinav Dahiya 2020-11-02 18:27:13 UTC

https://issues.redhat.com/browse/CORS-1573 should be good enough to also include this fix.

Comment 8 Brenton Leanhardt 2020-11-02 19:05:53 UTC

Thanks.  We'll track the work for this in Jira.

Comment 9 Matthew Staebler 2020-12-11 17:46:07 UTC

*** Bug 1906172 has been marked as a duplicate of this bug. ***