Bug 2060617

Summary: IBMCloud destroy DNS regex not strict enough
Product: OpenShift Container Platform Reporter: Andrew Butcher <abutcher>
Component: Installer Assignee: Nobody <nobody>
Installer sub component: openshift-installer QA Contact: Pedro Amoedo <pamoedom>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: unspecified CC: anarayan, efried, pamoedom
Version: 4.10   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: There was an error in the regex used to identify the resources belonging to the cluster being destroyed. Consequence: Resources from other clusters were destroyed along with the intended cluster's resources. Fix: The regex string was tightened to match only the cluster that needs to be destroyed. Result: Only the intended cluster's resources are destroyed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-03 08:55:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2064731    
Attachments:
Description Flags
bar cluster destroy log none

Description Andrew Butcher 2022-03-03 21:41:34 UTC
Created attachment 1864081
bar cluster destroy log

Version:

quay.io/openshift-release-dev/ocp-release:4.10.0-x86_64

Platform:

IBM Cloud

What happened?

Destroying a cluster with a ClusterName (bar) that is a substring of another cluster's ClusterName (foobar) also destroys the DNS records of that other cluster, which is not being destroyed, when both clusters share a base domain.

Regex is here: https://github.com/openshift/installer/blob/fbe6115c11fb5c45606bc3681c80b3c78f980b71/pkg/destroy/ibmcloud/dns.go#L31

Here's a subset of the destroy log for my "bar" cluster. "foobar" is not being destroyed. Full log attached.

time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"api.bar.ibm.hive.openshift.com\""
time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"api.foobar.ibm.hive.openshift.com\""
time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"api-int.bar.ibm.hive.openshift.com\""
time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"api-int.foobar.ibm.hive.openshift.com\""
time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"*.apps.bar.ibm.hive.openshift.com\""
time="2022-03-03T21:29:01Z" level=info msg="Deleted DNS record \"*.apps.foobar.ibm.hive.openshift.com\""
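
For illustration only, here is a minimal Go sketch of how this kind of over-matching can occur. It does not reproduce the pattern at the linked dns.go line; looseMatcher and strictMatcher are hypothetical helpers, and the strict variant is just one possible way to anchor the cluster name at a DNS label boundary, not necessarily the fix that was merged.

```go
package main

import (
	"fmt"
	"regexp"
)

// looseMatcher is a hypothetical pattern that only requires a record name to
// END with "<clusterName>.<baseDomain>". Any cluster whose name ends with
// clusterName (for example "foobar" when destroying "bar") matches as well.
func looseMatcher(clusterName, baseDomain string) *regexp.Regexp {
	return regexp.MustCompile(`.*` + regexp.QuoteMeta(clusterName+"."+baseDomain) + `$`)
}

// strictMatcher is a hypothetical tightened pattern: the cluster name must be
// a complete DNS label, i.e. it is either at the start of the record name or
// immediately preceded by a dot.
func strictMatcher(clusterName, baseDomain string) *regexp.Regexp {
	return regexp.MustCompile(`^(?:.*\.)?` + regexp.QuoteMeta(clusterName+"."+baseDomain) + `$`)
}

func main() {
	// Record names taken from the destroy log above.
	records := []string{
		"api.bar.ibm.hive.openshift.com",
		"api.foobar.ibm.hive.openshift.com",
		"api-int.bar.ibm.hive.openshift.com",
		"*.apps.foobar.ibm.hive.openshift.com",
	}
	loose := looseMatcher("bar", "ibm.hive.openshift.com")
	strict := strictMatcher("bar", "ibm.hive.openshift.com")
	for _, r := range records {
		fmt.Printf("%-40s loose=%t strict=%t\n", r, loose.MatchString(r), strict.MatchString(r))
	}
}
```

In this sketch, regexp.QuoteMeta escapes the dots and hyphens in the cluster name and base domain so they are matched literally rather than as regex metacharacters.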

What did you expect to happen?

DNS entries to only be removed for the cluster being destroyed.

How to reproduce it (as minimally and precisely as possible)?

Create one cluster "bar" and another cluster "foobar" within the same basedomain/CIS instance. Destroy the "bar" cluster. "foobar" DNS entries are destroyed.

Anything else we need to know?

If no ClusterName is provided as metadata, ALL records in the CIS instance that match the basedomain will be destroyed, because every record matches the regex. This can't happen normally via 'openshift-install destroy', but I'm pointing it out in case it makes sense to return an error.
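
One possible shape for such a guard, as a rough sketch: refuse to build the record matcher at all when the cluster name is empty. The names recordMatcher and errEmptyClusterName are hypothetical and are not taken from the installer code; the pattern itself is likewise only an assumption for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// errEmptyClusterName is a hypothetical sentinel error for this sketch.
var errEmptyClusterName = errors.New("cluster name must not be empty")

// recordMatcher (hypothetical) returns an error instead of a pattern when the
// cluster name is empty, so a destroy run cannot fall back to matching every
// record under the base domain.
func recordMatcher(clusterName, baseDomain string) (*regexp.Regexp, error) {
	if strings.TrimSpace(clusterName) == "" {
		return nil, errEmptyClusterName
	}
	// Assumed pattern: the cluster name must be a complete DNS label.
	pattern := `^(?:.*\.)?` + regexp.QuoteMeta(clusterName+"."+baseDomain) + `$`
	return regexp.Compile(pattern)
}

func main() {
	if _, err := recordMatcher("", "ibm.hive.openshift.com"); err != nil {
		fmt.Println("refusing to list DNS records:", err)
	}
	re, err := recordMatcher("bar", "ibm.hive.openshift.com")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(re.MatchString("api.bar.ibm.hive.openshift.com"))
}
```

Failing early here keeps the decision out of the regex itself, which seems easier to reason about than trying to make an empty name produce a pattern that matches nothing.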

Comment 1 Pedro Amoedo 2022-03-04 10:42:12 UTC
Confirmed from the QE side that the problem is present in the latest "4.11.0-0.nightly-2022-02-27-122819" version.

For testing purposes I created 2 clusters named "pamoedo-bz2060617" and "foopamoedo-bz2060617". When destroying the "pamoedo-bz2060617" cluster, the other cluster's DNS records were also removed unexpectedly:

~~~
03-04 11:32:04.987  level=info msg=Deleted resource group "pamoedo-bz2060617-w8wgq"
03-04 11:32:11.562  level=debug msg=Listing DNS records
03-04 11:32:13.450  level=info msg=Deleted DNS record "api.foopamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
03-04 11:32:13.450  level=info msg=Deleted DNS record "api-int.foopamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
03-04 11:32:13.450  level=info msg=Deleted DNS record "api-int.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
03-04 11:32:13.450  level=info msg=Deleted DNS record "api.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
03-04 11:32:13.451  level=info msg=Deleted DNS record "*.apps.foopamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
03-04 11:32:13.451  level=info msg=Deleted DNS record "*.apps.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
~~~

Thanks Andrew, I'll notify IBM devs and involve them here ASAP.

Best Regards.

Comment 2 Pedro Amoedo 2022-03-04 18:14:57 UTC
[QA pre-merge summary]

~~~
$ ./openshift-install-local version
./openshift-install-local unreleased-master-5675-g8b469ba8e8e811689cbbde21ddc7cd38b4671e21
built from commit 8b469ba8e8e811689cbbde21ddc7cd38b4671e21
release image registry.ci.openshift.org/origin/release:4.10
release architecture amd64

$ ibmcloud cis dns-records <DNS-ID> | grep pamoedo
302bfcecc38ecbc1625a1847e5d1b4ac   api.foopamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com       CNAME   xxx.lb.appdomain.cloud   false     1   
c6f29541c69bb9b9df1f6130b1d0e534   api-int.foopamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com   CNAME   xxx.lb.appdomain.cloud   false     1   
d854f1bd3a4fecfa249feb057e649bfb   api-int.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com      CNAME   xxx.lb.appdomain.cloud     false     60   
9c16c055ec07ba5ac6428d031bbe407e   api.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com          CNAME   xxx.lb.appdomain.cloud     false     60

INFO Deleted resource group "pamoedo-bz2060617-cr5st" 
DEBUG Listing DNS records                          
INFO Deleted DNS record "api-int.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com" 
INFO Deleted DNS record "api.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com" 
INFO Deleted DNS record "*.apps.pamoedo-bz2060617.ibmcloud.qe.devcluster.openshift.com"
~~~

*** PASSED ***