+++ This bug was initially created as a clone of Bug #1807036 +++

+++ This bug was initially created as a clone of Bug #1775873 +++

Description of problem:
Cluster no longer works after a failed cluster with the same name is destroyed.

Version-Release number of the following components:
# ./openshift-install version
./openshift-install v4.3.0
built from commit a702fd4beb593932067fe1b31f2d911feaa6d93e
release image registry.svc.ci.openshift.org/ocp/release@sha256:15132234d0b753aea0af00f5cfff429cf5eca57513e7cce530207b05167a999f

How reproducible:
100%

Steps to Reproduce:
1. Create cluster #1
# openshift-install create cluster --dir cluster1
Platform: gcp
Project ID: openshift-qe
Base Domain: qe.gcp.devcluster.openshift.com
Cluster Name: yangyang

2. Create cluster #2, making sure it uses the same cluster name and base domain as cluster #1
# openshift-install create cluster --dir cluster2
Platform: gcp
Project ID: openshift-qe
Base Domain: qe.gcp.devcluster.openshift.com
Cluster Name: yangyang

3. Confirm that cluster #1 works well after its installation completes

4. Destroy the failed cluster #2

Actual results:
Cluster #1 no longer works after cluster #2 is destroyed:
# oc get co
Unable to connect to the server: dial tcp: lookup api.yangyang.qe.gcp.devcluster.openshift.com on 10.11.5.19:53: no such host

Expected results:
Cluster #1 still works after cluster #2 is destroyed.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

--- Additional comment from Scott Dodson on 2019-12-02 10:14:33 EST ---

This does not seem like a workflow that a customer is likely to encounter.

--- Additional comment from yangyang on 2020-02-19 02:28:25 EST ---

Scott Dodson, since this was deferred to 4.4 but left in a closed state, is it going to be fixed in 4.4?

--- Additional comment from Scott Dodson on 2020-02-19 09:14:22 EST ---

(In reply to yangyang from comment #2)
> Scott Dodson, as it was deferred to 4.4 but with closed state, is it going
> to be fixed in 4.4?

No, we do not feel that this is working as expected and is not something a customer is likely to do.

--- Additional comment from yangyang on 2020-02-19 22:39:35 EST ---

Scott Dodson, although it's an edge scenario, it can happen by chance. If a customer creates a cluster with a name that is already used by an up-and-running cluster, the running cluster stops working as soon as the customer destroys the failed cluster. From a UX perspective, it deserves a fix. I think we could look up the DNS and validate the cluster name before creating resources.

Thanks

--- Additional comment from Scott Dodson on 2020-02-20 11:55:50 EST ---

This is not a technically straightforward problem to solve. On AWS we wait to remove resources that cannot be attributed to a unique cluster until the resources that can be attributed directly to a unique cluster have been removed, which lessens the likelihood of running into this problem. We can make sure that happens on GCP as well, but it's not a complete solution, and given that we feel this problem is incredibly unlikely to happen, we'll defer it to 4.5. If we enter 4.5 bug burn-down without any indication of this happening in the field, we'll close this again at that time and expect it to remain closed until there's indication that this is actually a problem we see in the field.

--- Additional comment from Abhinav Dahiya on 2020-02-20 15:52:32 EST ---

Can you include the .openshift_install.log from both runs? And include `oc -v=6` output after each run from the original report.
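As context for the failure shown under "Actual results": the likely mechanism, given the error above, is that both same-named clusters register the public record api.<cluster-name>.<base-domain>, so destroying cluster #2 also removes the record cluster #1 depends on. A quick manual pre-flight check along the lines yangyang suggests earlier in the thread (look up DNS before creating resources) is sketched below; this is only a sketch, it assumes dig is installed, and the hostname is the one from this report.

Check whether the API record for the intended cluster name already resolves (a non-empty answer suggests the name is already in use by a live cluster):
# dig +short api.yangyang.qe.gcp.devcluster.openshift.com

Optionally check the ingress wildcard the same way, using any name under apps:
# dig +short test.apps.yangyang.qe.gcp.devcluster.openshift.com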
--- Additional comment from yangyang on 2020-02-21 03:41:05 EST ---

--- Additional comment from yangyang on 2020-02-21 03:42:07 EST ---

--- Additional comment from yangyang on 2020-02-21 03:44:18 EST ---

> and include `oc -v=6` after each run from original report.

This is not entirely clear to me; I do not find a -v option for oc.
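The request above is presumably about raising the client-side logging verbosity so the HTTP calls (and the failing DNS lookup) appear in the output. Whether the flag is spelled -v or --loglevel depends on the oc build; the command below is only a sketch and assumes a 4.x oc client that exposes --loglevel.

Show client request/response detail while querying cluster operators:
# oc get clusteroperators --loglevel=6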
Verified with 4.3.0-0.nightly-2020-03-15-221412. The GCP destroy now only purges resources created by the cluster being destroyed:

level=debug msg="Images: 1 items pending"
level=debug msg="Listing DNS Zones"
level=debug msg="Private DNS zone not found"
level=debug msg="Listing storage buckets"

The previously healthy cluster still works after the failed cluster is destroyed, so moving this to the verified state.
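For completeness, a quick way to spot-check the verified behavior after destroying the failed cluster is sketched below; it assumes dig is available and uses the hostname and asset directory (cluster1) from the original report, where the installer writes the admin kubeconfig to <dir>/auth/kubeconfig.

Confirm the surviving cluster's API record still resolves:
# dig +short api.yangyang.qe.gcp.devcluster.openshift.com

Confirm the surviving cluster still serves requests:
# oc --kubeconfig cluster1/auth/kubeconfig get clusteroperators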
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0858