Bug 1672374 - destroy cluster with other region in AWS
Summary: destroy cluster with other region in AWS
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: W. Trevor King
QA Contact: Johnny Liu
Depends On:
Blocks: 1664187
Reported: 2019-02-04 19:10 UTC by jooho lee
Modified: 2019-06-04 10:43 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The installer's destroy command attempted to delete resources from a cluster with the same name in us-east-1.
Consequence: The installer deleted resources that it should not have deleted and failed when attempting to delete others.
Fix: Resources to delete are now filtered only on the openshiftClusterID, not on the cluster name.
Result: Only resources for the cluster being destroyed are deleted, and the installer does not block on deleting resources from other regions.
Clone Of:
Last Closed: 2019-06-04 10:42:31 UTC
Target Upstream Version:


Links: Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:43:52 UTC)

Description jooho lee 2019-02-04 19:10:43 UTC
Description of problem:

I created a cluster in us-east-2 and tried to destroy it. From the log, I found that the installer tries to find objects in us-east-1 to delete.

The installer never finishes.

DEBUG deleting arn:aws:ec2:us-east-1:694280550618:natgateway/nat-0fc9cf4d52070a706: NatGatewayNotFound: The Nat Gateway nat-0fc9cf4d52070a706 was not found
	status code: 400, request id: 8224afdb-f9b4-4def-bb92-5cd767548427 


Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1. create a cluster in us-east-2
2. delete the cluster

Actual results:
The installer tries to find objects in us-east-1.

Expected results:
The installer does not look for any objects in us-east-1.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 4 Alex Crawford 2019-02-13 22:25:59 UTC
Have you specified the AWS_REGION when destroying the cluster? I believe that is necessary.

Comment 5 Alex Crawford 2019-02-13 22:27:25 UTC
Matthew, can you take a look at this? Ideally, we'd remember in which region we installed the cluster so that destroy doesn't require AWS_REGION.

Comment 6 Matthew Staebler 2019-02-13 22:53:20 UTC
Fixed by https://github.com/openshift/installer/pull/1170.

Comment 7 Matthew Staebler 2019-02-14 02:32:55 UTC
The underlying issue is that the destroyer used to search for all resources tagged with the cluster name. The destroyer always has to search through us-east-1, since that is the only way to find resources that are global (as opposed to tied to a region). So, if there were clusters with the same name in both us-east-2 and us-east-1, the destroyer would attempt to delete some resources that belonged to the cluster in us-east-1. The destroyer has since been changed to only search for resources tagged with the appropriate openshiftClusterID.
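The change described above amounts to filtering candidate resources on the openshiftClusterID tag instead of the name-based tag. As a minimal sketch (the `Resource` type and `filterByClusterID` helper are illustrative stand-ins, not the installer's actual code, which queries the AWS tagging APIs):

```go
package main

import "fmt"

// Resource is a simplified stand-in for a tagged AWS resource.
type Resource struct {
	ARN  string
	Tags map[string]string
}

// filterByClusterID keeps only resources tagged with the given
// openshiftClusterID, ignoring the name-based kubernetes.io/cluster/<name>
// tag that previously caused collisions between same-named clusters.
func filterByClusterID(resources []Resource, clusterID string) []Resource {
	var out []Resource
	for _, r := range resources {
		if r.Tags["openshiftClusterID"] == clusterID {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Two same-named clusters in different regions: only the one whose
	// openshiftClusterID matches is selected for deletion.
	resources := []Resource{
		{ARN: "arn:aws:ec2:us-east-2:123456789012:vpc/vpc-1", Tags: map[string]string{
			"kubernetes.io/cluster/demo": "owned", "openshiftClusterID": "abc-123"}},
		{ARN: "arn:aws:ec2:us-east-1:123456789012:vpc/vpc-2", Tags: map[string]string{
			"kubernetes.io/cluster/demo": "owned", "openshiftClusterID": "xyz-789"}},
	}
	for _, r := range filterByClusterID(resources, "abc-123") {
		fmt.Println(r.ARN)
	}
}
```

With a name-based filter, both VPCs above would match; with the ID-based filter only the us-east-2 resource is returned.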

Comment 8 Matthew Staebler 2019-02-14 02:34:17 UTC
> Have you specified the AWS_REGION when destroying the cluster? I believe that is necessary.

It is not necessary to specify the region. The region is determined from the metadata.json file.
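Reading the region out of metadata.json can be sketched as follows; the struct here only mirrors the fields of interest (the real file carries more, and `regionFromMetadata` is a hypothetical helper, not installer code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata mirrors a subset of the installer's metadata.json.
type metadata struct {
	ClusterName string `json:"clusterName"`
	AWS         struct {
		Region string `json:"region"`
	} `json:"aws"`
}

// regionFromMetadata extracts the AWS region recorded at install time,
// which is why destroy does not need AWS_REGION in the environment.
func regionFromMetadata(raw []byte) (string, error) {
	var m metadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	return m.AWS.Region, nil
}

func main() {
	raw := []byte(`{"clusterName":"demo","aws":{"region":"us-east-2"}}`)
	region, err := regionFromMetadata(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(region)
}
```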

Comment 9 Johnny Liu 2019-02-14 03:05:32 UTC
QE have also hit similar issue in https://bugzilla.redhat.com/show_bug.cgi?id=1674440#c0

Comment 10 W. Trevor King 2019-02-14 07:26:34 UTC
> Fixed by https://github.com/openshift/installer/pull/1170.

With that code released with installer 0.12.0, can we close this?

Comment 11 jooho lee 2019-02-14 14:29:24 UTC
This issue happens with 0.12.0.

As Matthew said, the metadata.json file has the AWS region information. My question is why the installer also checks another region when it destroys the cluster.

From my understanding, the installer does not need to check other regions, because the region information comes from the metadata.json file.

Comment 12 W. Trevor King 2019-02-14 14:44:20 UTC
> From my understanding, the installer does not need to check other regions because of the region information from metadata.json file.

As mentioned in comment 7, the destroyer needs to check us-east-1 too for cross-region resources like Route 53 zones.  I dunno why your account has NAT gateways in another region matching your cluster name or ID though.  Still, we should be able to add a NatGatewayNotFound handler to deleteEC2NATGateway (like [1], but for NAT gateways).

[1]: https://github.com/openshift/installer/pull/1250

Comment 13 Matthew Staebler 2019-02-16 00:02:45 UTC
This bug is not fixed. The installer still attempts to delete resources from us-east-1 that have the tag "kubernetes.io/cluster/<cluster-name>: owned".

Comment 14 Matthew Staebler 2019-02-18 19:20:10 UTC
Tracked by https://jira.coreos.com/projects/CORS/issues/CORS-922

Comment 15 W. Trevor King 2019-02-27 05:46:15 UTC
[1], which just went out with v0.13.0 [2], uses uniquified cluster names when creating resources and tags.  So the deleter will still look in us-east-1 as well as the cluster's region for the reasons given in comment 7, but it should no longer be accidentally matching resources belonging to other clusters which had been using the same cluster name.

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/releases/tag/v0.13.0

Comment 17 Johnny Liu 2019-03-07 10:40:26 UTC
Verified this bug with the v4.0.16-1-dirty installer extracted from 4.0.0-0.nightly-2019-03-06-074438; PASS.

Installed two clusters (cluster-1 and cluster-2) with the same cluster name.
The installer named all resources using a unique string that includes the infraID.

[root@preserve-jialiu-ansible 20190307]# cat demo1/metadata.json 

[root@preserve-jialiu-ansible 20190307]# cat  demo2/metadata.json 

Destroyed cluster-2, then ran oc commands against cluster-1; cluster-1 was still working well.

Comment 20 errata-xmlrpc 2019-06-04 10:42:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

