Bug 1672374
Summary: | destroy cluster with other region in AWS | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | jooho lee <jlee>
Component: | Installer | Assignee: | W. Trevor King <wking>
Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | medium | |
Priority: | unspecified | CC: | crawford, jlee, mifiedle, wking
Version: | 4.1.0 | Keywords: | Reopened
Target Milestone: | --- | |
Target Release: | 4.1.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text:

- Cause: installer destroy attempts to delete resources from a cluster with the same name in us-east-1.
- Consequence: the installer deletes resources that it should not delete, and fails when attempting to delete others.
- Fix: filter the resources to delete based only on the openshiftClusterID, not on the cluster name.
- Result: only resources for the cluster being destroyed are deleted, and the installer does not block on deleting resources from other regions.
Story Points: | --- | |
---|---|---|---
Clone Of: | | Environment: |
Last Closed: | 2019-06-04 10:42:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1664187 | |
Description jooho lee 2019-02-04 19:10:43 UTC
Have you specified the AWS_REGION when destroying the cluster? I believe that is necessary. Matthew, can you take a look at this? Ideally, we'd remember in which region we installed the cluster so that destroy doesn't require AWS_REGION.

The underlying issue is that the destroyer used to search for all resources tagged with the cluster name. The destroyer always has to search us-east-1, since that is the only way to find resources that are global, as opposed to tied to a region. So if there were clusters with the same name in us-east-2 and us-east-1, the destroyer would attempt to delete some resources that belonged to the cluster in us-east-1. The destroyer has since been changed to search only for resources tagged with the appropriate openshiftClusterID.
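To make that search behavior concrete, here is a minimal sketch, assuming the AWS SDK for Go v1 and the Resource Groups Tagging API, of a tag-scoped lookup that covers both the cluster's own region and us-east-1 (where global resources such as Route 53 zones surface). This is illustrative only, not the installer's actual code; the openshiftClusterID tag key and the example region and ID come from the comments in this report.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	tagapi "github.com/aws/aws-sdk-go/service/resourcegroupstaggingapi"
)

// findTaggedResources returns the ARNs of resources in one region that
// carry the given openshiftClusterID tag. Matching on the cluster ID
// (not the cluster name) is what keeps same-named clusters separate.
func findTaggedResources(region, clusterID string) ([]string, error) {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion(region)))
	client := tagapi.New(sess)

	var arns []string
	err := client.GetResourcesPages(&tagapi.GetResourcesInput{
		TagFilters: []*tagapi.TagFilter{{
			Key:    aws.String("openshiftClusterID"),
			Values: []*string{aws.String(clusterID)},
		}},
	}, func(page *tagapi.GetResourcesOutput, lastPage bool) bool {
		for _, mapping := range page.ResourceTagMappingList {
			arns = append(arns, aws.StringValue(mapping.ResourceARN))
		}
		return true
	})
	return arns, err
}

func main() {
	clusterID := "073cf6c0-126e-45c5-afc7-96ded57458c4" // from metadata.json
	clusterRegion := "us-east-2"                        // from metadata.json

	// us-east-1 must always be searched too: it is the only place
	// global resources (e.g. Route 53 hosted zones) show up.
	regions := []string{clusterRegion}
	if clusterRegion != "us-east-1" {
		regions = append(regions, "us-east-1")
	}
	for _, region := range regions {
		arns, err := findTaggedResources(region, clusterID)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %v\n", region, arns)
	}
}
```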
> Have you specified the AWS_REGION when destroying the cluster? I believe that is necessary.

It is not necessary to specify the region. The region is determined from the metadata.json file.
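As a sketch of how destroy can recover the region without AWS_REGION, the snippet below parses metadata.json; the field names mirror the metadata.json contents quoted in the QE verification comment later in this report, but the code itself is an illustration, not the installer's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// clusterMetadata models the subset of metadata.json that a destroy
// step needs; field names match the file shown in the QE comment below.
type clusterMetadata struct {
	ClusterName string `json:"clusterName"`
	ClusterID   string `json:"clusterID"`
	AWS         struct {
		Region string `json:"region"`
	} `json:"aws"`
}

func main() {
	raw, err := os.ReadFile("metadata.json")
	if err != nil {
		panic(err)
	}
	var md clusterMetadata
	if err := json.Unmarshal(raw, &md); err != nil {
		panic(err)
	}
	// No AWS_REGION needed: the region travels with the cluster assets.
	fmt.Printf("destroying cluster %s (%s) in %s\n",
		md.ClusterName, md.ClusterID, md.AWS.Region)
}
```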
QE has also hit a similar issue in https://bugzilla.redhat.com/show_bug.cgi?id=1674440#c0.

> Fixed by https://github.com/openshift/installer/pull/1170.

With that code released with installer 0.12.0, can we close this?
This issue happens with 0.12.0. As Matthew said, the metadata.json file has the AWS region information. My question is why the installer also checks another region when it destroys the cluster. From my understanding, the installer does not need to check other regions, given the region information in the metadata.json file.

> From my understanding, the installer does not need to check other regions, given the region information in the metadata.json file.

As mentioned in comment 7, the destroyer needs to check us-east-1 too for cross-region resources like Route 53 zones. I don't know why your account has NAT gateways in another region matching your cluster name or ID, though. Still, we should be able to add a NatGatewayNotFound handler to deleteEC2NATGateway (like [1], but for NAT gateways; see the sketch at the end of this report).

[1]: https://github.com/openshift/installer/pull/1250

This bug is not fixed. The installer still attempts to delete resources from us-east-1 that have the tag "kubernetes.io/cluster/<cluster-name>: owned".

[1], which just went out with v0.13.0 [2], uses uniquified cluster names when creating resources and tags. So the deleter will still look in us-east-1 as well as the cluster's region, for the reasons given in comment 7, but it should no longer accidentally match resources belonging to other clusters that had been using the same cluster name.

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/releases/tag/v0.13.0

Verified this bug with the v4.0.16-1-dirty installer extracted from 4.0.0-0.nightly-2019-03-06-074438, and PASS. Install two clusters (cluster-1 and cluster-2) with the same cluster name. The installer names all resources using a unique string that includes the infraID:

    [root@preserve-jialiu-ansible 20190307]# cat demo1/metadata.json
    {"clusterName":"qe-jialiu","clusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4","infraID":"qe-jialiu-wcpnw","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-wcpnw":"owned"},{"openshiftClusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4"}]}}
    [root@preserve-jialiu-ansible 20190307]# cat demo2/metadata.json
    {"clusterName":"qe-jialiu","clusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6","infraID":"qe-jialiu-hzfxg","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-hzfxg":"owned"},{"openshiftClusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6"}]}}

Destroy cluster-2, then run oc commands against cluster-1: cluster-1 is still working well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
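For reference, here is a minimal sketch of the NatGatewayNotFound handling proposed in the comments above (analogous to openshift/installer#1250, but for NAT gateways). The function shape and names are hypothetical; only the DeleteNatGateway call and the NatGatewayNotFound error code come from the discussion and the EC2 API.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// deleteNATGateway deletes one NAT gateway, treating "already gone" as
// success so the destroy loop does not block on a missing resource.
// This is an illustrative helper, not the installer's deleteEC2NATGateway.
func deleteNATGateway(client *ec2.EC2, id string) error {
	_, err := client.DeleteNatGateway(&ec2.DeleteNatGatewayInput{
		NatGatewayId: aws.String(id),
	})
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "NatGatewayNotFound" {
		// The gateway was already deleted (possibly by an earlier pass);
		// that is the desired end state, so report success.
		return nil
	}
	return err
}

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-2")))
	// The gateway ID below is a made-up placeholder for the example.
	if err := deleteNATGateway(ec2.New(sess), "nat-0123456789abcdef0"); err != nil {
		fmt.Println("delete failed:", err)
	}
}
```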