Bug 1674440
Summary: installer failed to create cluster due to leftover existing IAM instance profile

Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-installer
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Johnny Liu <jialiu>
Assignee: W. Trevor King <wking>
QA Contact: Johnny Liu <jialiu>
CC: crawford, sponnaga, wking
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-06-04 10:42:43 UTC
Description
Johnny Liu, 2019-02-11 11:05:11 UTC
Now this issue has disappeared; lowering its severity. In my subsequent testing, I hit another issue:

level=error msg="Error: Error applying plan:"
level=error msg="3 errors occurred:"
level=error msg="\t* module.iam.aws_iam_instance_profile.worker: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d0a1849-2f08-11e9-9889-5be3143234f8"
level=error msg="\t* module.bootstrap.aws_iam_instance_profile.bootstrap: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.bootstrap: Error creating IAM instance profile qe-akostadi-bootstrap-profile: EntityAlreadyExists: Instance Profile qe-akostadi-bootstrap-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d0c895d-2f08-11e9-a004-b375d31ba96b"
level=error msg="\t* module.masters.aws_iam_instance_profile.master: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.master: Error creating IAM instance profile qe-akostadi-master-profile: EntityAlreadyExists: Instance Profile qe-akostadi-master-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d090799-2f08-11e9-899b-516565cdca0e"

However, the `aws` tool said it could not find any such instance profile. Could these two issues be caused by some Terraform cache when talking to AWS? I was using the v4.0.0-0.171.0.0-dirty installer when I hit the issue described in comment 1. The initial issue will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1672374; let us use this bug to track the issue described in comment 1.
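For triage, the colliding profile names can be pulled out of the installer's error output mechanically. This is only a hypothetical helper for reading the log above, not part of the installer:

```python
import re

# A few of the installer's error lines from this report (\\t stands in for
# the literal "\t" that appears in the log output).
LOG = """
level=error msg="\\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
level=error msg="\\t* aws_iam_instance_profile.bootstrap: Error creating IAM instance profile qe-akostadi-bootstrap-profile: EntityAlreadyExists: Instance Profile qe-akostadi-bootstrap-profile already exists."
level=error msg="\\t* aws_iam_instance_profile.master: Error creating IAM instance profile qe-akostadi-master-profile: EntityAlreadyExists: Instance Profile qe-akostadi-master-profile already exists."
"""

def colliding_profiles(log: str) -> list[str]:
    """Return the IAM instance profile names that hit EntityAlreadyExists."""
    return re.findall(r"Instance Profile (\S+) already exists", log)

print(colliding_profiles(LOG))
# -> ['qe-akostadi-worker-profile', 'qe-akostadi-bootstrap-profile', 'qe-akostadi-master-profile']
```

Each extracted name can then be checked directly against IAM (e.g. with `aws iam get-instance-profile`) to confirm whether the profile really exists.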
> level=error msg="\t* module.iam.aws_iam_instance_profile.worker: 1 error occurred:"
> level=error msg="\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
> level=error msg="\tstatus code: 409, request id: 0d0a1849-2f08-11e9-9889-5be3143234f8"

Dup of bug 1669274?

> time="2019-02-11T05:52:43-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0d9f7ce1307459951: InvalidInstanceID.NotFound: The instance ID 'i-0d9f7ce1307459951' does not exist\n\tstatus code: 400, request id: 9d818294-6dae-47f2-9a4d-739983038a79"

I've filed [1] to address this.

[1]: https://github.com/openshift/installer/pull/1250

(In reply to W. Trevor King from comment #6)
> Dup of bug 1669274?

I do not think so. Here is my thought: in my case, my running instances were terminated by a prune script. The prune script only terminates instances; it does not remove the dependent resources created by the installer, so it left these orphaned instance profiles behind. After that, I followed https://access.redhat.com/solutions/3826921 to ask the installer to destroy the cluster. I guess the installer cannot find those leftover instance profiles, which have become orphaned (because the instances no longer exist).

> I've filed [1] to address this.
> [1]: https://github.com/openshift/installer/pull/1250

This landed in master today.

> > Dup of bug 1669274?
> I do not think so.
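Because instance profiles carry no cluster tags, leftovers like these can only be matched by the installer's naming convention, `<prefix>-{bootstrap,master,worker}-profile`. A minimal sketch of that matching, with hypothetical sample data standing in for `aws iam list-instance-profiles` output:

```python
# Illustrative sketch only: the helper name and sample data are made up;
# the naming convention comes from the error log in this report.

def leftover_profiles(profile_names, cluster_prefix):
    """Return profile names that follow the installer's naming convention
    for the given cluster prefix."""
    suffixes = ("-bootstrap-profile", "-master-profile", "-worker-profile")
    return [
        name for name in profile_names
        if name.startswith(cluster_prefix) and name.endswith(suffixes)
    ]

# Hypothetical listing of the account's instance profiles:
existing = [
    "qe-akostadi-worker-profile",
    "qe-akostadi-master-profile",
    "qe-akostadi-bootstrap-profile",
    "some-unrelated-profile",
]
print(leftover_profiles(existing, "qe-akostadi"))
# -> ['qe-akostadi-worker-profile', 'qe-akostadi-master-profile', 'qe-akostadi-bootstrap-profile']
```

Anything this finds for a cluster whose instances are already gone is a candidate for manual deletion with `aws iam delete-instance-profile`.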
> Here is my thought: in my case, my running instances were terminated by a prune script. The prune script only terminates instances; it does not remove the dependent resources created by the installer, so it left these orphaned instance profiles behind.

Tracker for this in [1].

> After that, I followed https://access.redhat.com/solutions/3826921 to ask the installer to destroy the cluster. I guess the installer cannot find those leftover instance profiles, which have become orphaned (because the instances no longer exist).

Right. And there's not much we can do about that, since AWS doesn't support tagging for instance profiles. So can we close this as a dup of [1]? Or adjust the title and component here to clarify that the issue is a buggy pruning script in account 301721915996?

[1]: https://jira.coreos.com/browse/DPP-1130

Okay, it seems the Jira issue is the same as mine. I am okay with closing it.

Looks like I can't close this as a dup of the Jira issue, since that's an external tracker, and I haven't found a Bugzilla product for the DPP team. So I've left this on the OCP product, shifted it to the Unknown component, and assigned it to John.

[1] and follow-ups landed some uniquifying to avoid collisions. That's enough for me to consider this fixed. [2] is in flight in this space to support users who try to clean up after a broken pruner partially deletes their cluster, but I don't think we need that to close this issue.

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/pull/1268

[1] went out with [2].

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/releases/tag/v0.13.0

Verified this bug with the v4.0.16-1-dirty installer extracted from 4.0.0-0.nightly-2019-03-06-074438, and it PASSED. Installed two clusters (cluster-1 and cluster-2) with the same cluster name; the installer names all resources using a unique string that includes the infraID.
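The uniquifying works roughly like this: the installer derives an infraID by appending a short random suffix to the cluster name, so two installs named qe-jialiu get distinct resource names (qe-jialiu-wcpnw vs. qe-jialiu-hzfxg in the verification below). A minimal Python sketch of the idea; the truncation length and suffix alphabet are assumptions, not the installer's actual Go implementation:

```python
import random
import string

def make_infra_id(cluster_name: str, max_base_len: int = 21) -> str:
    """Sketch of infraID generation: truncated cluster name plus a
    5-character random suffix (length and alphabet are assumptions)."""
    base = cluster_name[:max_base_len].rstrip("-")
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    return f"{base}-{suffix}"

# Two installs with the same cluster name now get distinct profile names:
a = make_infra_id("qe-jialiu")  # e.g. "qe-jialiu-wcpnw"
b = make_infra_id("qe-jialiu")  # e.g. "qe-jialiu-hzfxg"
print(f"{a}-worker-profile")
print(f"{b}-worker-profile")
```

Because every AWS resource name is built from the infraID rather than the bare cluster name, a second install no longer collides with leftovers from a previous one.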
[root@preserve-jialiu-ansible 20190307]# cat demo1/metadata.json
{"clusterName":"qe-jialiu","clusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4","infraID":"qe-jialiu-wcpnw","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-wcpnw":"owned"},{"openshiftClusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4"}]}}
[root@preserve-jialiu-ansible 20190307]# cat demo2/metadata.json
{"clusterName":"qe-jialiu","clusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6","infraID":"qe-jialiu-hzfxg","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-hzfxg":"owned"},{"openshiftClusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6"}]}}

Destroyed cluster-2, then ran oc commands against cluster-1; cluster-1 was still working well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758