Description of problem:

Version-Release number of the following components:
v4.0.0-0.169.0.0-dirty

How reproducible:
Always

Steps to Reproduce:
1. # cat ./demo2/metadata.json
{"clusterName":"qe-jialiu","clusterID":"5e9150a6-c27b-4ef7-9470-5209676940d0","aws":{"region":"us-east-2","identifier":[{"openshiftClusterID":"5e9150a6-c27b-4ef7-9470-5209676940d0"},{"kubernetes.io/cluster/qe-jialiu":"owned"}]}}
2. # ./openshift-install destroy cluster --dir ./demo2/
3.

Actual results:
The installer would not exit; it keeps trying to delete some non-existing instances.

time="2019-02-11T05:52:08-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:52:13-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:volume/vol-087350d1e97a7ed17: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: fc68d6d2-8ab9-4939-9555-df1e447ac364"
time="2019-02-11T05:52:18-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:natgateway/nat-08696293109405515: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: a79b3e37-4345-4a2e-8218-61c22e05a49f"
time="2019-02-11T05:52:21-05:00" level=info msg=Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-09b17f28856f6f079" id=nat-09b17f28856f6f079
time="2019-02-11T05:52:21-05:00" level=info msg=Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-0fd0094e3e953fbfe" id=nat-0fd0094e3e953fbfe
time="2019-02-11T05:52:28-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:volume/vol-0b98219c51fdb8f0c: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 30207bff-bb69-4b08-a5c6-a7fd242cefbd"
time="2019-02-11T05:52:28-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:52:34-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:volume/vol-087350d1e97a7ed17: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 81c4b82e-8d17-48a2-9545-9520f8c7f634"
time="2019-02-11T05:52:39-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:natgateway/nat-08696293109405515: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 0912d258-cc85-417c-a409-84b2e704c17b"
time="2019-02-11T05:52:43-05:00" level=debug msg="deleting arn:aws:ec2:us-east-2:301721915996:volume/vol-0b98219c51fdb8f0c: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 6962f33b-83d0-4802-bb9f-1dc77069b59c"
time="2019-02-11T05:52:43-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:52:43-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:52:43-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0d9f7ce1307459951: InvalidInstanceID.NotFound: The instance ID 'i-0d9f7ce1307459951' does not exist\n\tstatus code: 400, request id: 9d818294-6dae-47f2-9a4d-739983038a79"
time="2019-02-11T05:52:44-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0c0ec911b109516ce: InvalidInstanceID.NotFound: The instance ID 'i-0c0ec911b109516ce' does not exist\n\tstatus code: 400, request id: 66e150ed-3bf2-4417-b150-fb5de337ee08"
time="2019-02-11T05:52:44-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-09ec554c567d893fa: InvalidInstanceID.NotFound: The instance ID 'i-09ec554c567d893fa' does not exist\n\tstatus code: 400, request id: 3f01ba9c-4044-4be1-ad5b-ae409f858190"
time="2019-02-11T05:52:44-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-061363c67f8b27783: InvalidInstanceID.NotFound: The instance ID 'i-061363c67f8b27783' does not exist\n\tstatus code: 400, request id: 802e37ae-f98e-4ee7-989c-d3933f44e1ad"
time="2019-02-11T05:52:44-05:00" level=debug msg="search for IAM roles"
time="2019-02-11T05:52:45-05:00" level=debug msg="search for IAM users"
time="2019-02-11T05:52:56-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:52:59-05:00" level=info msg=Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-08696293109405515" id=nat-08696293109405515
time="2019-02-11T05:52:59-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:52:59-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:52:59-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:52:59-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0d9f7ce1307459951: InvalidInstanceID.NotFound: The instance ID 'i-0d9f7ce1307459951' does not exist\n\tstatus code: 400, request id: cc097e11-7f05-48ac-bad7-1042e77c21ca"
time="2019-02-11T05:53:00-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0c0ec911b109516ce: InvalidInstanceID.NotFound: The instance ID 'i-0c0ec911b109516ce' does not exist\n\tstatus code: 400, request id: 18e4ad5a-4b6b-43a0-94fc-b9c7b5fadea5"
time="2019-02-11T05:53:00-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-09ec554c567d893fa: InvalidInstanceID.NotFound: The instance ID 'i-09ec554c567d893fa' does not exist\n\tstatus code: 400, request id: 7b655006-be3b-496f-a659-0cbbfe3eb981"
time="2019-02-11T05:53:00-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-061363c67f8b27783: InvalidInstanceID.NotFound: The instance ID 'i-061363c67f8b27783' does not exist\n\tstatus code: 400, request id: 75c46335-a0da-442b-a8bc-1015d109044c"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for IAM roles"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for IAM users"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for and delete matching resources by tag in us-east-2 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:53:00-05:00" level=debug msg="no deletions from us-east-2, removing client"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"openshiftClusterID\":\"5e9150a6-c27b-4ef7-9470-5209676940d0\"}"
time="2019-02-11T05:53:00-05:00" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/qe-jialiu\":\"owned\"}"
time="2019-02-11T05:53:00-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0d9f7ce1307459951: InvalidInstanceID.NotFound: The instance ID 'i-0d9f7ce1307459951' does not exist\n\tstatus code: 400, request id: 42a77824-6956-4173-a0c0-4d2051b64971"
time="2019-02-11T05:53:00-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0c0ec911b109516ce: InvalidInstanceID.NotFound: The instance ID 'i-0c0ec911b109516ce' does not exist\n\tstatus code: 400, request id: 25447971-e7cc-4d7b-bae4-c373bdf5ea7f"
time="2019-02-11T05:53:01-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-09ec554c567d893fa: InvalidInstanceID.NotFound: The instance ID 'i-09ec554c567d893fa' does not exist\n\tstatus code: 400, request id: be75b8d3-f498-457f-9b05-0f4d371c3b87"
time="2019-02-11T05:53:01-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-061363c67f8b27783: InvalidInstanceID.NotFound: The instance ID 'i-061363c67f8b27783' does not exist\n\tstatus code: 400, request id: aa439793-90a7-479c-ab7a-fdb32d2b0c12"
time="2019-02-11T05:53:01-05:00" level=debug msg="search for IAM roles"
time="2019-02-11T05:53:01-05:00" level=debug msg="search for IAM users"

Expected results:
The installer should exit if the searched instances are not found.

Additional info:
In my testing, my cluster was set up in us-east-2, so why is the installer trying to delete some invalid instance IDs in us-east-1? I did run some testing in us-east-1 before this test, though.
Now this issue has disappeared, so I am lowering its severity. In my subsequent testing, I hit another issue:

level=error msg="Error: Error applying plan:"
level=error
level=error msg="3 errors occurred:"
level=error msg="\t* module.iam.aws_iam_instance_profile.worker: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d0a1849-2f08-11e9-9889-5be3143234f8"
level=error
level=error
level=error msg="\t* module.bootstrap.aws_iam_instance_profile.bootstrap: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.bootstrap: Error creating IAM instance profile qe-akostadi-bootstrap-profile: EntityAlreadyExists: Instance Profile qe-akostadi-bootstrap-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d0c895d-2f08-11e9-a004-b375d31ba96b"
level=error
level=error
level=error msg="\t* module.masters.aws_iam_instance_profile.master: 1 error occurred:"
level=error msg="\t* aws_iam_instance_profile.master: Error creating IAM instance profile qe-akostadi-master-profile: EntityAlreadyExists: Instance Profile qe-akostadi-master-profile already exists."
level=error msg="\tstatus code: 409, request id: 0d090799-2f08-11e9-899b-516565cdca0e"

However, the `aws` CLI said it could not find such an instance profile. It seems like these two issues are caused by some Terraform cache when talking to AWS?
I was using the v4.0.0-0.171.0.0-dirty installer when I hit the issue described in comment 1.
The initial issue will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1672374; let us use this bug to track the issue described in comment 1.
> level=error msg="\t* module.iam.aws_iam_instance_profile.worker: 1 error occurred:"
> level=error msg="\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
> level=error msg="\tstatus code: 409, request id: 0d0a1849-2f08-11e9-9889-5be3143234f8"

Dup of bug 1669274?
> time="2019-02-11T05:52:43-05:00" level=debug msg="deleting arn:aws:ec2:us-east-1:301721915996:instance/i-0d9f7ce1307459951: InvalidInstanceID.NotFound: The instance ID 'i-0d9f7ce1307459951' does not exist\n\tstatus code: 400, request id: 9d818294-6dae-47f2-9a4d-739983038a79" I've filed [1] to address this. [1]: https://github.com/openshift/installer/pull/1250
(In reply to W. Trevor King from comment #6)
> > level=error msg="\t* module.iam.aws_iam_instance_profile.worker: 1 error occurred:"
> > level=error msg="\t* aws_iam_instance_profile.worker: Error creating IAM instance profile qe-akostadi-worker-profile: EntityAlreadyExists: Instance Profile qe-akostadi-worker-profile already exists."
> > level=error msg="\tstatus code: 409, request id: 0d0a1849-2f08-11e9-9889-5be3143234f8"
>
> Dup of bug 1669274?

I do not think so.
Here is my thinking: in my case, my running instances were terminated by a prune script. The prune script only terminates instances; it does not remove the dependent resources created by the installer, so it left these orphan instance profiles behind.
After that, I followed https://access.redhat.com/solutions/3826921 to ask the installer to destroy the cluster. I guess the installer cannot find those leftover instance profiles, which have already become orphans (since the instances no longer exist).
> I've filed [1] to address this.
>
> [1]: https://github.com/openshift/installer/pull/1250

This landed in master today.

> > Dup of bug 1669274?
>
> I do not think so.
> Here is my thinking: in my case, my running instances were terminated by a prune script. The prune script only terminates instances; it does not remove the dependent resources created by the installer, so it left these orphan instance profiles behind.

Tracker for this in [1].

> After that, I followed https://access.redhat.com/solutions/3826921 to ask the installer to destroy the cluster. I guess the installer cannot find those leftover instance profiles, which have already become orphans (since the instances no longer exist).

Right. And there's not much we can do about that since AWS doesn't support tagging for instance profiles. So can we close this as a dup of [1]? Or adjust the title and component here to clarify that the issue is a buggy pruning script in account 301721915996?

[1]: https://jira.coreos.com/browse/DPP-1130
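Because instance profiles cannot carry tags, the tag-based destroyer has no way to discover the orphans, so any cleanup has to go by name. A rough sketch of listing candidate profiles by a name prefix with aws-sdk-go's IAM client follows; the prefix is a hypothetical example, and a real cleanup would also need to detach roles (RemoveRoleFromInstanceProfile) before calling DeleteInstanceProfile.

// A hedged sketch of manually finding orphaned instance profiles by name
// prefix with the IAM API. The prefix is a hypothetical example; this code
// only lists candidates and does not delete anything.
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/iam"
)

func main() {
	sess := session.Must(session.NewSession())
	client := iam.New(sess)

	prefix := "qe-akostadi-" // hypothetical cluster-name prefix

	err := client.ListInstanceProfilesPages(&iam.ListInstanceProfilesInput{},
		func(page *iam.ListInstanceProfilesOutput, lastPage bool) bool {
			for _, profile := range page.InstanceProfiles {
				name := aws.StringValue(profile.InstanceProfileName)
				if strings.HasPrefix(name, prefix) {
					// A real cleanup would detach roles and then call
					// DeleteInstanceProfile; here we only print the candidates.
					fmt.Println("orphan candidate:", name)
				}
			}
			return true // keep paginating
		})
	if err != nil {
		log.Fatal(err)
	}
}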
Okay, it seems like the Jira issue is the same as mine. I am okay with closing it.
Looks like I can't close as a dup of Jira, since that's an external tracker. And I haven't found a Bugzilla product for the DPP team. So I've left this on the OCP product, shifted to the Unknown component, and assigned to John.
[1] and follow-ups landed some uniquifying to avoid collisions. That's enough for me to consider this fixed. [2] is in flight in this space to support users who try to clean up after a broken pruner partially deletes their cluster, but I don't think we need that to close this issue.

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/pull/1268
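The uniquifying boils down to naming per-cluster resources after an identifier that is not just the user-supplied cluster name, which is why two clusters named "qe-jialiu" no longer collide. A minimal sketch of that idea, not the installer's actual code, is below; the suffix length, alphabet, and lack of truncation are assumptions.

// A minimal sketch, not the installer's actual implementation, of deriving a
// unique infra identifier from a cluster name by appending a short random
// suffix (compare the "qe-jialiu-wcpnw" infraID in the verification comment).
package main

import (
	"fmt"
	"math/rand"
)

const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// infraID appends a 5-character random suffix so that two clusters created
// with the same name get distinct resource names and tags.
func infraID(clusterName string) string {
	suffix := make([]byte, 5)
	for i := range suffix {
		suffix[i] = alphabet[rand.Intn(len(alphabet))]
	}
	return fmt.Sprintf("%s-%s", clusterName, suffix)
}

func main() {
	fmt.Println(infraID("qe-jialiu")) // e.g. qe-jialiu-x7k2q
}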
[1] went out with [2].

[1]: https://github.com/openshift/installer/pull/1280
[2]: https://github.com/openshift/installer/releases/tag/v0.13.0
Verified this bug with the v4.0.16-1-dirty installer extracted from 4.0.0-0.nightly-2019-03-06-074438, and it PASSED.

Installed two clusters (cluster-1 and cluster-2) with the same cluster name. The installer now names all resources using a unique string that includes the infraID.

[root@preserve-jialiu-ansible 20190307]# cat demo1/metadata.json
{"clusterName":"qe-jialiu","clusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4","infraID":"qe-jialiu-wcpnw","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-wcpnw":"owned"},{"openshiftClusterID":"073cf6c0-126e-45c5-afc7-96ded57458c4"}]}}

[root@preserve-jialiu-ansible 20190307]# cat demo2/metadata.json
{"clusterName":"qe-jialiu","clusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6","infraID":"qe-jialiu-hzfxg","aws":{"region":"us-east-2","identifier":[{"kubernetes.io/cluster/qe-jialiu-hzfxg":"owned"},{"openshiftClusterID":"769e5dbc-6b67-486e-ab63-d49e6d14aec6"}]}}

Destroyed cluster-2, then ran oc commands against cluster-1; cluster-1 is still working well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758