Description of problem:
When calling `openshift-install destroy cluster` after you have attached a policy to the machine-api IAM user, the installer hangs indefinitely with no output. Other resources are deleted, but the installer gets stuck on IAM.

Version-Release number of selected component (if applicable):
$ ./openshift-install version
./openshift-install v4.1.0-201905171742-dirty
built from commit 6ba66dbb6c2c53e1901a6d167d1c813bbbf27f4d
release image quay.io/openshift-release-dev/ocp-release@sha256:dc67ad5edd91ca48402309fe0629593e5ae3333435ef8d0bc52c2b62ca725021

How reproducible:
100%

Steps to Reproduce:
1. Create a cluster.
2. Attach an IAM policy to the machine-api IAM user.
3. Attempt to delete the cluster.

Actual results:
No output from the installer after a certain point.

Expected results:
Even if we can't handle modified IAM users, we should fail after a reasonable number of retries and show a message. I left destroy running overnight and it never exited.

Additional info:
Logs:
time="2019-05-23T08:21:09-04:00" level=debug msg="search for IAM roles"
time="2019-05-23T08:21:09-04:00" level=debug msg="search for IAM users"
time="2019-05-23T08:21:10-04:00" level=debug msg="delete IAM roles and users"
time="2019-05-23T08:21:11-04:00" level=debug msg="DeleteConflict: Cannot delete entity, must detach all policies first.\n\tstatus code: 409, request id: 3ff7b07c-7d55-11e9-999c-f3a267015376" arn="arn:aws:iam::269733383066:user/mgugino-dev-rxlx7-openshift-machine-api-8jtff"
We will not be changing the infinite looping behavior. I recommend using the `timeout` command to limit the run time as needed, for example:

timeout 30m openshift-install destroy cluster

A patch that surfaces these errors as warnings after 5 minutes is in the works.
Reproduced this on 4.4.
[root@preserve-jialiu-ansible ~]# openshift-install version
openshift-install 4.4.0-0.nightly-2020-03-08-235004
built from commit f371355517f9da267c295e11c01cd3dfc54b39d4
release image registry.svc.ci.openshift.org/ocp/release@sha256:edaa0ec52c2d42bbcce96873f40fc343446786fde8b42ef29bfe6964c2f72658

DEBUG OpenShift Installer 4.4.0-0.nightly-2020-03-08-235004
DEBUG Built from commit f371355517f9da267c295e11c01cd3dfc54b39d4
INFO Credentials loaded from the "default" profile in file "/root/.aws/credentials"
DEBUG search for and delete matching instances by tag matching aws.Filter{"kubernetes.io/cluster/geliu450309-whsk6":"owned"}
DEBUG Terminated instance=i-00f60cbc4133437b4
DEBUG Terminated instance=i-0a610c9fb41ab452e
DEBUG Terminated instance=i-0994045ee8c68335e
DEBUG Terminated instance=i-008049cfac85bdff5
DEBUG Terminated instance=i-03015ec27bd9114a7
DEBUG Terminated instance=i-0495fb55c11e67f2c
DEBUG search for and delete matching resources by tag in us-east-2 matching aws.Filter{"kubernetes.io/cluster/geliu450309-whsk6":"owned"}
INFO Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-0f0179e84f4df3d76" id=nat-0f0179e84f4df3d76
INFO Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-0de988808b6ac202c" id=nat-0de988808b6ac202c
INFO Deleted arn="arn:aws:ec2:us-east-2:301721915996:natgateway/nat-0ec620659cabc1d3a" id=nat-0ec620659cabc1d3a
DEBUG search for and delete matching resources by tag in us-east-1 matching aws.Filter{"kubernetes.io/cluster/geliu450309-whsk6":"owned"}
DEBUG no deletions from us-east-1, removing client
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: 2540a068-0015-4515-b0a6-cbe7da8b5b64 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for and delete matching resources by tag in us-east-2 matching aws.Filter{"kubernetes.io/cluster/geliu450309-whsk6":"owned"}
DEBUG no deletions from us-east-2, removing client
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: 35342b2c-5ae4-4c11-800b-d99ef680e8c0 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: ca7e11ae-8d2d-4188-94eb-2e575e189482 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshif

It hangs here forever without any WARNING message, only DEBUG messages.

Verified this bug with 4.5.0-0.nightly-2020-03-06-190457.
DEBUG OpenShift Installer 4.5.0-0.nightly-2020-03-06-190457
DEBUG Built from commit c9fb0963e6a058137ca07817cc181d49cf931a59
<--snip-->
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: d1469848-5ddb-4ceb-9642-b86e1267c677 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: 5de6aa5d-92e1-460e-9f35-a529db07bbb4 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: ba5f3cb5-13c5-461a-bfd9-78b7878ab109 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: fffee2c6-42ed-42bc-bcc9-0a56cb7ca9d9 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
DEBUG DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: 6c5e269c-b5f3-4ac0-9244-f3578c3031b8 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users
DEBUG delete IAM roles and users
WARNING DeleteConflict: Cannot delete entity, must detach all policies first.
    status code: 409, request id: a205639a-d138-4dae-905f-cae693686a62 arn="arn:aws:iam::301721915996:user/geliu450309-whsk6-openshift-machine-api-aws-qrpl2"
DEBUG search for IAM roles
DEBUG search for IAM users

It still hangs here forever, but it now prints a WARNING message.
I ran into a different variation of this bug: I have a 4.5 RC1 GCP IPI cluster with a firewall rule set up (but disabled). The destroy cluster command continuously loops and displays this error (debug logging is on):
DEBUG Deleting network tszegp-gt4q5-network
DEBUG Networks: failed to delete network tszegp-gt4q5-network with error: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource 'projects/openshift-qe/global/networks/tszegp-gt4q5-network' is already being used by 'projects/openshift-qe/global/firewalls/tsze-blocks-quayio'
After I deleted the firewall rule, destroy continued and finished.
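At the default log level, this loop gives no hint about which resource has to be removed. As a minimal Go sketch, assuming the GCP error text keeps the wording shown in the log (that wording is not a documented contract), a hypothetical `blockingResource` helper could extract the blocking resource from the message so that it can be reported in a warning:

```go
package main

import (
	"fmt"
	"regexp"
)

// inUseBy matches the tail of a GCP RESOURCE_IN_USE_BY_ANOTHER_RESOURCE message,
// e.g. "... is already being used by 'projects/p/global/firewalls/f'".
// Assumption: the message keeps this wording.
var inUseBy = regexp.MustCompile(`is already being used by '([^']+)'`)

// blockingResource is a hypothetical helper. It returns the path of the resource
// that prevents the deletion, if the error message names one.
func blockingResource(errMsg string) (string, bool) {
	m := inUseBy.FindStringSubmatch(errMsg)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Error text copied from the debug log above.
	msg := "RESOURCE_IN_USE_BY_ANOTHER_RESOURCE: The network resource " +
		"'projects/openshift-qe/global/networks/tszegp-gt4q5-network' is already being used by " +
		"'projects/openshift-qe/global/firewalls/tsze-blocks-quayio'"
	if blocker, ok := blockingResource(msg); ok {
		fmt.Printf("WARNING network deletion is blocked by %s; delete it so that destroy can finish\n", blocker)
	}
}
```

Matching on message text is brittle, so when the pattern does not match, a real implementation would still need to log the raw error unchanged.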
Current GCP behavior (4.5) when destroy cluster is kicked off without specifying a log level and a firewall rule is associated with this cluster:
INFO Deleted route default-route-2296075b8a62a19a
INFO Deleted route default-route-912519603c8403c4
INFO Deleted target pool a3bc35fd3f5aa4401b32a3e466ea060a
INFO Deleted target pool tszegcp61820c-76kff-api
INFO Deleted backend service tszegcp61820c-76kff-api-internal
INFO Deleted subnetwork tszegcp61820c-76kff-master-subnet
INFO Deleted subnetwork tszegcp61820c-76kff-worker-subnet
INFO Deleted instance group tszegcp61820c-76kff-master-us-central1-b
INFO Deleted instance group tszegcp61820c-76kff-master-us-central1-c
INFO Deleted instance group tszegcp61820c-76kff-master-us-central1-a
INFO Deleted health check tszegcp61820c-76kff-api-internal
INFO Deleted HTTP health check a3bc35fd3f5aa4401b32a3e466ea060a
INFO Deleted HTTP health check tszegcp61820c-76kff-api
(appears to be stuck here)
(note: I deleted the firewall associated with this cluster separately, and the installer continued)
INFO Deleted network tszegcp61820c-76kff-network
INFO Time elapsed: 1h24m25s
We only addressed AWS with this BZ. GCP was handled with an RFE, which has yet to merge. https://github.com/openshift/installer/pull/3749
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409