Description of problem:

Sometimes CI leaks untagged security groups and subnets (I'm not clear on how). Because we are allowed to remove all resources from within a cluster-owned VPC, we should have *ByVPC walkers to remove these indirectly-owned resources. [1] is an example of a teardown that gets stuck on this. Removing that cluster with the code from installer#2214 gives:

  $ AWS_PROFILE=ci ./vpc-delete-via-installer destroy
  {"Name":"ci-op-lz6psxgq-60667-kcdnr-vpc","expirationDate":"2019-08-30T09:12+0000","kubernetes.io/cluster/ci-op-lz6psxgq-60667-kcdnr":"owned","name":"ci-op-lz6psxgq-60667-kcdnr","openshift_creationDate":"2019-08-30T05:20:05.181253+00:00"}
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-000d53425754ba1b5" id=vpc-000d53425754ba1b5 subnet=subnet-097641f89b0988267
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-000d53425754ba1b5" id=vpc-000d53425754ba1b5 subnet=subnet-003f9ca4e80d830d7
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-000d53425754ba1b5" id=vpc-000d53425754ba1b5 subnet=subnet-0909490297aebc554
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-000d53425754ba1b5" id=vpc-000d53425754ba1b5 subnet=subnet-0171e9662756b3ee9
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-000d53425754ba1b5" id=vpc-000d53425754ba1b5
  INFO Deleted arn="arn:aws:ec2:us-east-1:460538899914:dhcp-options/dopt-06f969a9588954be7" id=dopt-06f969a9588954be7

showing that the leaked subnets were blocking the VPC deletion, and that #2214 allows for successful deletion of those subnets.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ipi-deprovision/8147
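To see what a by-VPC walk would pick up, one option is to list everything in the VPC by its vpc-id instead of by cluster tag. This is only a rough sketch using the AWS CLI, not the installer's Go implementation; the function names list_vpc_subnets and list_vpc_security_groups are made up for illustration.

```shell
# Hypothetical helpers (names are assumptions, not part of the installer).
# Each one filters on vpc-id, so untagged resources are included too.

# Print the IDs of all subnets in the VPC given as $1.
list_vpc_subnets() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].SubnetId' --output text
}

# Print the IDs of all security groups in the VPC given as $1.
list_vpc_security_groups() {
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=$1" \
    --query 'SecurityGroups[].GroupId' --output text
}

# Example usage, with the VPC ID from the log above:
#   AWS_PROFILE=ci list_vpc_subnets vpc-000d53425754ba1b5
#   AWS_PROFILE=ci list_vpc_security_groups vpc-000d53425754ba1b5
```

Note that the security-group listing also returns the VPC's default security group, which cannot be deleted directly and goes away with the VPC.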
Verification for this probably looks like:

1. Create an AWS cluster.
2. Remove tags from a subnet inside the cluster.
3. Remove tags from a security group inside the cluster.
4. Run 'openshift-install destroy cluster' and see that it succeeds without getting hung up on the untagged resources.
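Steps 2 and 3 can be done from the AWS console, or roughly like this with the AWS CLI. This is a sketch, not a tested procedure: untag_resource is a hypothetical helper name, and the resource IDs and the cluster tag key in the usage comments are placeholders you would fill in from your own cluster.

```shell
# Hypothetical helper (name is an assumption).
# Remove the tag with key $2 from the EC2 resource $1 (a subnet or a
# security group). Passing only Key= deletes the tag whatever its value.
untag_resource() {
  aws ec2 delete-tags --resources "$1" --tags "Key=$2"
}

# Example usage (IDs and infrastructure name are placeholders):
#   untag_resource subnet-PLACEHOLDER "kubernetes.io/cluster/INFRA_ID"
#   untag_resource sg-PLACEHOLDER "kubernetes.io/cluster/INFRA_ID"
#   openshift-install --dir DIR destroy cluster
```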
Sheng points out that this is getting stuck on:

  time="2019-09-11T04:06:20-04:00" level=debug msg="deleting EC2 security group sg-036d0959040324247: DependencyViolation: resource sg-036d0959040324247 has a dependent object\n\tstatus code: 400, request id: d3f0a9b0-3b9f-406e-b769-3fc9c1738d33" arn="arn:aws:ec2:ap-northeast-1:301721915996:vpc/vpc-0603e90cd7607e32c"

But it works for me:

  $ openshift-install --dir wking create cluster
  # manually remove the kubernetes.io/cluster/... tag from a security group.
  $ aws ec2 describe-security-groups --group-ids sg-016e5d178041ae01d | jq -r '.SecurityGroups[].Tags[]'
  {
    "Value": "wking-tqgwc-master-sg",
    "Key": "Name"
  }
  $ openshift-install --dir wking destroy cluster
  INFO Disassociated instance=i-0fd71078121225b41 name=wking-tqgwc-bootstrap-profile role=wking-tqgwc-bootstrap-role
  ...
  INFO Deleted arn="arn:aws:ec2:us-west-2:269733383066:security-group/sg-01f1419b07006ffbc" id=sg-01f1419b07006ffbc
  ...
  INFO Deleted arn="arn:aws:ec2:us-west-2:269733383066:vpc/vpc-028809e7b57efd24f" id=vpc-028809e7b57efd24f table=rtb-00e31e33b92f3089e
  INFO Deleted arn="arn:aws:ec2:us-west-2:269733383066:vpc/vpc-028809e7b57efd24f" id=vpc-028809e7b57efd24f security group=sg-016e5d178041ae01d
  INFO Deleted arn="arn:aws:ec2:us-west-2:269733383066:vpc/vpc-028809e7b57efd24f" id=vpc-028809e7b57efd24f security group=sg-0dd6861ba70f34a36
  ...
  INFO Deleted arn="arn:aws:ec2:us-west-2:269733383066:dhcp-options/dopt-0ef63583430009721" id=dopt-0ef63583430009721
  $ echo $?
  0

I'll try again with an installer extracted from registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-09-11-012246 and all the subnet/security-group tags removed...
I reproduced by removing kubernetes.io/cluster/... tags from *all* my security groups. Patch submitted to fix this case.
It passed; I verified it with 4.2.0-0.nightly-2019-09-15-221449.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922