Bug 1668721 - Destroy fails to complete because of dependency issues
Summary: Destroy fails to complete because of dependency issues
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Alex Crawford
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1664187
Reported: 2019-01-23 12:29 UTC by Jaspreet Kaur
Modified: 2019-03-12 14:25 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-05 18:42:33 UTC
Target Upstream Version:
Embargoed:


Attachments
installer log (561.13 KB, text/plain)
2019-02-20 04:04 UTC, Johnny Liu

Description Jaspreet Kaur 2019-01-23 12:29:22 UTC
Description of problem: Destroying the cluster fails with the errors below:


DEBUG deleting arn:aws:ec2:us-east-2:694280550618:vpc/vpc-0348b3ac86d07d1cf: DependencyViolation: The vpc 'vpc-0348b3ac86d07d1cf' has dependencies and cannot be deleted.
	status code: 400, request id: f7e3b77a-3c0f-4063-ae10-0a62bc9c3414 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:internet-gateway/igw-065134855fa6f19c1: detaching from vpc-0348b3ac86d07d1cf: DependencyViolation: Network vpc-0348b3ac86d07d1cf has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.
	status code: 400, request id: e8b7a677-b092-46b9-9211-e060fa78e289 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:elastic-ip/eipalloc-00be2d6348858d148: AuthFailure: You do not have permission to access the specified resource.
	status code: 400, request id: a48e5775-0044-4ed2-9d71-5c413648415e 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:elastic-ip/eipalloc-0b1df225ed54b0ecb: AuthFailure: You do not have permission to access the specified resource.
	status code: 400, request id: 4239af7a-276e-4e87-996f-428ac6cc2f1c 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:elastic-ip/eipalloc-0e789832cf7589a0d: AuthFailure: You do not have permission to access the specified resource.
	status code: 400, request id: 80b28cb7-324f-4d18-bf80-27b516ddf1df 
DEBUG search for IAM roles                         
DEBUG search for IAM users                         
DEBUG search for and delete matching resources by tag in us-east-2 matching aws.Filter{"openshiftClusterID":"717f4bd6-0036-40d3-80ac-23d066251474"} 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:subnet/subnet-0665df434ea6f1a44: DependencyViolation: The subnet 'subnet-0665df434ea6f1a44' has dependencies and cannot be deleted.
	status code: 400, request id: 6ecc09f7-3dc4-44c7-a82b-8024ab9603f8 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:subnet/subnet-0b6fb36c0a0250e81: DependencyViolation: The subnet 'subnet-0b6fb36c0a0250e81' has dependencies and cannot be deleted.
	status code: 400, request id: f4222bf3-4b31-468a-a278-51c42189b13a 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:subnet/subnet-068b0c8920f724929: DependencyViolation: The subnet 'subnet-068b0c8920f724929' has dependencies and cannot be deleted.
	status code: 400, request id: 8121df00-9c7b-4574-8c54-7209175c3bb6 
DEBUG deleting arn:aws:ec2:us-east-2:694280550618:vpc/vpc-0348b3ac86d07d1cf: DependencyViolation: The vpc 'vpc-0348b3ac86d07d1cf' has dependencies and cannot be deleted.
	status code: 400, request id: b9769e7a-9d8e-4d73-b859-9c0953ceebf1 
^C


Version-Release number of the following components:

Installer used was "0.10.1" from https://github.com/openshift/installer/releases

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Fails to destroy

Expected results: Should have succeeded

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Alex Crawford 2019-01-23 20:23:25 UTC
It looks like you do not have permission to delete the elastic IPs (which prevents the subnets and then the VPC from being deleted). You'll need to talk to your AWS administrator and get those permissions added to your IAM policy.

Comment 3 W. Trevor King 2019-01-24 12:51:03 UTC
> It looks like you do not have permission to delete the elastic IPs...

For reasons that are not clear to me, AWS returns permission errors when you try and release an EIP that is associated with something that needs an EIP (e.g. a NAT gateway).  For example:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1113/pull-ci-openshift-installer-master-e2e-aws/3126/artifacts/e2e-aws/installer/.openshift_install.log | grep eipalloc-02db5c8102abadf8c
time="2019-01-24T05:06:17Z" level=debug msg="module.vpc.aws_eip.nat_eip[5]: Creation complete after 3s (ID: eipalloc-02db5c8102abadf8c)"
time="2019-01-24T05:07:08Z" level=debug msg="  allocation_id:                                   \"\" => \"eipalloc-02db5c8102abadf8c\""
time="2019-01-24T05:52:19Z" level=debug msg="deleting arn:aws:ec2:us-east-1:460538899914:elastic-ip/eipalloc-02db5c8102abadf8c: AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: f1e84107-60f3-4583-8f6b-34149e169d5a"
time="2019-01-24T05:53:02Z" level=debug msg="deleting arn:aws:ec2:us-east-1:460538899914:elastic-ip/eipalloc-02db5c8102abadf8c: AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: 3fcdc4be-e1c4-4408-8933-a02728fa881b"
time="2019-01-24T05:53:38Z" level=info msg=Released arn="arn:aws:ec2:us-east-1:460538899914:elastic-ip/eipalloc-02db5c8102abadf8c" id=eipalloc-02db5c8102abadf8c

We recently rerolled deletion, fixing a few bugs [1].  Can you still reproduce?

[1]: https://github.com/openshift/installer/pull/1039
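The log excerpt above shows the same EIP release failing twice with AuthFailure and then succeeding on a later attempt. A minimal sketch of that retry-until-it-clears pattern follows. It is not the installer's actual implementation, which this thread does not show. `CloudError`, `delete_with_retry`, and the defaults for `attempts` and `delay` are hypothetical names and values chosen for illustration, and treating AuthFailure as transient is an assumption based only on the logs in this bug.

```python
import time

# Error codes that the logs in this thread show clearing up on later attempts.
# Treating AuthFailure as transient is an assumption based on those logs only.
TRANSIENT_CODES = {"DependencyViolation", "AuthFailure"}


class CloudError(Exception):
    """Hypothetical stand-in for an AWS API error that carries an error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def delete_with_retry(delete, attempts=10, delay=30.0, sleep=time.sleep):
    """Call delete() until it succeeds and return the number of attempts used.

    Non-transient errors, and a transient error on the final attempt,
    are re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            delete()
            return attempt
        except CloudError as err:
            if err.code not in TRANSIENT_CODES or attempt == attempts:
                raise
            # Give AWS time to finish tearing down whatever still holds the resource.
            sleep(delay)
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.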

Comment 4 W. Trevor King 2019-01-24 12:55:51 UTC
> Installer used was "0.10.1" from https://github.com/openshift/installer/releases

Ah, I'd missed this, which means you have [1].  Can you let deletion run for at least five minutes and then attach the full .openshift_install.log?  Sometimes it takes AWS a while after a "successful" NAT gateway deletion before you can remove dependent resources.  If it's hanging, you can also poke around in the AWS web console or CLI to try to find the resource that's blocking deletion.

[1]: https://github.com/openshift/installer/pull/1039
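One way to narrow down the blocking resource from the log itself is to list every ARN that has a failed "deleting ..." entry and no later Deleted/Released entry. The sketch below assumes only the line shapes visible in this bug (console `DEBUG deleting arn:...` lines and `.openshift_install.log` entries with `msg=Deleted arn="..."` or `msg=Released arn="..."`). The function name `pending_resources` is hypothetical, and other installer versions may log differently.

```python
import re

# A failed attempt looks like: deleting <arn>: <reason>
FAILED = re.compile(r"deleting (arn:\S+): ")
# A successful removal looks like: msg=Deleted arn="<arn>" or msg=Released arn="<arn>"
DONE = re.compile(r'msg=(?:Deleted|Released) arn="(arn:[^"]+)"')


def pending_resources(lines):
    """Map each ARN that failed to delete and never later succeeded to its failure count."""
    pending = {}
    for line in lines:
        # Some entries report a load balancer deleted *inside* a VPC while carrying
        # the VPC's ARN; those do not mean the VPC itself is gone, so skip them.
        if "load balancer=" in line:
            continue
        done = DONE.search(line)
        if done:
            pending.pop(done.group(1), None)
            continue
        failed = FAILED.search(line)
        if failed:
            arn = failed.group(1)
            pending[arn] = pending.get(arn, 0) + 1
    return pending
```

Passing an open log file (for example `pending_resources(open(".openshift_install.log"))`) returns the resources that are still outstanding. The ones with the highest counts are the candidates to inspect in the AWS console or CLI.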

Comment 5 Alex Crawford 2019-02-13 21:57:53 UTC
Are you still seeing this defect on newer installer releases?

Comment 6 Alex Crawford 2019-02-18 18:53:45 UTC
Closing due to inactivity.

Comment 7 Xingxing Xia 2019-02-19 14:06:49 UTC
I hit the same issue with the latest OCP build. It happened after I unintentionally interrupted `openshift-install create cluster` with Ctrl-C:
1. $ openshift-install create cluster --dir install-xxia --log-level=debug
...
? Region us-east-2
...
? Cluster Name [? for help] xxia
...

2. While the above command was still running, I noticed that one input was wrong, so I pressed Ctrl-C.
3. Then I ran destroy, but it never finished:
$ openshift-install destroy cluster --dir install-xxia --log-level=debug

Many days have elapsed. Since then, any retry of `create cluster` with the same Cluster Name as above fails, and `destroy cluster` always hits the issue from comment 0. (With a different Cluster Name, there is no such issue.)

Comment 8 Alex Crawford 2019-02-19 17:10:54 UTC
Can you provide the logs from that event?

Comment 9 Johnny Liu 2019-02-20 04:03:14 UTC
I also hit similar issues here.

v4.0.0-0.176.0.0-dirty

Comment 10 Johnny Liu 2019-02-20 04:04:46 UTC
Created attachment 1536568
installer log

Comment 11 Johnny Liu 2019-02-20 05:06:19 UTC
In the EC2 web console, I saw that the subnets still have in-use dependencies.

# aws ec2 describe-network-interfaces --network-interface-ids eni-062f093bdbe30c1e7
{
    "NetworkInterfaces": [
        {
            "Status": "in-use", 
            "MacAddress": "06:b1:70:ac:ef:14", 
            "SourceDestCheck": false, 
            "AvailabilityZone": "us-east-2b", 
            "Description": "ELB net/qe-jialiu1-ext/10972eda0ed4ba77", 
            "NetworkInterfaceId": "eni-062f093bdbe30c1e7", 
            "VpcId": "vpc-0a1f2c1e1b912356a", 
            "PrivateIpAddresses": [
                {
                    "PrivateDnsName": "ip-10-0-16-138.us-east-2.compute.internal", 
                    "PrivateIpAddress": "10.0.16.138", 
                    "Primary": true, 
                    "Association": {
                        "PublicIp": "18.224.42.131", 
                        "PublicDnsName": "ec2-18-224-42-131.us-east-2.compute.amazonaws.com", 
                        "IpOwnerId": "066548406366"
                    }
                }
            ], 
            "RequesterManaged": true, 
            "PrivateDnsName": "ip-10-0-16-138.us-east-2.compute.internal", 
            "RequesterId": "066548406366", 
            "InterfaceType": "network_load_balancer", 
            "Attachment": {
                "Status": "attached", 
                "DeviceIndex": 1, 
                "DeleteOnTermination": false, 
                "AttachmentId": "ela-attach-83d4dad1", 
                "InstanceOwnerId": "amazon-aws"
            }, 
            "Groups": [], 
            "Ipv6Addresses": [], 
            "OwnerId": "301721915996", 
            "PrivateIpAddress": "10.0.16.138", 
            "SubnetId": "subnet-04ed7feaae8428481", 
            "TagSet": [], 
            "Association": {
                "PublicIp": "18.224.42.131", 
                "PublicDnsName": "ec2-18-224-42-131.us-east-2.compute.amazonaws.com", 
                "IpOwnerId": "066548406366"
            }
        }
    ]
}

# aws ec2 describe-network-interfaces --network-interface-ids eni-029010a961a86a8f6
{
    "NetworkInterfaces": [
        {
            "Status": "in-use", 
            "MacAddress": "06:5d:86:a9:77:b8", 
            "SourceDestCheck": false, 
            "AvailabilityZone": "us-east-2b", 
            "Description": "ELB net/qe-jialiu1-int/ec945d5a356d1226", 
            "NetworkInterfaceId": "eni-029010a961a86a8f6", 
            "VpcId": "vpc-0a1f2c1e1b912356a", 
            "PrivateIpAddresses": [
                {
                    "PrivateDnsName": "ip-10-0-155-227.us-east-2.compute.internal", 
                    "Primary": true, 
                    "PrivateIpAddress": "10.0.155.227"
                }
            ], 
            "RequesterManaged": true, 
            "PrivateDnsName": "ip-10-0-155-227.us-east-2.compute.internal", 
            "RequesterId": "066548406366", 
            "InterfaceType": "network_load_balancer", 
            "Attachment": {
                "Status": "attached", 
                "DeviceIndex": 1, 
                "DeleteOnTermination": false, 
                "AttachmentId": "ela-attach-e1dcd2b3", 
                "InstanceOwnerId": "amazon-aws"
            }, 
            "Groups": [], 
            "Ipv6Addresses": [], 
            "OwnerId": "301721915996", 
            "SubnetId": "subnet-03e0ead3c5cfa6f05", 
            "TagSet": [], 
            "PrivateIpAddress": "10.0.155.227"
        }
    ]
}

But the related LBs 'qe-jialiu1-int' and 'qe-jialiu1-ext' had already been removed by the installer.

After a long time (about 30 minutes), the in-use network interfaces above finally became available, and the destroy finally completed.
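For anyone checking the same thing, here is a rough sketch that reads the JSON printed by `aws ec2 describe-network-interfaces` and keeps only the interfaces that are still "in-use" and requester-managed, like the two load-balancer interfaces above. The function name `blocking_interfaces` is hypothetical, and the filter relies only on the fields visible in the output in this comment.

```python
import json


def blocking_interfaces(describe_output):
    """Summarize requester-managed ENIs that are still in use.

    describe_output is the JSON text printed by
    `aws ec2 describe-network-interfaces`.
    """
    data = json.loads(describe_output)
    rows = []
    for eni in data.get("NetworkInterfaces", []):
        # Requester-managed interfaces are owned by an AWS service (here, the ELB
        # service), so they cannot simply be deleted by the account that sees them.
        if eni.get("Status") == "in-use" and eni.get("RequesterManaged"):
            rows.append({
                "id": eni.get("NetworkInterfaceId"),
                "subnet": eni.get("SubnetId"),
                "type": eni.get("InterfaceType"),
                "description": eni.get("Description", ""),
            })
    return rows
```

Each returned row names the subnet that the interface pins, which is what produces the DependencyViolation on subnet deletion until AWS releases the interface.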

Comment 12 W. Trevor King 2019-02-20 05:49:11 UTC
(In reply to Johnny Liu from comment #11)
> But the related LBs 'qe-jialiu1-int' and 'qe-jialiu1-ext' had already been
> removed by the installer.
> 
> After a long time (about 30 minutes), the in-use network interfaces above
> finally became available, and the destroy finally completed.

I don't think there's anything we can do about that except pester Amazon about processing load-balancer removal more quickly, is there?  If it takes 30 minutes before they notice and allow network-interface removal, then it takes 30 minutes.  I dunno why it took so long in your case though.  Here's a much faster removal from a recent CI run:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1267/pull-ci-openshift-installer-master-e2e-aws/3956/artifacts/e2e-aws/installer/.openshift_install.log | grep 'level=info.*arn=".*\(load\|subnet\)'
time="2019-02-20T01:20:07Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-0dd3ced74bdaea6a0" classic load balancer=a17d0524834a911e9b961123d372fa40 id=vpc-0dd3ced74bdaea6a0
time="2019-02-20T01:20:07Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-0dd3ced74bdaea6a0" id=vpc-0dd3ced74bdaea6a0 load balancer=loadbalancer/net/ci-op-kh2qw9tw-1d3f3-int/336454f82abf0f05
time="2019-02-20T01:20:07Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-0dd3ced74bdaea6a0" id=vpc-0dd3ced74bdaea6a0 load balancer=loadbalancer/net/ci-op-kh2qw9tw-1d3f3-ext/6942a71fa1ae1cf0
time="2019-02-20T01:20:18Z" level=info msg=Deleted arn="arn:aws:elasticloadbalancing:us-east-1:460538899914:loadbalancer/net/ci-op-kh2qw9tw-1d3f3-ext/6942a71fa1ae1cf0" id=net/ci-op-kh2qw9tw-1d3f3-ext/6942a71fa1ae1cf0
time="2019-02-20T01:20:18Z" level=info msg=Deleted arn="arn:aws:elasticloadbalancing:us-east-1:460538899914:loadbalancer/net/ci-op-kh2qw9tw-1d3f3-int/336454f82abf0f05" id=net/ci-op-kh2qw9tw-1d3f3-int/336454f82abf0f05
time="2019-02-20T01:20:19Z" level=info msg=Deleted arn="arn:aws:elasticloadbalancing:us-east-1:460538899914:targetgroup/ci-op-kh2qw9tw-1d3f3-api-ext/9d69319a3d54c8e0" id=ci-op-kh2qw9tw-1d3f3-api-ext/9d69319a3d54c8e0
time="2019-02-20T01:20:19Z" level=info msg=Deleted arn="arn:aws:elasticloadbalancing:us-east-1:460538899914:targetgroup/ci-op-kh2qw9tw-1d3f3-api-int/281fdef88b3801e4" id=ci-op-kh2qw9tw-1d3f3-api-int/281fdef88b3801e4
time="2019-02-20T01:20:19Z" level=info msg=Deleted arn="arn:aws:elasticloadbalancing:us-east-1:460538899914:targetgroup/ci-op-kh2qw9tw-1d3f3-services/674c49af03451c52" id=ci-op-kh2qw9tw-1d3f3-services/674c49af03451c52
time="2019-02-20T01:20:58Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0c3a7cc7b4931493c" id=subnet-0c3a7cc7b4931493c
time="2019-02-20T01:21:17Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0711c016f217934b0" id=subnet-0711c016f217934b0
time="2019-02-20T01:21:17Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0aaf37e9bba39fb5c" id=subnet-0aaf37e9bba39fb5c
time="2019-02-20T01:21:18Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-040efee245c3423b4" id=subnet-040efee245c3423b4
time="2019-02-20T01:21:20Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0ea539e720682d252" id=subnet-0ea539e720682d252
time="2019-02-20T01:21:20Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-094b0ac04338c2cf6" id=subnet-094b0ac04338c2cf6
time="2019-02-20T01:21:30Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0f9349c8d0dda9f5f" id=subnet-0f9349c8d0dda9f5f
time="2019-02-20T01:21:30Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0b3dce80380ff0b29" id=subnet-0b3dce80380ff0b29
time="2019-02-20T01:21:36Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-062b133a47fae75e6" id=subnet-062b133a47fae75e6
time="2019-02-20T01:22:32Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-06e49145fba966022" id=subnet-06e49145fba966022
time="2019-02-20T01:22:32Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-094a5bf819f0749c1" id=subnet-094a5bf819f0749c1
time="2019-02-20T01:22:34Z" level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:subnet/subnet-0dd39c2fcb799fa59" id=subnet-0dd39c2fcb799fa59
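To put a number on how long a teardown took, the timestamps in these log entries can be compared directly. The sketch below assumes the `time="..."` and `msg=Deleted` shapes shown in the output above. `deletion_span_seconds` is a hypothetical helper name.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r'time="([^"]+)"')


def deletion_span_seconds(lines):
    """Seconds between the earliest and latest 'msg=Deleted' entries.

    Returns None when fewer than two such entries carry a timestamp.
    """
    stamps = []
    for line in lines:
        if "msg=Deleted" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            # Timestamps in these logs are UTC in the form 2019-02-20T01:20:07Z.
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ"))
    if len(stamps) < 2:
        return None
    return (max(stamps) - min(stamps)).total_seconds()
```

For the grep output above, the first entry is at 01:20:07 and the last at 01:22:34, a span of 147 seconds.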

Comment 13 W. Trevor King 2019-03-05 18:42:33 UTC
> > After a long time (about 30 minutes), the in-use network interfaces above
> > finally became available, and the destroy finally completed.
>
> I don't think there's anything we can do about that...

Doesn't seem like anyone has bright ideas in this direction, so I'm going to close this as NOTABUG.  Deletion works, and goes as fast as Amazon will let us go.

