Bug 2070744 - openshift-install destroy in us-gov-west-1 results in infinite loop - AWS govcloud
Summary: openshift-install destroy in us-gov-west-1 results in infinite loop - AWS govcloud
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.13.0
Assignee: Aditya Narayanaswamy
QA Contact: Yunfei Jiang
Docs Contact: Mike Pytlak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-31 19:58 UTC by Mike Murphy
Modified: 2023-03-20 17:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, uninstalling an AWS cluster that was deployed to the `us-gov-west-1` region failed because AWS resources could not be untagged. This resulted in the process going into an infinite loop, where the installation program tried to untag the resources. This update prevents the retry. As a result, uninstalling the cluster succeeds. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2070744[*BZ#2070744*])
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 5995 0 None open Bug 2070744: Fix infinite loop when failing to untag resources 2022-06-13 13:38:00 UTC
Github openshift installer pull 6515 0 None open Bug 2070744: Fix infinite loop error 2022-10-24 15:41:33 UTC
Red Hat Knowledge Base (Solution) 6974761 0 None None None 2022-09-06 09:50:15 UTC

Description Mike Murphy 2022-03-31 19:58:45 UTC
Description of problem:

Destroying the cluster with the openshift-install binary results in an infinite loop while untagging resources for Route53.

INFO untag shared resources: InvalidParameterException: Invocation of UntagResources for this resource is not supported in this region
DEBUG Search for and remove tags in us-gov-west-1 matching kubernetes.io/cluster/test-cluster-bsmt4: shared

Version-Release number of selected component (if applicable):
openshift-install version
./openshift-install 4.9.25

How reproducible:


Steps to Reproduce:
1. Deploy a cluster in us-gov-west-1
2. In the install-config, specify a hostedZone pointing to a Route53 record that already exists



Actual results:

Stuck in a loop and will not go past trying to untag:

INFO untag shared resources: InvalidParameterException: Invocation of UntagResources for this resource is not supported in this region
DEBUG Search for and remove tags in us-gov-west-1 matching kubernetes.io/cluster/test-cluster-bsmt4: shared


Expected results:

Untag the route53 hosted zone and continue destroying the cluster.

Additional info:

The installer is able to tag the resource without problems, but it cannot destroy the cluster because it hangs while untagging the Route53 hosted zone. We have to untag the Route53 hosted zone manually (or with the AWS CLI) before the tear-down of the cluster can proceed.

When the hostedZone is specified, the installer always gets stuck in a loop trying to untag the Route53 record. If we don't specify the hostedZone (i.e. the installer creates the hosted zone), the destroy succeeds. However, this does not work for the customer's case, since the hosted zone needs to be created and tied to their internal DNS.

Code snippet from installer:

[1]https://github.com/openshift/installer/blob/beefeacda123ed41ad8f486aa5f7435e2133e8ee/pkg/destroy/aws/aws.go#L731
[2]https://github.com/openshift/installer/blob/beefeacda123ed41ad8f486aa5f7435e2133e8ee/pkg/destroy/aws/aws.go#L184

Comment 1 Mike Murphy 2022-04-01 15:12:01 UTC
As for the openshift-installer, the specific infinite loop is in this block here: https://github.com/openshift/installer/blob/beefeacda123ed41ad8f486aa5f7435e2133e8ee/pkg/destroy/aws/shared.go#L113
On line 113, the untag call returns the InvalidParameterException seen above, which is logged as an info message, and the loop continues. As a result, the resource is never untagged and the loop that starts on line 59 never exits.
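
To make the failure mode in this comment concrete, here is a minimal Go sketch of a "search, then untag, then search again" loop that still terminates when one resource can never be untagged. It is not the installer's code: `untagShared`, `searchFunc`, `untagFunc` and `errUnsupported` are hypothetical names, and the AWS calls are replaced by in-memory stand-ins. The idea is that a resource whose untag failed is remembered and skipped on later passes, so it cannot keep the loop alive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnsupported stands in for the error quoted in this bug (assumption: the
// real error comes from the AWS SDK, which this sketch does not call).
var errUnsupported = errors.New("InvalidParameterException: Invocation of UntagResources for this resource is not supported in this region")

// searchFunc and untagFunc are hypothetical stand-ins for the tag search and
// the untag call.
type searchFunc func() []string
type untagFunc func(arn string) error

// untagShared repeats "search, then untag" until a pass finds no ARN that has
// not already failed. It returns the ARNs that could not be untagged.
func untagShared(search searchFunc, untag untagFunc) []string {
	skipped := map[string]bool{}
	var failed []string
	for {
		pending := 0
		for _, arn := range search() {
			if skipped[arn] {
				continue // already failed once, do not retry
			}
			pending++
			if err := untag(arn); err != nil {
				fmt.Printf("untag shared resources: %v\n", err)
				skipped[arn] = true
				failed = append(failed, arn)
			}
		}
		if pending == 0 {
			return failed
		}
	}
}

func main() {
	zone := "arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR"
	subnet := "arn:aws-us-gov:ec2:us-gov-west-1:225746144451:subnet/subnet-0798181b46953e88f"
	tagged := map[string]bool{zone: true, subnet: true}

	search := func() []string {
		var arns []string
		for arn := range tagged {
			arns = append(arns, arn)
		}
		return arns
	}
	untag := func(arn string) error {
		if strings.Contains(arn, ":route53:") {
			return errUnsupported // simulate the permanent failure
		}
		delete(tagged, arn)
		return nil
	}

	fmt.Println("could not untag:", untagShared(search, untag))
}
```

Without the `skipped` map, the search would return the hosted zone on every pass and the outer loop would never see an empty pass, which matches the behavior reported here.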

Comment 2 Apoorva Jagtap 2022-04-18 01:00:21 UTC
Hello,

On ticket 03187907, the team has received a response from AWS Support regarding this issue. According to their analysis, untagging resources in the `us-gov-west-1` AWS region through the SDK fails with the error [0], whereas untagging through the AWS CLI succeeds. This behavior is caused by an already known bug on the AWS side.

However, the team would still like the infinite loop to be resolved in the `openshift-installer` binary. It might be more efficient to report an error instead of the loop going forever.

[0] 
~~~
InvalidParameterException: Invocation of UntagResources for this resource is not supported in this region
~~~

Comment 5 Yunfei Jiang 2022-10-08 03:12:23 UTC
Destroy process still went into an infinite loop:

level=debug msg=listing AWS hosted zones "yunjiang-bz1.qe.devcluster.openshift.com." (page 0) arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=debug msg=listing AWS hosted zones "qe.devcluster.openshift.com." (page 0) arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=debug msg=listing AWS hosted zones "devcluster.openshift.com." (page 0) arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=debug msg=listing AWS hosted zones "openshift.com." (page 0) arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=debug msg=listing AWS hosted zones "com." (page 0) arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=info msg=Cleaned record sets from hosted zone arn=arn:aws-us-gov:route53:::hostedzone/Z10189021N8AASF3CAGVR id=Z10189021N8AASF3CAGVR
level=debug msg=Nothing to clean for shared ec2 resource arn=arn:aws-us-gov:ec2:us-gov-west-1:225746144451:subnet/subnet-0798181b46953e88f
level=debug msg=Nothing to clean for shared ec2 resource arn=arn:aws-us-gov:ec2:us-gov-west-1:225746144451:subnet/subnet-0bf50e17442d665f1
level=info msg=untag shared resources: InvalidParameterException: Invocation of UntagResources for this resource is not supported in this region
level=debug msg=Search for and remove tags in us-gov-west-1 matching kubernetes.io/cluster/yunjiang-bz1-fb5nr: shared
level=debug msg=Nothing to clean for shared ec2 resource arn=arn:aws-us-gov:ec2:us-gov-west-1:225746144451:vpc/vpc-00295a4ddaf8a691a


OCP Version 4.12.0-0.nightly-2022-09-26-111919

Comment 6 Rex Russell 2022-12-19 02:23:05 UTC
Is this bug still being worked on? Can we please get a status?
Thank you.

Comment 12 Yunfei Jiang 2023-02-08 05:26:53 UTC
verified on 4.13.0-0.nightly-2023-02-07-064924, PASS.

