Bug 1859153

Summary: [AWS] An IAM error occurred occasionally during the installation phase: Invalid IAM Instance Profile name
Product: OpenShift Container Platform
Reporter: Yunfei Jiang <yunjiang>
Component: Installer
Assignee: Rafael Fonseca <rdossant>
Installer sub component: openshift-installer
QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: eparis, malonso, rdossant
Version: 4.11
Keywords: Reopened
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:35:34 UTC
Type: Bug

Description Yunfei Jiang 2020-07-21 10:50:56 UTC
Description of problem:

The error "Invalid IAM Instance Profile name" occurred while installing OCP 4.4.0-0.nightly-2020-07-18-033102 on AWS.

install log:
~~~
level=debug msg="module.dns.aws_route53_record.api_internal: Creation complete after 1m18s [id=Z07284573HERY5FDLQM1G_api-int.cam-tgt-6871a.qe.devcluster.openshift.com_A]"
level=error
level=error msg="Error: Error launching source instance: InvalidParameterValue: Value (cam-tgt-6871a-p8n7t-bootstrap-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name"
level=error msg="\tstatus code: 400, request id: 7ed118dc-2b87-4e7d-94cc-3c2b5e18c990"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-437511450/bootstrap/main.tf line 116, in resource \"aws_instance\" \"bootstrap\":"
level=error msg=" 116: resource \"aws_instance\" \"bootstrap\" {"
level=error
level=error
level=error
level=error msg="Error: Error launching source instance: InvalidParameterValue: Value (cam-tgt-6871a-p8n7t-master-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name"
level=error msg="\tstatus code: 400, request id: fe55a1ca-14c2-42dd-aedf-2cb7bed9dc36"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-437511450/master/main.tf line 93, in resource \"aws_instance\" \"master\":"
level=error msg="  93: resource \"aws_instance\" \"master\" {"
level=error
level=error
level=error
level=error msg="Error: Error launching source instance: InvalidParameterValue: Value (cam-tgt-6871a-p8n7t-master-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name"
level=error msg="\tstatus code: 400, request id: 29a4c4a1-c74e-46dc-ac55-05d8d41de4a8"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-437511450/master/main.tf line 93, in resource \"aws_instance\" \"master\":"
level=error msg="  93: resource \"aws_instance\" \"master\" {"
level=error
level=error
level=error
level=error msg="Error: Error launching source instance: InvalidParameterValue: Value (cam-tgt-6871a-p8n7t-master-profile) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name"
level=error msg="\tstatus code: 400, request id: 9f64a775-503f-49c3-94e9-d742c52b18a5"
level=error
level=error msg="  on ../../../../../tmp/openshift-install-437511450/master/main.tf line 93, in resource \"aws_instance\" \"master\":"
level=error msg="  93: resource \"aws_instance\" \"master\" {"
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
~~~

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-07-18-033102


How reproducible:
always

Steps to Reproduce:
1. Trigger an IPI install on AWS

Actual results:
Cluster creation failed.

Expected results:
Cluster creation succeeds.

Comment 1 Yunfei Jiang 2020-07-21 10:56:41 UTC
This bug blocks all 4.4 IPI testing on AWS.

Comment 2 Eric Paris 2020-07-21 13:07:13 UTC
I believe this was an AWS outage this morning. Does it still reproduce?

Comment 3 Scott Dodson 2020-07-21 13:24:16 UTC
Please re-open if this reproduces, but this is believed to have been an AWS outage: AWS release jobs started failing at 03:24:30 EDT and have been green since 05:49:22 EDT.

https://status.aws.amazon.com/

Between 12:02 AM and 2:35 AM PDT AWS customers experienced increased error rates while calling the IAM assume role, get session token and other APIs with the long term credentials. As of 2:35 AM PDT, we are fully recovered and the issue is resolved now. Other AWS services such as AWS CloudFormation whose features require these actions experienced similar impact.

Comment 4 Yunfei Jiang 2020-07-22 02:43:57 UTC
This appears to have been an AWS outage; a rebuild on 4.4.0-0.nightly-2020-07-18-033102 succeeded.

Thanks.

Comment 6 Rafael Fonseca 2022-06-08 13:46:29 UTC
This could be the result of a race condition: the instance launch references the IAM instance profile before AWS has finished creating it and made it available [1] [2].

[1] https://github.com/hashicorp/terraform/issues/15341
[2] https://github.com/hashicorp/terraform-provider-aws/issues/838
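Comment 6 attributes the failure to a possible race between creating the instance profile and launching the instances that reference it. A common mitigation for this kind of eventual-consistency error is to retry the launch with exponential backoff when that specific error appears. The following Python sketch illustrates only that general idea; it is not the change that shipped in 4.11.0, and `launch_with_retry`, `make_flaky_launcher`, the attempt count, and the delays are all hypothetical. The launcher is a stand-in that fails a fixed number of times, so no AWS call is made.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

# Text of the EC2 error from the install log. In this sketch it is treated as
# retryable, assuming it means the profile is not yet visible to EC2.
RETRYABLE_MESSAGE = "Invalid IAM Instance Profile name"


def launch_with_retry(
    launch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `launch`, retrying with exponential backoff on the IAM profile error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return launch()
        except Exception as err:
            # Give up on unrelated errors, or when the last attempt also failed.
            if RETRYABLE_MESSAGE not in str(err) or attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... before the next try.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def make_flaky_launcher(failures: int) -> Tuple[Callable[[], str], List[int]]:
    """Hypothetical stand-in for an EC2 launch that fails `failures` times first."""
    calls: List[int] = []

    def launch() -> str:
        calls.append(1)
        if len(calls) <= failures:
            raise RuntimeError(
                "InvalidParameterValue: Value (example-bootstrap-profile) for "
                "parameter iamInstanceProfile.name is invalid. "
                "Invalid IAM Instance Profile name"
            )
        return "i-0123456789abcdef0"

    return launch, calls


if __name__ == "__main__":
    delays: List[float] = []
    launch, calls = make_flaky_launcher(failures=2)
    instance_id = launch_with_retry(launch, sleep=delays.append)
    print(instance_id, len(calls), delays)  # prints: i-0123456789abcdef0 3 [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. Any error whose message does not contain the IAM profile text is re-raised immediately, so genuine misconfigurations still fail fast.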

Comment 10 Yunfei Jiang 2022-06-23 07:57:12 UTC
The error was not found in recent CI logs.

Comment 11 errata-xmlrpc 2022-08-10 10:35:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069