Bug 2047741

Summary: openshift-installer intermittent failure on AWS with "Error: Provider produced inconsistent result after apply" when creating the module.masters.aws_network_interface.master[1] resource
Product: OpenShift Container Platform Reporter: Greg Sheremeta <gshereme>
Component: Installer Assignee: Nobody <nobody>
Installer sub component: openshift-installer QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: cblecker, padillon
Version: 4.9 Keywords: ServiceDeliveryBlocker
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The AWS Terraform provider had an eventual consistency issue when updating newly created network interfaces (NICs).
Consequence: Installations could fail when the installer tried to access a newly created NIC.
Fix: The installer was updated to an upstream Terraform provider version that includes a fix to respect eventual consistency.
Result: The installation no longer fails for this reason.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-23 19:39:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2047390    

Description Greg Sheremeta 2022-01-28 12:54:18 UTC
$ openshift-install version
4.9.x

Platform: AWS (seen in an OSD e2e CI run)

Please specify:
IPI

What happened?
Error: Provider produced inconsistent result after apply

What did you expect to happen?
Successful install

How to reproduce it (as minimally and precisely as possible)?
It is random and rare. AWS eventual consistency / raciness bug. AWS needs to be having a bad day to reproduce it.

Flow seems to be:
1 Installer creates a thing
2 AWS creates it
3 AWS says it doesn't exist
4 Terraform dies
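
The flow above, and the general way to tolerate it, can be sketched in Python. This is only an illustration of retrying a "not found" read right after a create, not the actual provider code or the upstream fix. NotFoundError, read_with_retry, FlakyApi, and the "status" field are hypothetical names made up for this sketch.

```python
import time


class NotFoundError(Exception):
    """Raised by the (hypothetical) read call when the resource is not visible yet."""


def read_with_retry(read, resource_id, timeout=10.0, interval=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Call read(resource_id) until it succeeds or the timeout elapses.

    A NotFoundError right after a create is treated as "not visible yet"
    instead of "gone", which is the assumption behind this sketch.
    """
    deadline = clock() + timeout
    while True:
        try:
            return read(resource_id)
        except NotFoundError:
            if clock() >= deadline:
                raise  # still missing after the timeout: report the failure
            sleep(interval)


class FlakyApi:
    """Simulated API that reports a freshly created resource as missing
    for the first `misses` reads (steps 2 and 3 of the flow)."""

    def __init__(self, misses):
        self.misses = misses
        self.calls = 0

    def describe(self, resource_id):
        self.calls += 1
        if self.calls <= self.misses:
            raise NotFoundError(resource_id)
        return {"id": resource_id, "status": "available"}
```

With FlakyApi(misses=2), a single call to describe fails the way step 3 does, while read_with_retry(api.describe, ...) returns the resource on the third read. The timeout and interval values are arbitrary placeholders.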

log excerpt:

time="2022-01-28T06:08:21Z" level=error msg="Error: Provider produced inconsistent result after apply"
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=error msg="When applying changes to module.masters.aws_network_interface.master[1],"
time="2022-01-28T06:08:21Z" level=error msg="provider \"registry.terraform.io/-/aws\" produced an unexpected new value for"
time="2022-01-28T06:08:21Z" level=error msg="was present, but now absent."
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=error msg="This is a bug in the provider, which should be reported in the provider's own"
time="2022-01-28T06:08:21Z" level=error msg="issue tracker."
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"
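
For triage across many CI runs, a small script can pick this failure out of installer logs. The following is a rough sketch in Python that assumes the log lines look exactly like the excerpt above (time="..." level=... msg="..."). The function name find_inconsistent_results and both regular expressions are made up for this sketch and are not part of openshift-install.

```python
import re

# Assumed line shape, taken from the excerpt: time="..." level=... [msg="..."]
LINE_RE = re.compile(
    r'^time="(?P<time>[^"]*)" level=(?P<level>\w+)'
    r'(?: msg="(?P<msg>(?:[^"\\]|\\.)*)")?\s*$'
)
RESOURCE_RE = re.compile(r"When applying changes to (?P<address>.+),$")
MARKER = "Provider produced inconsistent result after apply"


def find_inconsistent_results(lines):
    """Return the resource address reported after each occurrence of MARKER."""
    addresses = []
    seen_marker = False
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("msg") is None:
            continue  # not a log line in the assumed shape, or no msg field
        msg = match.group("msg")
        if MARKER in msg:
            seen_marker = True
            continue
        resource = RESOURCE_RE.search(msg)
        if seen_marker and resource:
            addresses.append(resource.group("address"))
            seen_marker = False
    return addresses
```

Run against the excerpt above, this returns a list containing module.masters.aws_network_interface.master[1].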

Comment 1 Matthew Staebler 2022-01-31 18:54:03 UTC
The upstream fix for this is https://github.com/hashicorp/terraform-provider-aws/commit/4cdfe3e6fea2a79aca7f6600c8ef9990241e58e2.

Comment 5 Patrick Dillon 2022-05-03 17:59:41 UTC
The upstream fix has been incorporated with https://github.com/openshift/installer/pull/5666

Moving to QE for verification.

Comment 10 errata-xmlrpc 2022-08-23 19:39:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069