Bug 2047741 - openshift-installer intermittent failure on AWS with "Error: Provider produced inconsistent result after apply" when creating the module.masters.aws_network_interface.master[1] resource
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Nobody
QA Contact: Yunfei Jiang
URL:
Whiteboard:
Depends On:
Blocks: 2047390
Reported: 2022-01-28 12:54 UTC by Greg Sheremeta
Modified: 2022-08-23 19:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An eventual-consistency issue in terraform-provider-aws surfaced when the provider updated newly created network interfaces (NICs).
Consequence: Installs failed intermittently when the provider tried to access a NIC that AWS did not yet report as present.
Fix: The installer was updated to an upstream terraform-provider-aws release that respects eventual consistency.
Result: Installs no longer fail with this error.
Clone Of:
Environment:
Last Closed: 2022-08-23 19:39:39 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-23 19:39:57 UTC

Description Greg Sheremeta 2022-01-28 12:54:18 UTC
$ openshift-install version
4.9.x

Platform: AWS (seen in an OSD e2e CI run)

Install type: IPI

What happened?
Error: Provider produced inconsistent result after apply

What did you expect to happen?
Successful install

How to reproduce it (as minimally and precisely as possible)?
It is random and rare. AWS eventual consistency / raciness bug. AWS needs to be having a bad day to reproduce it.

Flow seems to be:
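1. The installer (via Terraform) asks AWS to create a resource, here a master network interface.
2. AWS creates it.
3. A follow-up read from AWS reports that the resource does not exist.
4. Terraform aborts with "Provider produced inconsistent result after apply".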
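
The flow above is the classic read-after-create race, and the usual mitigation is to retry the read while the API still answers "not found". The following is only a minimal sketch of that pattern in Go, not the upstream provider patch: errNotFound, readWithRetry, flakyReader and the ID "eni-example" are all hypothetical names invented for illustration, and no real AWS call is made.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical sentinel standing in for the API's
// "resource does not exist" response.
var errNotFound = errors.New("network interface not found")

// readWithRetry calls read until it succeeds, fails with something other
// than errNotFound, or the attempts run out.
func readWithRetry(read func() (string, error), attempts int, delay time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		id, err := read()
		if err == nil {
			return id, nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err // a real failure, not consistency lag
		}
		if i < attempts-1 {
			time.Sleep(delay) // give the API time to converge
		}
	}
	return "", fmt.Errorf("gave up after %d attempts: %w", attempts, errNotFound)
}

// flakyReader simulates an API that reports "not found" for the first
// `misses` reads of a resource that was in fact created.
func flakyReader(misses int, id string) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= misses {
			return "", errNotFound
		}
		return id, nil
	}
}

func main() {
	// Two stale reads, then the resource becomes visible.
	id, err := readWithRetry(flakyReader(2, "eni-example"), 5, 10*time.Millisecond)
	fmt.Println(id, err)
}
```

Without the retry, the first stale read corresponds to step 3 and is treated as "was present, but now absent"; with it, the read succeeds once the simulated API catches up. Errors other than "not found" are returned immediately so that genuine failures are not masked.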

log excerpt:

time="2022-01-28T06:08:21Z" level=error msg="Error: Provider produced inconsistent result after apply"
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=error msg="When applying changes to module.masters.aws_network_interface.master[1],"
time="2022-01-28T06:08:21Z" level=error msg="provider \"registry.terraform.io/-/aws\" produced an unexpected new value for"
time="2022-01-28T06:08:21Z" level=error msg="was present, but now absent."
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=error msg="This is a bug in the provider, which should be reported in the provider's own"
time="2022-01-28T06:08:21Z" level=error msg="issue tracker."
time="2022-01-28T06:08:21Z" level=error
time="2022-01-28T06:08:21Z" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

Comment 1 Matthew Staebler 2022-01-31 18:54:03 UTC
The upstream fix for this is https://github.com/hashicorp/terraform-provider-aws/commit/4cdfe3e6fea2a79aca7f6600c8ef9990241e58e2.

Comment 5 Patrick Dillon 2022-05-03 17:59:41 UTC
The upstream fix has been incorporated with https://github.com/openshift/installer/pull/5666

Moving to QE for verification.

Comment 10 errata-xmlrpc 2022-08-23 19:39:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

