1745196 – AWS installs occasionally fail due to S3 bucket race: [bucket] produced an unexpected new value for was present, but now absent

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.5.0
Assignee:	W. Trevor King
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1752355 1776423
Depends On:
Blocks:	1752313

Reported:	2019-08-23 20:56 UTC by W. Trevor King
Modified:	2020-07-13 17:11 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The AWS Terraform provider vendored by the installer would occasionally race S3's eventual consistency and report a just-created bucket as absent. Consequence: Installation would fail with: "When applying changes to module.bootstrap.aws_s3_bucket.ignition, provider "aws" produced an unexpected new value for was present, but now absent." Fix: The installer has vendored improved AWS Terraform provider code, which now robustly handles S3 eventual consistency. Result: Installer-provisioned AWS installs no longer flake on "unexpected new value for was present, but now absent".
Clone Of:
Environment:
Last Closed:	2020-07-13 17:11:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	terraform-providers terraform-provider-aws issues 9725	'None'	open	aws_s3_bucket_object inconsistently returning "Provider produced inconsistent result after apply"	2020-07-06 20:51:57 UTC
Github	terraform-providers terraform-provider-aws pull 11894	None	closed	resource/aws_s3_bucket: Retry read after creation for 404 status code	2020-07-06 20:51:57 UTC
Red Hat Knowledge Base (Solution)	4562991	None	None	None	2019-11-07 21:34:49 UTC
Red Hat Product Errata	RHBA-2020:2409	None	None	None	2020-07-13 17:11:52 UTC

Description W. Trevor King 2019-08-23 20:56:43 UTC

We are hitting errors like [2] at a rate of ~0.3% (3 of the ~1000 .*aws.* jobs we've run in the past 24 hours [1]):

level=error msg="Error: Provider produced inconsistent result after apply"
level=error
level=error msg="When applying changes to module.bootstrap.aws_s3_bucket.ignition, provider"
level=error msg="\"aws\" produced an unexpected new value for was present, but now absent."
level=error
level=error msg="This is a bug in the provider, which should be reported in the provider's own"
level=error msg="issue tracker."
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform" 

It's being tracked upstream in [3], and I have a ticket open with AWS to explain the inconsistency.

[1]: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=aws&search=produced%20an%20unexpected%20new%20value%20for%20was%20present,%20but%20now%20absent
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/6097
[3]: https://github.com/terraform-providers/terraform-provider-aws/issues/9725

Comment 1 Abhinav Dahiya 2019-09-16 15:58:32 UTC

*** Bug 1752355 has been marked as a duplicate of this bug. ***

Comment 2 Kirsten Garrison 2019-10-08 17:30:40 UTC

Just saw this today:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1089/pull-ci-openshift-machine-config-operator-master-e2e-aws-upgrade/1990

Comment 3 Sergiusz Urbaniak 2019-10-28 14:27:03 UTC

Confirming, also seen recently in e2e tests: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovs-kubernetes-4.3/29

Comment 4 Oleg Bulatov 2019-10-29 14:58:04 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/10032

Comment 5 Oleg Bulatov 2019-10-29 15:00:46 UTC

It is 2% of failures: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?search=produced+an+unexpected+new+value+for+was+present%2C+but+now+absent&maxAge=336h&context=2&type=all

Comment 6 Jeff Peeler 2019-11-04 16:40:57 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2/343

Comment 7 Jeff Peeler 2019-11-04 16:47:50 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.3/2876

Comment 8 Petr Muller 2019-11-20 16:04:36 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/193

Comment 9 Abhinav Dahiya 2019-11-25 16:42:47 UTC

*** Bug 1776423 has been marked as a duplicate of this bug. ***

Comment 10 Lokesh Mandvekar 2019-11-25 18:10:40 UTC

Some SSH-related issues in the latest aws-fips-4.3 run at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.3/588

Lease acquired, installing...
Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-11-25-153929
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-00k7xrfx-3fb9c.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.ci-op-00k7xrfx-3fb9c.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 3.214.115.89:6443: connect: connection refused"
level=info msg="Pulling debug logs from the bootstrap machine"
level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 18.207.205.70:22: connect: connection refused"
level=fatal msg="Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded"

Comment 11 Jonathan Lebon 2019-12-04 15:37:54 UTC

Saw this today too: https://prow.svc.ci.openshift.org/log?job=release-openshift-ocp-installer-e2e-aws-ovn-4.3&id=14

Comment 12 Abhinav Dahiya 2020-02-24 21:52:14 UTC

This was fixed in 4.5 when we bumped the provider version to 2.49.0 in https://github.com/openshift/installer/pull/3140, which was a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1766691
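
For illustration only: the Links section lists upstream pull 11894, titled "Retry read after creation for 404 status code". The general pattern that title describes can be sketched as below. This is not the provider's actual code; `NotFound`, `read_after_create`, and `make_flaky_read` are hypothetical names, and the flaky reader merely simulates a store that does not yet show a freshly created bucket.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for a 404 returned by the object store."""


def read_after_create(read, attempts=5, delay=0.5):
    """Call read() until it stops raising NotFound or attempts run out.

    The wait doubles after each miss (simple exponential backoff).
    """
    for attempt in range(attempts):
        try:
            return read()
        except NotFound:
            if attempt == attempts - 1:
                raise  # still absent after the last attempt: give up
            time.sleep(delay * (2 ** attempt))


def make_flaky_read(misses):
    """Build a reader that raises NotFound for the first `misses` calls."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= misses:
            raise NotFound("bucket not visible yet")
        return {"bucket": "ignition", "calls": state["calls"]}

    return read
```

With a reader that misses twice, `read_after_create(make_flaky_read(2), delay=0)` succeeds on the third call instead of failing on the first 404.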

Comment 16 Johnny Liu 2020-03-02 10:23:45 UTC

Ignore comment 15; it was a copy/paste mistake.

Searched the past 7 days of logs, https://search.svc.ci.openshift.org/?search=produced+an+unexpected+new+value+for+was+present&maxAge=168h&context=1&type=all, and found no similar errors. Moving this bug to verified.
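
The log search described above can be approximated locally when a build log is available as text. The sketch below is an assumption-laden stand-in, not how the CI search service works; `find_signature` is a hypothetical helper that returns each matching line together with `context` lines on either side.

```python
import re

# Error signature taken from the search query in this bug.
SIGNATURE = re.compile(r"produced an unexpected new value for was present")


def find_signature(log_text, context=1):
    """Return a list of hits; each hit is the matching line plus context."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if SIGNATURE.search(line):
            start = max(0, i - context)
            hits.append(lines[start:i + context + 1])
    return hits
```

An empty result for a given log corresponds to the "no similar errors" outcome used for verification.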

Comment 17 Hongkai Liu 2020-03-27 14:55:33 UTC

Showed up again: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23308#1:build-log.txt%3A58

Should I reopen it?

Comment 18 Abhinav Dahiya 2020-03-27 15:56:10 UTC

As you can see from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23308#1:build-log.txt%3A49

` Installing from initial release registry.svc.ci.openshift.org/ocp/release:4.4.0-rc.4`

the installer being used in that job is 4.4.0-rc.4, which doesn't have the fix. We merged the fix only to 4.5 (master).

So I do not think this bug should be re-opened.

Comment 19 Johnny Liu 2020-03-30 02:13:11 UTC

Thanks for Abhinav's explanation.

Comment 21 errata-xmlrpc 2020-07-13 17:11:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
