Bug 1745196
Summary: | AWS installs occasionally fail due to S3 bucket race: [bucket] produced an unexpected new value for was present, but now absent | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Installer | Assignee: | W. Trevor King <wking> |
Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | unspecified | CC: | adahiya, hongkliu, jialiu, jlebon, kgarriso, lsm5, obulatov, pmuller, shlao, surbania, wwurzbac |
Version: | 4.2.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The AWS Terraform provider vendored by the installer would occasionally race S3's eventual consistency and get confused.
Consequence: Installation would fail with: When applying changes to module.bootstrap.aws_s3_bucket.ignition, provider"
level=error msg="\"aws\" produced an unexpected new value for was present, but now absent."
Fix: The installer has vendored improved AWS Terraform provider code, which now robustly handles S3 eventual consistency.
Result: Installer-provisioned AWS no longer flakes on "unexpected new value for was present, but now absent".
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-13 17:11:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1752313 |
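The Doc Text above describes a read that sees the bucket as present and then as absent. As a rough illustration only, the sketch below shows one common way to tolerate that pattern: poll until the resource has been visible several times in a row. It is not the vendored AWS Terraform provider code; `EventuallyConsistentStore`, `wait_until_stable`, and every parameter are hypothetical names invented for this example, and S3 is replaced by a scripted in-memory stand-in.

```python
import itertools
import time


class EventuallyConsistentStore:
    """Hypothetical stand-in for S3: scripted answers to "does the bucket exist?"."""

    def __init__(self, answers):
        # Once the script runs out, the last answer repeats forever.
        self._answers = itertools.chain(answers, itertools.repeat(answers[-1]))

    def exists(self):
        return next(self._answers)


def wait_until_stable(check, required_successes=3, attempts=20, delay=0.0):
    """Poll check() until it returns True required_successes times in a row."""
    streak = 0
    for _ in range(attempts):
        if check():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # a "not found" after a "found" resets the streak
        time.sleep(delay)
    return False


# Present, then absent, then present for good: the pattern in the error message.
script = [True, False, True, True, True]

# A naive create-then-read-twice sees the value flip.
flaky = EventuallyConsistentStore(script)
naive_result = [flaky.exists(), flaky.exists()]

# Requiring a streak of successful reads rides out the flip.
store = EventuallyConsistentStore(script)
stable = wait_until_stable(store.exists)
```

Requiring consecutive successes, rather than a single one, is the design choice that matters here: a single successful read says little when a later read can still miss the bucket.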
Description
W. Trevor King
2019-08-23 20:56:43 UTC
*** Bug 1752355 has been marked as a duplicate of this bug. ***

Just saw this today: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1089/pull-ci-openshift-machine-config-operator-master-e2e-aws-upgrade/1990

Confirming, also seen recently in e2e tests: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovs-kubernetes-4.3/29

It is 2% of failures: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?search=produced+an+unexpected+new+value+for+was+present%2C+but+now+absent&maxAge=336h&context=2&type=all

*** Bug 1776423 has been marked as a duplicate of this bug. ***

Some SSH-related issues in the latest aws-fips-4.3 run at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-fips-4.3/588

Lease acquired, installing...
Installing from release registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-11-25-153929
level=warning msg="Found override for release image. Please be warned, this is not advised"
level=info msg="Consuming Install Config from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-00k7xrfx-3fb9c.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get https://api.ci-op-00k7xrfx-3fb9c.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators: dial tcp 3.214.115.89:6443: connect: connection refused"
level=info msg="Pulling debug logs from the bootstrap machine"
level=error msg="Attempted to gather debug logs after installation failure: failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 18.207.205.70:22: connect: connection refused"
level=fatal msg="Bootstrap failed to complete: waiting for Kubernetes API: context deadline exceeded"

Saw this today too: https://prow.svc.ci.openshift.org/log?job=release-openshift-ocp-installer-e2e-aws-ovn-4.3&id=14

This was fixed in 4.5 when we bumped the provider version to 2.49.0 in https://github.com/openshift/installer/pull/3140, which was a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1766691

Ignore comment 15; it was a copy/paste mistake.

Searched the past 7 days of logs, https://search.svc.ci.openshift.org/?search=produced+an+unexpected+new+value+for+was+present&maxAge=168h&context=1&type=all, and found no similar error. Moving this bug to verified.

Showed up again: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23308#1:build-log.txt%3A58 Should I reopen it?

As you can see from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23308#1:build-log.txt%3A49 (`Installing from initial release registry.svc.ci.openshift.org/ocp/release:4.4.0-rc.4`), the installer being used in that job is 4.4.0-rc.4, which doesn't have the fix. We merged the fix only to 4.5 (master), so I do not think this bug should be reopened.

Thanks for Abhinav's explanation.
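The reasoning in the last comments (a job installing from 4.4.0-rc.4 cannot contain a fix merged only to 4.5) can be sketched as a small check. This is an illustrative helper, not part of the installer or of any CI tooling; `release_minor`, `has_fix`, and `FIXED_IN` are hypothetical names, and the parsing assumes the release tag begins with `major.minor.`, as in the pull specs quoted in this bug.

```python
import re

# Per the comments above, the fix was merged only to 4.5 (master).
FIXED_IN = (4, 5)


def release_minor(pull_spec):
    """Extract (major, minor) from a pull spec tag such as '...release:4.4.0-rc.4'."""
    tag = pull_spec.rsplit(":", 1)[-1]
    match = re.match(r"(\d+)\.(\d+)\.", tag)
    if match is None:
        raise ValueError(f"no version found in {pull_spec!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(pull_spec):
    """Return True if the release is at or after the first fixed minor version."""
    return release_minor(pull_spec) >= FIXED_IN


# The initial release reported by the job that hit the error again.
job_release = "registry.svc.ci.openshift.org/ocp/release:4.4.0-rc.4"
job_has_fix = has_fix(job_release)
```

A `False` result for a failing job suggests the failure predates the fix and is not a regression.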
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409