I've seen this a couple of times recently when looking at why PR presubmits failed. Checking CI search [1] turns up alarming numbers like:

  release-openshift-ocp-installer-e2e-aws-fips-4.5 - 41 runs, 27% failed, 36% of failures match

Picking a particular example to ground investigation [2]:

  level=error msg="Error: Unable to find matching route for Route Table (rtb-0bd15d60486c91ec2) and destination CIDR block (0.0.0.0/0)."
  level=error
  level=error msg=" on ../tmp/openshift-install-106441498/vpc/vpc-private.tf line 14, in resource \"aws_route\" \"to_nat_gw\":"
  level=error msg=" 14: resource \"aws_route\" \"to_nat_gw\" {"
  level=error
  level=error
  level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"

And looking at PR presubmits [3]:

  Across 3943 runs and 269 jobs (64.09% failed), matched 6.49% of failing runs and 25.65% of jobs in 112ms

This looks like another AWS-eventual-consistency vs. Terraform-provider bug, which is being tracked upstream in [4], but I don't see an upstream PR yet. It's also possible that we could address it by raising the aws_route create timeout [5]. Whatever we do in this space, I'd consider backporting to 4.5. I don't see anything in 4.4 or earlier, so the issue might be due to a 4.4 -> 4.5 Terraform pivot of some sort, although I haven't checked the installer codebase to see what we've done in that space.
[1]: https://search.svc.ci.openshift.org/?search=Unable+to+find+matching+route+for+Route+Table&maxAge=168h&context=1&type=junit&name=release-openshift-ocp&groupBy=job
[2]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/2273#1:build-log.txt%3A35
[3]: https://search.apps.build01.ci.devcluster.openshift.com/?search=Unable+to+find+matching+route+for+Route+Table&maxAge=168h&context=1&type=junit&name=%5Epull-ci-.*-e2e-aws&maxMatches=5&maxBytes=20971520&groupBy=job
[4]: https://github.com/terraform-providers/terraform-provider-aws/issues/13138
[5]: https://www.terraform.io/docs/providers/aws/r/route.html#timeouts
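For reference, the timeout workaround mentioned in [5] might look roughly like the sketch below. Only the resource type and name (aws_route.to_nat_gw) and the 0.0.0.0/0 destination come from the error output above; the two variables and the "20m" value are hypothetical placeholders, not what the installer's vpc-private.tf actually contains. Whether a longer create timeout is enough to ride out the eventual-consistency window is an assumption that would need testing.

```hcl
# Hypothetical inputs standing in for whatever the real module wires up.
variable "private_route_table_id" {
  type = string
}

variable "nat_gateway_id" {
  type = string
}

resource "aws_route" "to_nat_gw" {
  route_table_id         = var.private_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = var.nat_gateway_id

  # Give the provider longer to find the route after creating it.
  # "20m" is an illustrative value only.
  timeouts {
    create = "20m"
  }
}
```

This would only mask the race by waiting longer; a fix in the provider itself would still be preferable.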
I agree we should backport this. It seems like an upstream patch is being proposed.
Bumping priority a bit: aside from release jobs, this has also been showing up a lot in PR jobs for 4.5.
This issue has been addressed in PR https://github.com/terraform-providers/terraform-provider-aws/pull/13747, which has been merged. It is available in version 2.67.0 of the terraform-provider-aws plugin, so we will need to update. Coming soon.
PR: https://github.com/openshift/installer/pull/3837
Verified: PASS. After PR 3837 merged, no such error occurred on 4.6 over approximately 6 days. Marking this bug as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196