$ openshift-install version
4.9.x
Platform: AWS -- OSD and ROSA, specifically
Please specify: IPI
What happened?
Error: Provider produced inconsistent result after apply
What did you expect to happen?
Successful install
How to reproduce it (as minimally and precisely as possible)?
It is random and rare. The flow seems to be:
1. Installer creates a thing
2. AWS creates it
3. AWS says it doesn't exist
4. Terraform dies
Since the upgrade of the AWS Terraform provider used by the installer to v3.31.0, the percentage of CI runs that have failed due to inconsistent results for aws_vpc_dhcp_options_association resources has increased from <1% to 8-10%. A fix for this was added to the AWS Terraform provider in version v3.35.0 [1]. Unfortunately, we cannot update the installer to use any version beyond v3.31.0 because we are limited to v1 of the Terraform plugin SDK.
[1] https://github.com/hashicorp/terraform-provider-aws/commit/8e0e9c74c82026876c27bded761ae626b5d05cbf
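The failure mode here (a create call succeeds, but an immediate read reports the resource as absent) is typical of an eventually consistent API. The usual mitigation is to retry the read a bounded number of times before declaring the resource missing. The following is a minimal sketch of that idea, not the code from the provider commit referenced above; `FlakyCloud`, `NotFoundError`, `read_after_create`, and the resource ID are all hypothetical names used only for illustration.

```python
import time


class NotFoundError(Exception):
    """Raised by the simulated API when a resource is not visible yet."""


class FlakyCloud:
    """Hypothetical stand-in for an eventually consistent API.

    A created resource only becomes visible after `lag_reads` failed reads.
    """

    def __init__(self, lag_reads):
        self.lag_reads = lag_reads
        self._pending = {}

    def create(self, resource_id):
        # Record how many reads must fail before the resource "appears".
        self._pending[resource_id] = self.lag_reads

    def read(self, resource_id):
        if resource_id not in self._pending:
            raise NotFoundError(resource_id)
        if self._pending[resource_id] > 0:
            self._pending[resource_id] -= 1
            raise NotFoundError(resource_id)
        return {"id": resource_id}


def read_after_create(cloud, resource_id, attempts=5, delay=0.0):
    """Retry the read up to `attempts` times before giving up."""
    for attempt in range(attempts):
        try:
            return cloud.read(resource_id)
        except NotFoundError:
            if attempt == attempts - 1:
                raise  # still absent after the last attempt
            time.sleep(delay)


cloud = FlakyCloud(lag_reads=2)
cloud.create("dopt-assoc-example")
result = read_after_create(cloud, "dopt-assoc-example")
```

Without the retry, the first `read` raises `NotFoundError` right after a successful `create`, which corresponds to the "was present, but now absent" class of error. In a real client, `delay` would be non-zero and would normally back off between attempts.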
> Since the upgrade of the aws terraform provider used by the installer to v3.31.0, the percentage of CI runs that have failed due to inconsistent results for aws_vpc_dhcp_options_association resources has increased from <1% to 8-10%.
We usually see a different resource in OSD failures:
level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: Provider produced inconsistent result after apply
level=error
level=error msg=When applying changes to module.vpc.aws_route_table.private_routes[2],
level=error msg=provider "registry.terraform.io/-/aws" produced an unexpected new value for
level=error msg=was present, but now absent.
I think I've seen other ones, too. Is this the same bug?
Greg, no, this is not the same bug as the consistency problem with aws_route_table resources. Unfortunately, every resource needs its own separate fix. Since you did not specify in the title or description of the BZ which resource was failing for you, I commandeered this BZ for the aws_vpc_dhcp_options_association resource, which is the most pressing issue for 4.10.
OK, I spawned Bug 2033256 for module.vpc.aws_route_table.private_routes. If I see any others, I'll open individual bugs for each resource.
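Because each affected resource gets its own bug, it can help to group installer failures by the resource named in the log. The sketch below is one way to do that, assuming the log layout shown in the excerpt quoted earlier in this thread; `count_inconsistent_resources` and `resource_type` are hypothetical helper names, not part of the installer or any CI tooling.

```python
import re
from collections import Counter

HEADER = "Provider produced inconsistent result after apply"
# Captures the resource address, dropping the trailing comma, from lines like:
#   level=error msg=When applying changes to module.vpc.aws_route_table.private_routes[2],
APPLY_RE = re.compile(r"When applying changes to (\S+?),?\s*$")


def resource_type(address):
    """Strip a trailing [index] so all instances of a resource group together."""
    return re.sub(r"\[[^\]]*\]$", "", address)


def count_inconsistent_resources(log_text):
    """Count 'inconsistent result' errors per resource address."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if HEADER not in line:
            continue
        # The resource address appears within a few lines of the error header.
        for follow in lines[i + 1 : i + 4]:
            match = APPLY_RE.search(follow)
            if match:
                counts[resource_type(match.group(1))] += 1
                break
    return counts


sample_log = """\
level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: Provider produced inconsistent result after apply
level=error
level=error msg=When applying changes to module.vpc.aws_route_table.private_routes[2],
"""

counts = count_inconsistent_resources(sample_log)
```

Here `counts` maps each resource address (with the index removed) to the number of failures seen, so a resource that does not yet have a bug stands out.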
Yanming, anything blocking reviewing this, or is it just lower in the queue? Let us know what questions you have about verifying.
Is this something that could be verified by OSD/ROSA if QE is unable to reproduce?
Checked both QE's CI pipeline and Prow CI [1]; there are no aws_vpc_dhcp_options_association resource creation errors.
@gshereme, @mstaeble I have some questions:
1. Per the discussion in comment 1, I do not see any aws_vpc_dhcp_options_association related error. Why was it reported in this bug? Did I miss any information?
2. The bug description says `Platform: AWS -- OSD and ROSA, specifically`. Does that mean it occurs more often on OSD/ROSA, or only on OSD/ROSA? Compared to the general installation method, is there any special deployment configuration in OSD?
3. Per the discussion in comment 1, module.vpc.aws_subnet.private_subnet creation errors were mentioned. Searching in CI [2], I still see lots of such errors, e.g. [3]. I am not sure whether this PR is trying to fix that issue.
[1] https://search.ci.openshift.org/?search=aws_vpc_dhcp_options_association&maxAge=168h&context=7&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
[2] https://search.ci.openshift.org/?search=module.vpc.aws_subnet.private_subnet&maxAge=168h&context=7&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
[3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-csi-4.10/1486071548644167680
We (OCM QE) created 20+ OSD/ROSA clusters, but could NOT reproduce the bug.
> Is this something that could be verified by OSD/ROSA if QE is unable to reproduce?
No, it's impossible for us to reproduce as well. I think AWS needs to be having a bad day for it to happen.
Greg, many thanks, very helpful information. Per comment 12 and comment 13, the issue has not been seen recently in OCP CI or OSD/ROSA, and it occurs very rarely (comment 0, comment 14). No regression was found after this PR merged, so setting to VERIFIED now. Feel free to re-open it if the error occurs again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056