test: operator.Run template e2e-aws - e2e-aws-calico container setup fails frequently, see job:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-calico-4.5/1302701517358239744

The test fails frequently with `Error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4966185 vs. 4194304)`
As far as I know, Hive has no relation to OpenShift CI and e2e. Given that this says Calico, I will reassign it to networking.
*** Bug 1877117 has been marked as a duplicate of this bug. ***
*** Bug 1877118 has been marked as a duplicate of this bug. ***
This is failing early in the cluster bootstrap. I can't tell who is making the grpc call. It may be Calico, but it looks like it's earlier than when Calico can even run. However, their installer may have changed something that is causing our installer to send an oversized message. Can someone on the installer team see if they can work out what is making that call (and why it might be oversized)?

level=info msg="Consuming Master Machines from target directory"
level=info msg="Consuming OpenShift Install (Manifests) from target directory"
level=info msg="Consuming Worker Machines from target directory"
level=info msg="Credentials loaded from the \"default\" profile in file \"/etc/openshift-installer/.awscred\""
level=info msg="Creating infrastructure resources..."
level=error
level=error msg="Error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4968901 vs. 4194304)"
level=error
level=error
level=error msg="Failed to read tfstate: open /tmp/openshift-install-770414117/terraform.tfstate: no such file or directory"
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"
There is an upstream Terraform bug where the grpc plugin doesn't support transferring more than 4 MB of data: https://github.com/hashicorp/terraform/issues/21709. This failure might be related to that.
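The limit in the error is exactly 4 MiB (4 * 1024 * 1024 = 4194304 bytes). As a small illustration, this Python sketch pulls the two sizes out of the error line quoted in this bug and reports how far over the limit the message was. The helper name `parse_grpc_overage` is made up for this example and is not part of any tool.

```python
import re

# Pattern for the size pair in the gRPC error text quoted in this bug.
_SIZE_RE = re.compile(r"received message larger than max \((\d+) vs\. (\d+)\)")


def parse_grpc_overage(line):
    """Return (actual, limit, overage) in bytes, or None if the line does not match."""
    m = _SIZE_RE.search(line)
    if m is None:
        return None
    actual, limit = int(m.group(1)), int(m.group(2))
    return actual, limit, actual - limit


# Error line copied from the failing job in this report.
msg = ("Error: rpc error: code = ResourceExhausted desc = grpc: "
       "received message larger than max (4966185 vs. 4194304)")
actual, limit, over = parse_grpc_overage(msg)
print(f"limit is 4 MiB: {limit == 4 * 1024 * 1024}")
print(f"message exceeds the limit by {over} bytes ({over / limit:.1%})")
```

Running the same helper over the later log lines in this bug gives a quick way to see whether the message size is growing between runs.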
This bug is happening only on e2e-aws-calico, see https://search.ci.openshift.org/?search=Error%3A+rpc+error%3A+code+%3D+ResourceExhausted+desc+%3D+grpc%3A+received+message+larger+than+max&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

What is special about this e2e job that is causing this failure?
My guess is that bootstrap.ign has become so large that https://github.com/openshift/installer/blob/0d5c871ce7d03f3d03ab4371dc39916a5415cf5c/data/data/aws/bootstrap/main.tf#L27 fails when creating the S3 bucket object. That looks like the only large asset.

Instead of using content, which passes the bytes over grpc, we could maybe use source (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object#source), which reads the ign file from disk. Maybe that helps reduce the transfer size.
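One rough way to check this guess is to compare the size of the generated bootstrap.ign with the 4194304-byte limit from the error. The Python sketch below does that; the helper name `ignition_size_report` and the path in the usage comment are hypothetical. The grpc message carries more than just the file contents, so the file size is only a rough indicator, and a file under the limit does not rule the guess out.

```python
import os

GRPC_MAX_BYTES = 4194304  # limit reported in the error message (4 MiB)


def ignition_size_report(path, limit=GRPC_MAX_BYTES):
    """Return (size_in_bytes, fraction_of_limit) for the file at path."""
    size = os.path.getsize(path)
    return size, size / limit


# Hypothetical usage, assuming the file sits in the current directory:
#   size, frac = ignition_size_report("bootstrap.ign")
#   print(f"{size} bytes, {frac:.0%} of the grpc limit")
```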
We are working with customer ServiceNow on a vSphere IPI install with Calico EE and are running into the same grpc issue on an internal vSphere lab cluster, the Tigera lab cluster, and SNOW vSphere deployments.
verified. FAILED.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-calico-4.5/1320824972217683968

level=error
level=error msg="Error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5055165 vs. 4194304)"
level=error
(In reply to Yunfei Jiang from comment #12)
> verified. FAILED.
>
> https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-calico-4.5/1320824972217683968
>
> level=error
> level=error msg="Error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5055165 vs. 4194304)"
> level=error

The fix was merged for master (4.7), not 4.5, so please test the latest code for the changes.
verified. PASS. @Abhinav, sorry for the mistake.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633