Bug 1926258 - GCP jobs exhaust zone listing query quota sometimes due to too many initializations of cloud provider in tests
Keywords:
Status: CLOSED DUPLICATE of bug 1925740
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Test Infrastructure
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Clayton Coleman
QA Contact:
URL:
Whiteboard:
Depends On: 1920221
Blocks: 1925740
 
Reported: 2021-02-08 14:12 UTC by Clayton Coleman
Modified: 2021-02-10 14:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1920221
Environment:
Last Closed: 2021-02-10 14:49:40 UTC
Target Upstream Version:
Embargoed:


Description Clayton Coleman 2021-02-08 14:12:23 UTC
Backporting to 4.7 pre-GA to reduce overall flakiness (this will go back to 4.6 as well).

+++ This bug was initially created as a clone of Bug #1920221 +++

The e2e tests fork and run child tests in individual processes, which causes cloud provider initialization code to be run once per test, not once per suite as per upstream.  The GCP cloud provider makes several calls to initialize zones and other values that are constant over the life of a test run, and when lots of GCP tests are running at the same time we stand a chance of exceeding the burst quota on our account.  Every week or so we get a big chunk of failures as a result in our CI runs like:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.7/1353763451410845696

W0125 18:41:32.843010    9252 gce.go:485] No network name or URL specified.
E0125 18:41:32.894199    9252 test_context.go:485] Failed to setup provider config for "gce": Error building GCE/GKE provider: unexpected response listing zones: googleapi: Error 403: Quota exceeded for quota group 'ListGroup' and limit 'List requests per 100 seconds' of service 'compute.googleapis.com' for consumer 'project_number:1053217076791'., rateLimitExceeded

https://search.ci.openshift.org/?search=Quota+exceeded+for+quota+group&maxAge=168h&context=1&type=junit&name=4%5C.7&maxMatches=5&maxBytes=20971520&groupBy=job

This fails about 20% of GCP jobs total every week in the conformance suite, which impacts both PRs and release periodics.  This is intermittent.

The ideal fix is to have the cloud provider data seeded via environment and avoid duplicate initialization, which will require us to carry a patch to initialization to extract and reuse the value (during initCloudProvider, probably).  Should be possible to get that upstream in some form, but mitigating the impact quickly is important.

--- Additional comment from OpenShift Automated Release Tooling on 2021-02-07 16:59:26 EST ---

Elliott changed bug status from MODIFIED to ON_QA.

Comment 2 Scott Dodson 2021-02-10 14:49:40 UTC

*** This bug has been marked as a duplicate of bug 1925740 ***

