Bug 2028610 - Installer doesn't retry on GCP rate limiting
Summary: Installer doesn't retry on GCP rate limiting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 2028611
Reported: 2021-12-02 19:14 UTC by Stephen Benjamin
Modified: 2022-03-10 16:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2028611
Environment:
Last Closed: 2022-03-10 16:31:35 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5417 | 0 | None | Merged | Bug 2028610: vendor: update terraform-provider-google for rate limit fix | 2021-12-02 20:15:06 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:31:55 UTC

Description Stephen Benjamin 2021-12-02 19:14:06 UTC
In CI, we're hitting errors like this:

```
level=error msg=Error: Error when reading or editing Target Pool
"ci-op-x5j99sbj-82914-2f74l-api": googleapi: Error 403: Quota exceeded
for quota group 'ReadGroup' and limit 'Read requests per 100 seconds' of
service 'compute.googleapis.com' for consumer
'project_number:711936183532'., rateLimitExceeded
```

This was fixed in terraform-provider-google v3.62.0; however, v3.62.0 uses v2 of the Terraform SDK. The installer should instead be pointed at the openshift fork that contains v3.27.0 plus the retry patches.

Comment 2 Scott Dodson 2021-12-02 20:28:54 UTC
This may be difficult to reproduce. TRT has a wealth of data from which they can easily assess whether this fix has improved things. They should have enough data tomorrow to reach a conclusion on both its effectiveness and whether these changes broke anything. As such, I welcome them to mark the bug VERIFIED once they have that data, if QE hasn't been able to verify it independently.

Comment 3 Stephen Benjamin 2021-12-03 14:41:36 UTC
We got 108 runs in the 24 hours since my 4.10 installer PR merged. GCP is doing a little better than in the 24 hours before that:

After PR: https://sippy.ci.openshift.org/sippy-ng/jobs/4.10/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22not%22%3Afalse%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%224.10-e2e-gcp%22%7D%2C%7B%22columnField%22%3A%22name%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%224.9%22%7D%2C%7B%22columnField%22%3A%22timestamp%22%2C%22operatorValue%22%3A%22%3E%22%2C%22value%22%3A%221638423780000%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Before PR: https://sippy.ci.openshift.org/sippy-ng/jobs/4.10/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22not%22%3Afalse%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%224.10-e2e-gcp%22%7D%2C%7B%22columnField%22%3A%22name%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%224.9%22%7D%2C%7B%22columnField%22%3A%22timestamp%22%2C%22operatorValue%22%3A%22%3C%22%2C%22value%22%3A%221638423780000%22%7D%2C%7B%22columnField%22%3A%22timestamp%22%2C%22operatorValue%22%3A%22%3E%3D%22%2C%22value%22%3A%221638337380000%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

search.ci also confirms we didn't get any terraform-generated read quota messages from GCP in the last 24 hours:

https://search.ci.openshift.org/chart?search=Error+when+reading.*403.*Quota+exceeded.*Read.*&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Marking this verified based on this data and comment #2.
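
As a rough check of what the search.ci query in this comment looks for, the sketch below decodes the `search` parameter from that link and tests it against the error from the description. It assumes the log message is emitted as a single line (it is wrapped in the description) and that Go's `regexp` package treats this simple pattern the same way search.ci does; `searchPattern` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// searchPattern decodes the "search" parameter of a query string and
// compiles it as a regular expression. Hypothetical helper for illustration.
func searchPattern(rawQuery string) (*regexp.Regexp, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	return regexp.Compile(values.Get("search"))
}

func main() {
	// The search parameter copied from the search.ci link in this comment.
	re, err := searchPattern("search=Error+when+reading.*403.*Quota+exceeded.*Read.*")
	if err != nil {
		panic(err)
	}

	// The error from the description, joined onto one line (an assumption).
	logLine := `level=error msg=Error: Error when reading or editing Target Pool ` +
		`"ci-op-x5j99sbj-82914-2f74l-api": googleapi: Error 403: Quota exceeded ` +
		`for quota group 'ReadGroup' and limit 'Read requests per 100 seconds' of ` +
		`service 'compute.googleapis.com' for consumer ` +
		`'project_number:711936183532'., rateLimitExceeded`

	fmt.Println("pattern:", re.String())
	fmt.Println("matches:", re.MatchString(logLine))
}
```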

Comment 6 errata-xmlrpc 2022-03-10 16:31:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

