Bug 1923536 - Image pullthrough does not pass 429 errors back to capable clients
Summary: Image pullthrough does not pass 429 errors back to capable clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Flavian
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Duplicates: 1929767 1932643
Depends On:
Blocks: 2085414
 
Reported: 2021-02-01 17:08 UTC by Clayton Coleman
Modified: 2023-09-15 01:32 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the registry interpreted a 429 response from upstream registries as meaning the data was not available. Consequence: temporary errors were hidden by the registry; clients got 404 Not Found instead of 429 Too Many Requests. Fix: proxy 429 errors to clients. Result: capable clients can retry and eventually pull images when the upstream registry returns 429.
Clone Of:
Environment:
[sig-imageregistry][Feature:ImageAppend] Image append should create images by appending them
Last Closed: 2022-08-10 10:35:38 UTC
Target Upstream Version:
Embargoed:



Links
GitHub openshift/image-registry pull 329 (open): Bug 1923536: forward http.StatusTooManyRequests to client (last updated 2022-05-12 08:23:41 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:36:00 UTC)

Description Clayton Coleman 2021-02-01 17:08:18 UTC
The image append test fails infrequently (about 1 run in 30) with:

error: uploading the source layer sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b failed: Patch "https://image-registry.openshift-image-registry.svc:5000/v2/e2e-test-image-append-vvcfr/test/blobs/uploads/5b6c7125-b1a6-4054-8757-bfb769013ce6?_state=8dVf5ydO_2F_Tz_oTw8iYngy2Bsl8F-wF3QaEJF7IVF7Ik5hbWUiOiJlMmUtdGVzdC1pbWFnZS1hcHBlbmQtdnZjZnIvdGVzdCIsIlVVSUQiOiI1YjZjNzEyNS1iMWE2LTQwNTQtODc1Ny1iZmI3NjkwMTNjZTYiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjEtMDEtMzFUMDM6MjI6NTcuOTUyMTIwMDA1WiJ9": unknown blob

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.7/1355707047068307456

In the registry logs we see

time="2021-01-31T03:22:58.420412156Z" level=error msg="Error statting blob sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b in remote repository \"quay.io/openshift-release-dev/ocp-v4.0-art-dev\": Head \"https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b\": error parsing HTTP 429 response body: invalid character '<' looking for beginning of value: \"<html>\\r\\n<head><title>429 Too Many Requests</title></head>\\r\\n<body bgcolor=\\\"white\\\">\\r\\n<center><h1>429 Too Many Requests</h1></center>\\r\\n<hr><center>nginx/1.12.1</center>\\r\\n</body>\\r\\n</html>\\r\\n\"" go.version=go1.15.5 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=e6614325-c675-4bda-897b-9723aaa3b1de http.request.method=GET http.request.remoteaddr="10.128.2.35:36468" http.request.uri="/v2/openshift/tools/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" http.request.useragent=Go-http-client/2.0 openshift.auth.user="system:serviceaccount:e2e-test-image-append-vvcfr:builder" vars.digest="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" vars.name=openshift/tools

which then becomes a 404 ("blob unknown") returned to the client:

time="2021-01-31T03:22:58.420594499Z" level=error msg="response completed with error" err.code="blob unknown" err.detail="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" err.message="blob unknown to registry" go.version=go1.15.5 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=e6614325-c675-4bda-897b-9723aaa3b1de http.request.method=GET http.request.remoteaddr="10.128.2.35:36468" http.request.uri="/v2/openshift/tools/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" http.request.useragent=Go-http-client/2.0 http.response.contenttype="application/json; charset=utf-8" http.response.duration=380.131435ms http.response.status=404 http.response.written=157 openshift.auth.user="system:serviceaccount:e2e-test-image-append-vvcfr:builder" vars.digest="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" vars.name=openshift/tools

Our clients already support 429, so when an upstream registry tells us to slow down we should pass that back to the client so it can see the backpressure. Then we need to verify that this actually resolves the issue (i.e. if the 429s continue, we may want the registry to flag backpressure per upstream so that all clients slow down).

Comment 3 Oleg Bulatov 2021-03-08 15:10:27 UTC
*** Bug 1932643 has been marked as a duplicate of this bug. ***

Comment 8 Oleg Bulatov 2021-06-11 16:02:13 UTC
*** Bug 1929767 has been marked as a duplicate of this bug. ***

Comment 11 XiuJuan Wang 2021-08-06 09:01:15 UTC
Hi Ricardo,
Do you have an idea how to reproduce this bug?

Comment 13 Oleg Bulatov 2021-12-14 20:03:59 UTC
Increasing severity and priority as this problem complicates investigation of CI problems. For example, BZ 2026104.

Comment 19 Flavian 2022-05-04 13:57:42 UTC
I have tested https://github.com/openshift/image-registry/pull/273 and, although it seems to address the original error, I'm now getting a different error.
Looks like there are other places we need to handle the 429 from the origin registry before it can make its way to the client. I'm investigating.

Comment 26 errata-xmlrpc 2022-08-10 10:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 27 Red Hat Bugzilla 2023-09-15 01:32:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

