Bug 1923536

Summary: Image pullthrough does not pass 429 errors back to capable clients
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Image Registry
Assignee: Flavian <fmissi>
Status: CLOSED ERRATA
QA Contact: XiuJuan Wang <xiuwang>
Severity: high
Docs Contact:
Priority: high
Version: 4.7
CC: fmissi, fpaoline, obulatov, rmarasch, wewang, xiuwang
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The registry interpreted a 429 response from an upstream registry as if the data were not available. Consequence: Temporary errors were hidden by the registry; instead of 429 Too Many Requests, clients got 404 Not Found. Fix: The registry now proxies 429 errors to clients. Result: Capable clients can retry and eventually pull images when an upstream registry returns 429.
Story Points: ---
Clone Of:
Environment: [sig-imageregistry][Feature:ImageAppend] Image append should create images by appending them
Last Closed: 2022-08-10 10:35:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2085414    

Description Clayton Coleman 2021-02-01 17:08:18 UTC
The image append test fails infrequently (1/30) with 

error: uploading the source layer sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b failed: Patch "https://image-registry.openshift-image-registry.svc:5000/v2/e2e-test-image-append-vvcfr/test/blobs/uploads/5b6c7125-b1a6-4054-8757-bfb769013ce6?_state=8dVf5ydO_2F_Tz_oTw8iYngy2Bsl8F-wF3QaEJF7IVF7Ik5hbWUiOiJlMmUtdGVzdC1pbWFnZS1hcHBlbmQtdnZjZnIvdGVzdCIsIlVVSUQiOiI1YjZjNzEyNS1iMWE2LTQwNTQtODc1Ny1iZmI3NjkwMTNjZTYiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjEtMDEtMzFUMDM6MjI6NTcuOTUyMTIwMDA1WiJ9": unknown blob

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.7/1355707047068307456

In the registry logs we see

time="2021-01-31T03:22:58.420412156Z" level=error msg="Error statting blob sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b in remote repository \"quay.io/openshift-release-dev/ocp-v4.0-art-dev\": Head \"https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b\": error parsing HTTP 429 response body: invalid character '<' looking for beginning of value: \"<html>\\r\\n<head><title>429 Too Many Requests</title></head>\\r\\n<body bgcolor=\\\"white\\\">\\r\\n<center><h1>429 Too Many Requests</h1></center>\\r\\n<hr><center>nginx/1.12.1</center>\\r\\n</body>\\r\\n</html>\\r\\n\"" go.version=go1.15.5 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=e6614325-c675-4bda-897b-9723aaa3b1de http.request.method=GET http.request.remoteaddr="10.128.2.35:36468" http.request.uri="/v2/openshift/tools/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" http.request.useragent=Go-http-client/2.0 openshift.auth.user="system:serviceaccount:e2e-test-image-append-vvcfr:builder" vars.digest="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" vars.name=openshift/tools

which then becomes a 404

time="2021-01-31T03:22:58.420594499Z" level=error msg="response completed with error" err.code="blob unknown" err.detail="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" err.message="blob unknown to registry" go.version=go1.15.5 http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=e6614325-c675-4bda-897b-9723aaa3b1de http.request.method=GET http.request.remoteaddr="10.128.2.35:36468" http.request.uri="/v2/openshift/tools/blobs/sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" http.request.useragent=Go-http-client/2.0 http.response.contenttype="application/json; charset=utf-8" http.response.duration=380.131435ms http.response.status=404 http.response.written=157 openshift.auth.user="system:serviceaccount:e2e-test-image-append-vvcfr:builder" vars.digest="sha256:cca21acb641a96561e0cf9a0c1c7b7ffbaaefc92185bd8a9440f6049c838e33b" vars.name=openshift/tools

Our clients already support 429, so when an upstream tells us to slow down we should pass that back to the client so they can see the backpressure. Then we need to verify that this actually resolves the issue (i.e. if the 429s continue, we may want to flag backpressure on a per-upstream basis from the registry so that all clients slow down).

Comment 3 Oleg Bulatov 2021-03-08 15:10:27 UTC
*** Bug 1932643 has been marked as a duplicate of this bug. ***

Comment 8 Oleg Bulatov 2021-06-11 16:02:13 UTC
*** Bug 1929767 has been marked as a duplicate of this bug. ***

Comment 11 XiuJuan Wang 2021-08-06 09:01:15 UTC
Hi Ricardo,
Do you have any idea how to reproduce this bug?

Comment 13 Oleg Bulatov 2021-12-14 20:03:59 UTC
Increasing severity and priority as this problem complicates investigation of CI problems. For example, BZ 2026104.

Comment 19 Flavian 2022-05-04 13:57:42 UTC
I have tested https://github.com/openshift/image-registry/pull/273 and although it seems to address the original error, I'm now getting a different one.
Looks like there are other places we need to handle the 429 from the origin registry before it can make its way to the client. I'm investigating.

Comment 26 errata-xmlrpc 2022-08-10 10:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 27 Red Hat Bugzilla 2023-09-15 01:32:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days