Bug 1816578
Summary: | Builder binary may omit errors reported by a remote registry during blob uploads | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ricardo Maraschini <rmarasch> |
Component: | Build | Assignee: | Adam Kaplan <adam.kaplan> |
Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.5 | CC: | adam.kaplan, aos-bugs, wzheng |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: buildah's libraries could ignore certain HTTP errors reported by a remote registry during blob uploads
Consequence: builds could fail to push images due to temporary issues with the target registry
Fix: buildah respects these errors when pushing image blobs
Result: buildah will fail to push an image if the upstream registry is temporarily unavailable
|
Last Closed: | 2020-10-27 15:56:40 UTC | Type: | Bug |
Description
Ricardo Maraschini
2020-03-24 10:16:06 UTC
For further information please refer to my comment on: https://github.com/containers/image/commit/20733df3d7fd03dee784207107d4efda08412b73

I managed to replicate this problem by placing a misbehaving load balancer between the client and the registry. If this load balancer returns something such as a 5xx status code, we ignore the error and move on to the next layer. This causes a problem when the manifest is sent at the end, as some layers may be missing.

This can be fixed by bumping the vendored version of our containers libraries in openshift/builder to be level with buildah v1.14.9:

containers/buildah v1.14.9
containers/common v0.8.4
containers/image/v5 v5.4.3
containers/storage v1.18.2

@adam @Ricardo I am not sure which scenario I should test to check the bug. I think a binary test may not be enough for it, thanks

@wewang per https://bugzilla.redhat.com/show_bug.cgi?id=1816578#c1 you can set up a load balancer in front of an external registry which is mis-configured to return 500 errors. I believe with this fix in place builds should fail fast.

@Ricardo can you please provide the script you used to reproduce this bug?

@adam @Ricardo Maraschini I tried to test using the steps below. I am not sure whether they are correct for verifying the bug, please check, thanks

Steps:
1. Use a bc with output to a docker.io image

output:
  to:
    kind: DockerImage
    name: docker.io/wewang58/ruby-hello-world:latest

2. Start a build and check the log

[root@wangwen work]# oc get builds
NAME                 TYPE     FROM          STATUS                               STARTED          DURATION
ruby-hello-world-1   Source   Git@57073c0   Complete                             19 minutes ago   56s
ruby-hello-world-2   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   18 minutes ago   1m13s

Pushing image docker.io/wewang58/ruby-hello-world:latest ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/wewang58/ruby-hello-world:latest" to "docker://wewang58/ruby-hello-world:latest": Error trying to reuse blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 at destination: Error checking whether a blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 exists in docker.io/wewang58/ruby-hello-world: errors: denied: requested access to the resource is denied error parsing HTTP 401 response body: unexpected end of JSON input: ""

@Adam @Ricardo, finally verified the bug in version: 4.6.0-0.nightly-2020-06-20-011219

Steps:
1. Configure CPU and memory (100Mi) limits for the internal registry, and wait for the registry pod to be running again

$ oc patch configs.imageregistry -p '{"spec":{"resources":{"limits":{"cpu":"100m","memory":"100Mi"}}}}' --type=merge

2. Create 150 builds and push to the internal registry

```
# Truncate (or create) a log file named after this shell's PID.
> build.$$
(
# Create the app and its build config, then give the first build time to start.
oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world
sleep 10
# Let builds run in parallel so many pushes hit the registry at once.
oc patch bc ruby-hello-world -p '{"spec":{"runPolicy":"Parallel"}}'
for i in {1..150}; do
  echo "Trying create build $i ..."
  oc start-build ruby-hello-world
  sleep 2
  echo
done
) 2>&1 | tee -a build.$$
```

3. Check the builds: build id 138 failed with "error copying layers and metadata" and stopped.

$ oc get builds
ruby-hello-world-102   Source   Git@57073c0   Complete                             19 minutes ago   5m34s
ruby-hello-world-103   Source   Git@57073c0   Complete                             19 minutes ago   5m48s
ruby-hello-world-138   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m10s
ruby-hello-world-139   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   7m27s
ruby-hello-world-140   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m49s

[wewang@wangwen work]$ oc logs -f build/ruby-hello-world-138
Pushing image image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest ...
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest" to "docker://image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest": Error writing blob: Patch https://image-registry.openshift-image-registry.svc:5000/v2/wewang2/ruby-hello-world/blobs/uploads/4ab8c909-17da-4a3c-a96e-8d1418a47d75?_state=gsP-hwXozeKUZvPXk_6XSnt8vjq1KyR6n7qa9MfAY1d7Ik5hbWUiOiJ3ZXdhbmcyL3J1YnktaGVsbG8td29ybGQiLCJVVUlEIjoiNGFiOGM5MDktMTdkYS00YTNjLWE5NmUtOGQxNDE4YTQ3ZDc1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIwLTA2LTIyVDExOjEzOjQwLjI4NDc2OTQ2MVoifQ%3D%3D: EOF

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
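
With 150 builds, the check in step 3 of the verification is tedious to do by eye. The following is a minimal sketch of how it might be scripted, not something taken from the original thread. The helper name `failed_push_builds` and the `sample` variable are hypothetical; the sample rows are copied from the `oc get builds` output in this report. The helper reads the listing on stdin, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of builds whose status is
# "Failed (PushImageToRegistryFailed)". Input is `oc get builds` output on stdin.
failed_push_builds() {
  awk '/Failed \(PushImageToRegistryFailed\)/ { print $1 }'
}

# Sample rows copied from the verification output in this report.
sample='ruby-hello-world-102 Source Git@57073c0 Complete 19 minutes ago 5m34s
ruby-hello-world-138 Source Git@57073c0 Failed (PushImageToRegistryFailed) 17 minutes ago 6m10s
ruby-hello-world-139 Source Git@57073c0 Failed (PushImageToRegistryFailed) 17 minutes ago 7m27s'

# Prints one failed build name per line.
printf '%s\n' "$sample" | failed_push_builds
```

Against a live cluster, the same helper could presumably be fed with `oc get builds | failed_push_builds`, and each returned name passed to `oc logs build/<name>` to look for "Error writing blob" or "error copying layers and metadata".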