Description of problem:
Under certain circumstances the builder binary omits errors reported by a remote registry during blob uploads. This bug has been fixed upstream in [1], and we need to bump dependencies on buildah/builder to bring the fix to our platform.

How reproducible:
I have been able to reproduce this by stressing the remote registry with a script.

Actual results:
When affected by this problem, builder reports "blob not found" for a blob that was wrongly reported as uploaded successfully. The failure that occurred during the blob upload itself is never reported.

Additional info:
[1] https://github.com/containers/image/commit/20733df3d7fd03dee784207107d4efda08412b73
For further information please refer to my comment on: https://github.com/containers/image/commit/20733df3d7fd03dee784207107d4efda08412b73

I managed to reproduce this problem by placing a misbehaving load balancer between the client and the registry. If this load balancer returns something such as a 5xx status code, we ignore the error and move on to the next layer. This causes a failure when the manifest is sent at the end, because some layers may be missing.
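The failure mode described above comes down to not inspecting the HTTP status of each blob upload request. Below is a minimal shell sketch of the fail-fast behavior. It assumes that a successful chunk upload (PATCH) is answered with 202 Accepted, as the Docker Registry HTTP API V2 describes. The function name `check_blob_upload_status` is hypothetical, and this is not the code from the containers/image fix.

```shell
#!/bin/sh
# Hypothetical helper, not taken from containers/image.
# $1 is the HTTP status code returned for a blob upload PATCH request.
# Returns 0 only for 202 Accepted (assumed success status); anything else,
# for example a 5xx from a load balancer, is reported on stderr so the
# caller can stop before moving on to the next layer.
check_blob_upload_status() {
    status="$1"
    if [ "$status" = "202" ]; then
        return 0
    fi
    echo "blob upload failed: unexpected HTTP status ${status}" >&2
    return 1
}
```

With a check like this, a call such as `check_blob_upload_status 503 || exit 1` would abort the push at the failing layer instead of surfacing later as "blob not found" when the manifest is sent.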
This can be fixed by bumping the vendored versions of our containers libraries in openshift/builder to match buildah v1.14.9:

containers/buildah v1.14.9
containers/common v0.8.4
containers/image/v5 v5.4.3
containers/storage v1.18.2
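As a rough illustration of the version bump above, the sketch below prints the Go commands that would pin those versions. It assumes the repository uses Go modules with a vendor/ directory and that the libraries live under github.com/containers. The function name `print_bump_commands` is hypothetical. If go.mod pins any of these modules through `replace` directives, those entries would need to be edited instead.

```shell
#!/bin/sh
# Sketch only: assumes Go modules plus a vendor/ directory.
# Prints the commands instead of running them, so they can be reviewed first.
print_bump_commands() {
    for pin in \
        "github.com/containers/buildah@v1.14.9" \
        "github.com/containers/common@v0.8.4" \
        "github.com/containers/image/v5@v5.4.3" \
        "github.com/containers/storage@v1.18.2"
    do
        echo "go get ${pin}"
    done
    # Refresh go.sum and the vendored copies after the pins change.
    echo "go mod tidy"
    echo "go mod vendor"
}
```

Running `print_bump_commands` lists the commands; piping the output to `sh` from the repository root would execute them.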
@adam @Ricardo I am not sure which scenario I should test to verify this bug. I think a binary test alone may not be enough for it. Thanks.
@wewang per https://bugzilla.redhat.com/show_bug.cgi?id=1816578#c1, you can set up a load balancer, misconfigured to return 500 errors, in front of an external registry. I believe that with this fix in place builds should fail fast. @Ricardo can you please provide the script you used to reproduce this bug?
@adam @Ricardo Maraschini I tried testing with the steps below. I am not sure whether they are correct for verifying this bug, so please check. Thanks.

Steps:
1. Use a bc with output to a docker.io image:
   output:
     to:
       kind: DockerImage
       name: docker.io/wewang58/ruby-hello-world:latest
2. Start a build and check the log:

[root@wangwen work]# oc get builds
NAME                 TYPE     FROM          STATUS                               STARTED          DURATION
ruby-hello-world-1   Source   Git@57073c0   Complete                             19 minutes ago   56s
ruby-hello-world-2   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   18 minutes ago   1m13s

Pushing image docker.io/wewang58/ruby-hello-world:latest ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/wewang58/ruby-hello-world:latest" to "docker://wewang58/ruby-hello-world:latest": Error trying to reuse blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 at destination: Error checking whether a blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 exists in docker.io/wewang58/ruby-hello-world: errors: denied: requested access to the resource is denied error parsing HTTP 401 response body: unexpected end of JSON input: ""
@Adam @Ricardo, finally verified the bug in version: 4.6.0-0.nightly-2020-06-20-011219

Steps:
1. Configure CPU and memory (100Mi) limits for the internal registry, and wait for the registry pod to be running again:
$ oc patch configs.imageregistry -p '{"spec":{"resources":{"limits":{"cpu":"100m","memory":"100Mi"}}}}' --type=merge

2. Create 150 builds and push to the internal registry:
```
# Start with an empty log file named after the shell's PID.
> build.$$
(
# Create the application and let its builds run in parallel.
oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world
sleep 10
oc patch bc ruby-hello-world -p '{"spec":{"runPolicy":"Parallel"}}'
# Start 150 builds, two seconds apart, to stress the registry.
for i in {1..150}; do
  echo "Trying create build $i ..."
  oc start-build ruby-hello-world
  sleep 2
  echo
done
) 2>&1 | tee -a build.$$
```

3. Check the builds: build 138 failed with "error copying layers and metadata" and stopped.
$ oc get builds
ruby-hello-world-102   Source   Git@57073c0   Complete                             19 minutes ago   5m34s
ruby-hello-world-103   Source   Git@57073c0   Complete                             19 minutes ago   5m48s
ruby-hello-world-138   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m10s
ruby-hello-world-139   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   7m27s
ruby-hello-world-140   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m49s

[wewang@wangwen work]$ oc logs -f build/ruby-hello-world-138
Pushing image image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest ...
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest" to "docker://image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest": Error writing blob: Patch https://image-registry.openshift-image-registry.svc:5000/v2/wewang2/ruby-hello-world/blobs/uploads/4ab8c909-17da-4a3c-a96e-8d1418a47d75?_state=gsP-hwXozeKUZvPXk_6XSnt8vjq1KyR6n7qa9MfAY1d7Ik5hbWUiOiJ3ZXdhbmcyL3J1YnktaGVsbG8td29ybGQiLCJVVUlEIjoiNGFiOGM5MDktMTdkYS00YTNjLWE5NmUtOGQxNDE4YTQ3ZDc1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIwLTA2LTIyVDExOjEzOjQwLjI4NDc2OTQ2MVoifQ%3D%3D: EOF
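With 150 builds, picking out the failed ones by eye is tedious. The verification step above could be scripted roughly as follows. This is a sketch that assumes the default `oc get builds` column order (NAME, TYPE, FROM, STATUS, ...) seen in the output above and a FROM value without spaces. The function name `list_failed_builds` is hypothetical.

```shell
#!/bin/sh
# Reads `oc get builds` output on stdin and prints the names of builds whose
# STATUS column starts with "Failed".
# Assumes STATUS is the fourth whitespace-separated field.
list_failed_builds() {
    awk '$4 == "Failed" { print $1 }'
}
```

For example, `oc get builds | list_failed_builds` would print one build name per line, and each name could then be passed to `oc logs build/<name>` to look for "error copying layers and metadata".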
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196