Bug 1816578 - Builder binary may omit errors reported by a remote registry during blob uploads
Summary: Builder binary may omit errors reported by a remote registry during blob uploads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Adam Kaplan
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-24 10:16 UTC by Ricardo Maraschini
Modified: 2020-10-27 15:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: buildah's libraries could ignore certain HTTP errors returned by the registry during blob uploads.
Consequence: Builds could fail to push images when the target registry had temporary issues, without reporting the underlying error.
Fix: buildah now respects these errors when pushing image blobs.
Result: buildah fails the push and reports the error if the upstream registry is temporarily unavailable.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:56:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift builder pull 157 | 0 | None | closed | Bug 1816578: upgrade buildah to v1.14.9 | 2021-01-19 15:07:40 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 15:56:56 UTC

Description Ricardo Maraschini 2020-03-24 10:16:06 UTC
Description of problem:

Under certain circumstances the builder binary omits errors reported by a remote registry during blob uploads. This bug has been fixed in [1], and we need to bump the buildah/builder dependencies to bring the fix to our platform.


How reproducible:

I have been able to replicate this by stressing the remote registry with a script.


Actual results:

When affected by this problem, the builder reports "blob not found" for a blob that was wrongly reported as uploaded successfully. The failure that the registry reported during the blob upload is missing from the output.


Additional info:

[1] https://github.com/containers/image/commit/20733df3d7fd03dee784207107d4efda08412b73

Comment 1 Ricardo Maraschini 2020-05-26 09:40:09 UTC
For further information please refer to my comment on: https://github.com/containers/image/commit/20733df3d7fd03dee784207107d4efda08412b73

I managed to replicate this problem by placing a misbehaving load balancer between the client and the registry. If the load balancer returns an error such as a 5xx status code, we ignore the error and move on to the next layer. This causes a failure when the manifest is sent at the end, because some layers may be missing.
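
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the containers/image code: `Response`, `push_ignoring_status`, `push_checking_status` and `flaky_proxy` are hypothetical names made up for this example. The first function mirrors the behaviour where a non-2xx response is never inspected and the layer is recorded as uploaded. The second stops at the first bad status, which is the behaviour the fix is meant to restore.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


class BlobUploadError(Exception):
    """Raised when the registry (or a proxy in front of it) rejects a blob upload."""


@dataclass
class Response:
    status: int  # HTTP status code returned for one blob upload


# A transport takes a layer digest and returns the response for its upload.
Transport = Callable[[str], Response]


def push_ignoring_status(layers: Iterable[str], send: Transport) -> List[str]:
    """Buggy variant: the status code of each upload is never inspected."""
    uploaded = []
    for digest in layers:
        send(digest)             # a 5xx here goes unnoticed
        uploaded.append(digest)  # layer is wrongly recorded as uploaded
    return uploaded


def push_checking_status(layers: Iterable[str], send: Transport) -> List[str]:
    """Checked variant: any non-2xx status aborts the push with an error."""
    uploaded = []
    for digest in layers:
        resp = send(digest)
        if not 200 <= resp.status < 300:
            raise BlobUploadError(
                f"uploading {digest}: unexpected HTTP status {resp.status}"
            )
        uploaded.append(digest)
    return uploaded


def flaky_proxy(digest: str) -> Response:
    """Hypothetical load balancer that fails one layer's upload with a 502."""
    return Response(502 if digest == "sha256:bbb" else 202)
```

With a proxy that answers 502 for one layer, the first function still returns every digest, while the second raises on the failing layer, so the error surfaces before the manifest is sent.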

Comment 3 Adam Kaplan 2020-06-05 18:25:01 UTC
This can be fixed by bumping the vendored containers libraries in openshift/builder to the versions used by buildah v1.14.9:

containers/buildah v1.14.9
containers/common v0.8.4
containers/image/v5 v5.4.3
containers/storage v1.18.2

Comment 7 wewang 2020-06-15 09:29:25 UTC
@adam @Ricardo I am not sure which scenario I should test to verify this bug. I think a binary test alone may not be enough. Thanks.

Comment 8 Adam Kaplan 2020-06-15 13:34:57 UTC
@wewang per https://bugzilla.redhat.com/show_bug.cgi?id=1816578#c1 you can set up a load balancer in front of an external registry which is mis-configured to return 500 errors. I believe with this fix in place builds should fail fast.

@Ricardo can you please provide the script you used to reproduce this bug?
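
The reproduction script does not appear in this report. As a rough stand-in for the mis-configured load balancer, the sketch below assumes a small Python HTTP server that answers every request with a 500. The handler name, `make_server`, the error body and port 8080 are all assumptions for illustration. A real test would also need the build to treat this endpoint as an insecure (plain HTTP) registry, or to put TLS in front of it.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AlwaysFailHandler(BaseHTTPRequestHandler):
    """Answers every request with HTTP 500, like a broken load balancer."""

    def _fail(self):
        # Drain the request body when its length is known, so the client
        # can finish sending before it reads the response.
        length = int(self.headers.get("Content-Length") or 0)
        if length:
            self.rfile.read(length)
        body = json.dumps(
            {"errors": [{"code": "UNKNOWN", "message": "injected failure"}]}
        ).encode()
        self.send_response(500)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":  # HEAD responses carry no body
            self.wfile.write(body)

    # Route every HTTP method a registry client is likely to use to _fail.
    do_GET = do_HEAD = do_POST = do_PUT = do_PATCH = do_DELETE = _fail

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=0):
    """Build the failing server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), AlwaysFailHandler)


# To run it by hand on an assumed port:
#   make_server(port=8080).serve_forever()
```

Pointing a push at such an endpoint should make a fixed builder fail on the first blob request instead of reporting a successful push.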

Comment 10 wewang 2020-06-18 10:35:44 UTC
@adam @Ricardo Maraschini I tried testing with the following steps, but I am not sure they are correct for verifying the bug. Please check, thanks.

Steps:
 1. Use a BuildConfig (bc) that outputs to a docker.io image:

    output:
      to:
        kind: DockerImage
        name: docker.io/wewang58/ruby-hello-world:latest 
 2. Start a build and check the log:
[root@wangwen work]# oc get builds
NAME                 TYPE     FROM          STATUS                               STARTED          DURATION
ruby-hello-world-1   Source   Git@57073c0   Complete                             19 minutes ago   56s
ruby-hello-world-2   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   18 minutes ago   1m13s
Pushing image docker.io/wewang58/ruby-hello-world:latest ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed docker.io/wewang58/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]docker.io/wewang58/ruby-hello-world:latest" to "docker://wewang58/ruby-hello-world:latest": Error trying to reuse blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 at destination: Error checking whether a blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407 exists in docker.io/wewang58/ruby-hello-world: errors:
denied: requested access to the resource is denied
error parsing HTTP 401 response body: unexpected end of JSON input: ""

Comment 12 wewang 2020-06-22 11:30:40 UTC
@Adam @Ricardo, I finally verified the fix in version:
4.6.0-0.nightly-2020-06-20-011219

Steps:
1. Set CPU (100m) and memory (100Mi) limits for the internal registry, then wait for the registry pod to be running again
 $ oc patch configs.imageregistry cluster -p '{"spec":{"resources":{"limits":{"cpu":"100m","memory":"100Mi"}}}}' --type=merge

2. Create 150 builds and push to internal registry
```
> build.$$
(
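# Create the BuildConfig, give it a moment to settle, then let builds run in parallel.
oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world
sleep 10
oc patch bc ruby-hello-world -p '{"spec":{"runPolicy":"Parallel"}}'
# Start 150 builds two seconds apart to put load on the resource-limited registry.
for i in {1..150}; do
  echo "Starting build $i ..."
  oc start-build ruby-hello-world
  sleep 2
  echo
done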
) 2>&1 | tee -a build.$$
```

3. Check the builds: build 138 failed with "error copying layers and metadata" and stopped.
$ oc get builds
ruby-hello-world-102   Source   Git@57073c0   Complete                             19 minutes ago   5m34s
ruby-hello-world-103   Source   Git@57073c0   Complete                             19 minutes ago   5m48s
ruby-hello-world-138   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m10s
ruby-hello-world-139   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   7m27s
ruby-hello-world-140   Source   Git@57073c0   Failed (PushImageToRegistryFailed)   17 minutes ago   6m49s
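
With 150 builds, scanning the list by eye is error-prone. Below is a small sketch for tallying build statuses from saved `oc get builds` output. It assumes columns are separated by two or more spaces, as in the listing above. `count_statuses` is a hypothetical helper, not part of `oc`.

```python
import re
from collections import Counter


def count_statuses(listing: str) -> Counter:
    """Count builds per STATUS column in `oc get builds` text output."""
    counts = Counter()
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("NAME"):
            continue  # skip blank lines and the header row
        # Columns are NAME, TYPE, FROM, STATUS, STARTED, DURATION. STATUS and
        # STARTED may contain single spaces, so split only on runs of 2+ spaces.
        fields = re.split(r"\s{2,}", line)
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts
```

For example, after `oc get builds > builds.txt`, calling `count_statuses(open("builds.txt").read())` returns one count per distinct status string.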

[wewang@wangwen work]$ oc logs -f build/ruby-hello-world-138
Pushing image image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest ...
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Getting image source signatures
Copying blob sha256:a42f82d3826b65865f0aef5efbbb3dd606c36af54cceac6d77915b096b0816ac
Copying blob sha256:a3ac36470b00df382448e79f7a749aa6833e4ac9cc90e3391f778820db9fa407
Copying blob sha256:f60299098adffa86ccdf377e8722819396f2800351084cb4cc0a8636386691f8
Copying blob sha256:bde7d1339816ab545b3a65bcc3046e3b5b6e50623794e49eafc75fb2eccf801c
Copying blob sha256:d48813f378f2894124c64c0d9e9ff18639b997e9719128948088c94f80c2b807
Copying blob sha256:82a8f4ea76cb6f833c5f179b3e6eda9f2267ed8ac7d1bf652f88ac3e9cc453d1
Successfully pushed image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest
Warning: Push failed, retrying in 5s ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.imagestore=/var/lib/shared]image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest" to "docker://image-registry.openshift-image-registry.svc:5000/wewang2/ruby-hello-world:latest": Error writing blob: Patch https://image-registry.openshift-image-registry.svc:5000/v2/wewang2/ruby-hello-world/blobs/uploads/4ab8c909-17da-4a3c-a96e-8d1418a47d75?_state=gsP-hwXozeKUZvPXk_6XSnt8vjq1KyR6n7qa9MfAY1d7Ik5hbWUiOiJ3ZXdhbmcyL3J1YnktaGVsbG8td29ybGQiLCJVVUlEIjoiNGFiOGM5MDktMTdkYS00YTNjLWE5NmUtOGQxNDE4YTQ3ZDc1IiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIwLTA2LTIyVDExOjEzOjQwLjI4NDc2OTQ2MVoifQ%3D%3D: EOF

Comment 14 errata-xmlrpc 2020-10-27 15:56:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

