Description of problem:

Unstable build with a buildah error:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ci-tools/1772/pull-ci-openshift-ci-tools-master-images/1369684570110169088

2021/03/10 16:30:58 Build private-org-peribolos-sync failed, printing logs:
Caching blobs under "/var/cache/blobs".
Getting image source signatures
Copying blob sha256:af875c55da53e4fe440c266863aa55b31df88175d5b2eb9b3872bf796e99887c
Copying blob sha256:2d473b07cdd5f0912cd6f1a703352c82b512407db6b05b43f2553732b55df3bc
Copying blob sha256:67a658d3535469bd82dfdda6899da83dc51bfa0d11f63aa4cb014ddd280ae1ae
Copying blob sha256:73e99c44efe07b295629455f26c365cb00ff358b480af2cf1bc6bc428d94dabe
Copying blob sha256:92aabcd08403dce8bf1631319292135665aa22825f5398f2d9e36e67fa44c84c
Copying blob sha256:72f2c078316a3eab7bbaa6b6053d0e24d3c657caaabfc7b18fc19e27a18c461c
error: error creating buildah builder: Error writing blob: error storing blob to file "/var/tmp/storage478406705/1": error happened during read: read tcp 10.130.5.94:54490->172.217.204.128:443: read: connection reset by peer

Version-Release number of selected component (if applicable):

oc --context build02 get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0     True        False         5d3h    Cluster version is 4.7.0

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Nalin has pinpointed the cause of the error: missing retries when connecting to the registry.
https://github.com/openshift/builder/pull/222

We created this bug to make backports easier.
Gabe, I could reproduce this bug on a 4.8.0-0.nightly-2021-03-17-014745 cluster with this scenario:

Step 1: Specify a private image as the source image and add a pull secret.

source:
  git:
    uri: https://github.com/openshift/ruby-hello-world.git
  images:
  - from:
      kind: DockerImage
      name: 172.30.128.188:5000/busybox
    paths:
    - destinationDir: openshiftqedir
      sourcePath: /opt/app-root
    pullSecret:
      name: test
  type: Git
strategy:
  sourceStrategy:
    env:
    - name: EXAMPLE
      value: sample-app
    from:
      kind: ImageStreamTag
      name: ruby:2.7
      namespace: openshift
  type: Source

Step 2: Trigger the build. The build failed while pulling the private source image, without a retry.

$ oc logs -f build/ruby-sample-build-1
Cloning "https://github.com/openshift/ruby-hello-world.git" ...
 Commit: f476e11e538445e76470b0c63252b49e294a51d2 (Merge pull request #121 from vrutkovs/ruby-2.7)
 Author: Ben Parees <bparees.github.com>
 Date: Wed Mar 10 09:52:09 2021 -0500
Caching blobs under "/var/cache/blobs".
error: error creating buildah builder: Error initializing source docker://172.30.128.188:5000/busybox:latest: error pinging docker registry 172.30.128.188:5000: Get "https://172.30.128.188:5000/v2/": http: server gave HTTP response to HTTPS client

Successful scenario: Trigger a build that pulls a private image, set an invalid secret at first, then quickly correct the secret.

source:
  git:
    uri: http://github.com/openshift/rails-ex.git
  type: Git
strategy:
  sourceStrategy:
    from:
      kind: ImageStreamTag
      name: mystream:latest
      namespace: rhf34
    pullSecret:
      name: test

$ oc logs -f build/rails-ex-8
Cloning "http://github.com/openshift/rails-ex.git" ...
 Commit: 9e6fe17f934b87b9a399e2623d6c7dfcebd4b530 (Merge pull request #130 from pvalena/bundler)
 Author: Pavel Valena <pvalena>
 Date: Wed Sep 16 16:23:12 2020 +0200
Caching blobs under "/var/cache/blobs".
error trying to parse file /var/run/secrets/openshift.io/pull/.dockerconfigjson: illegal base64 data at input byte 28
Warning: Pull failed, retrying in 5s ...
Getting image source signatures
Copying blob sha256:0669b0daf1fba90642d105f3bc2c94365c5282155a33cc65ac946347a90d90d1
Copying config sha256:83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f
Writing manifest to image destination
Storing signatures
Generating dockerfile with builder image 172.30.128.188:5000/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
Hey XiuJuan,

So I dove into this error:

error: error creating buildah builder: Error initializing source docker://172.30.128.188:5000/busybox:latest: error pinging docker registry 172.30.128.188:5000: Get "https://172.30.128.188:5000/v2/": http: server gave HTTP response to HTTPS client

The top-level error there corresponds to
https://github.com/openshift/builder/blob/c910b5cd6c0e0a284c544d3fd98d1ddf8167cbc7/pkg/build/builder/source.go#L451-L458
which is where Nalin added the retry.

If you then work from the "Error initializing source" part, you get into the retry copy logic of containers/image. The thing is, that logic does not retry on just any error. It distinguishes intermittent errors from ones that will persist. For reference:
https://github.com/openshift/builder/blob/f9787dc13c7cff8ccbb6dd5d93a9bfddc2412ed0/vendor/github.com/containers/common/pkg/retry/retry.go#L45-L95

A server giving an "HTTP response to HTTPS client" is one of those persistent, permanent-failure errors. So the lack of a retry there is good and expected.

Based on that, and on the retry you were able to identify in the other scenario, I'm marking this verified.

Thanks
Supporting information for release notes:

Cause: intermittent communication issues can occur when interacting with image registries

Consequence: certain interactions between openshift builds and image registries, for example when pulling images as source, could result in build failure when those intermittent issues occurred

Fix: retry for pulling images for all permutations of interaction between openshift builds and image registries was added

Result: openshift builds are now more resilient when they encounter intermittent communication issues with image registries
I suggest changing "source images" to "base images" in the doc text. We're talking about what's usually called the base image in a Dockerfile: that's what we use it for in "Docker" strategy builds, and it's how we use the s2i builder image in "Source" strategy builds.
Thanks, Nalin. Updated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438