Description of problem:
If a build uses a pull-through imagestream tag as its FROM image, sometimes the build will not use the internal registry's reference location and instead use the "source" registry's location.
This can result in build failures if the source registry requires a pull secret, such as registry.redhat.io.
Version-Release number of selected component (if applicable): 4.1.0
How reproducible: Sometimes
Steps to Reproduce:
Observed in flakes of the following test suites:
"[Feature:Builds][Conformance] oc new-app should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]"
Actual results:
The build uses the source registry's reference location (registry.redhat.io) and fails because it does not have a valid pull secret for that registry.
Expected results:
Builds succeed because they use the internal registry to pull images.
Suspected root cause is that the openshift controller manager is caching the imagestream at a point where the openshift apiserver is not using the internal registry's hostname.
The hypothesis is that the flaking test runs in a window where the apiserver is using the internal registry hostname, but OCM hasn't re-listed its caches.
The simplest way to clear the cache is to restart the openshift controller manager. A cluster admin can do this as follows:
$ oc delete pods -l app=openshift-controller-manager,controller-manager=true -n openshift-controller-manager
Flaking tests have been disabled in https://github.com/openshift/origin/pull/23832. These tests need to be re-enabled for this BZ to be accepted as fixed.
My current thought on how to fix this is to revise the helper logic used in OCM that "resolves" imagestreamtags. Rather than having that logic look for the (decorator-added) local registry field on the imagestream, it should simply be aware of the local registry hostname (if one exists) and use it. The OCM should already know this value.
Then we are not dependent on the apiserver having picked up the internal registry hostname.
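To make the proposal concrete, here is a minimal sketch in Go. It is not the library-go implementation: the `imageStream` type, the `localPullSpec` function, and its parameters are hypothetical names used only for illustration, and the hostname in `main` is just an example value. The idea is that the resolver prefers the hostname the controller already holds and only falls back to the decorated status field.

```go
package main

import "fmt"

// imageStream is a hypothetical, trimmed-down stand-in for the real ImageStream type.
type imageStream struct {
	Namespace string
	Name      string
	// StatusDockerImageRepository mirrors stream.Status.DockerImageRepository,
	// which is only populated once the apiserver knows the internal registry hostname.
	StatusDockerImageRepository string
}

// localPullSpec (hypothetical) returns a pull-through reference for a tag.
// It prefers the hostname known to the controller and falls back to the
// status field; it reports false when neither is available.
func localPullSpec(stream imageStream, tag, internalRegistryHostname string) (string, bool) {
	var repo string
	switch {
	case internalRegistryHostname != "":
		repo = internalRegistryHostname + "/" + stream.Namespace + "/" + stream.Name
	case stream.StatusDockerImageRepository != "":
		repo = stream.StatusDockerImageRepository
	default:
		return "", false
	}
	return repo + ":" + tag, true
}

func main() {
	// A cached stream whose status field was never decorated by the apiserver.
	stale := imageStream{Namespace: "openshift", Name: "ruby"}

	spec, ok := localPullSpec(stale, "latest", "image-registry.openshift-image-registry.svc:5000")
	fmt.Println(spec, ok)

	// Without a known hostname and without the status field, nothing can be resolved locally.
	spec, ok = localPullSpec(stale, "latest", "")
	fmt.Println(spec, ok)
}
```

With this shape, a stale cached imagestream no longer forces the build back to the source registry as long as the controller itself has the hostname.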
Some relevant code links:
In particular, https://github.com/openshift/library-go/blob/master/pkg/image/imageutil/helpers.go#L332
In addition to `stream.Status.DockerImageRepository`, we can use
which the build controller already has
and it is passed into the strategy specific create build pod:
For QE: the code change, and the extended test that exposed the problem, center on executing a build whose input image uses the Local reference policy on an imagestream, so that the image is pulled through the internal registry.
If that build ran as soon as the server came up, or as soon as a config change caused the openshift apiserver to restart, there could be a timing window in which the apiserver has not yet picked up the internal registry hostname even though the build controller has.
In such a case the stream.Status.DockerImageRepository field would be empty.
If you want to spend some time trying to force such a timing window, OK. Otherwise I'm good with either moving this to VERIFIED, because the extended test is passing, or performing some basic regression testing of builds that use Local reference policy imagestream inputs.
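For the regression check, one thing that can be inspected is whether an imagestream has tags with the Local reference policy and whether its `status.dockerImageRepository` is populated. The sketch below, in Go, parses the JSON form of an imagestream (for example, saved from `oc get is ruby -n openshift -o json`). The `imageStreamView` type, the `inspect` function, and the sample document are illustrative assumptions, not part of any OpenShift tooling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageStreamView (hypothetical) decodes only the fields of interest.
type imageStreamView struct {
	Spec struct {
		Tags []struct {
			Name            string `json:"name"`
			ReferencePolicy struct {
				Type string `json:"type"`
			} `json:"referencePolicy"`
		} `json:"tags"`
	} `json:"spec"`
	Status struct {
		DockerImageRepository string `json:"dockerImageRepository"`
	} `json:"status"`
}

// inspect (hypothetical) returns the tags that use the Local reference policy
// and whether the internal registry location is present in the status.
func inspect(raw []byte) (localTags []string, repoPopulated bool, err error) {
	var is imageStreamView
	if err := json.Unmarshal(raw, &is); err != nil {
		return nil, false, err
	}
	for _, t := range is.Spec.Tags {
		if t.ReferencePolicy.Type == "Local" {
			localTags = append(localTags, t.Name)
		}
	}
	return localTags, is.Status.DockerImageRepository != "", nil
}

func main() {
	// Made-up sample: one Local tag and an empty status, which is the state
	// the timing window described above would produce.
	sample := []byte(`{
	  "spec": {"tags": [
	    {"name": "a", "referencePolicy": {"type": "Local"}},
	    {"name": "b", "referencePolicy": {"type": "Source"}}
	  ]},
	  "status": {"dockerImageRepository": ""}
	}`)

	tags, populated, err := inspect(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("Local tags:", tags)
	fmt.Println("status.dockerImageRepository populated:", populated)
}
```

An empty status field alongside Local tags corresponds to the case where the older helper logic would have fallen back to the source registry.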
e2e passed: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/23957/pull-ci-openshift-origin-master-e2e-aws-builds/3182. I also tested with the ruby imagestream, whose referencePolicy is Local.
[wewang@Desktop Downloads]$ oc get is ruby -n openshift -o yaml
description: Build and run Ruby 2.3 applications on RHEL 7. For more information
about using this builder image, including OpenShift considerations, see https://github.com/sclorg/s2i-ruby-container/blob/master/2.3/README.md.
openshift.io/display-name: Ruby 2.3
openshift.io/provider-display-name: Red Hat, Inc.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.