Bug 1753731
| Summary: | Builds can use incorrect location for pullthrough imagestream tags | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Adam Kaplan <adam.kaplan> |
| Component: | Build | Assignee: | Gabe Montero <gmontero> |
| Status: | CLOSED ERRATA | QA Contact: | wewang <wewang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | aos-bugs, bparees, gmontero, wzheng |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Builds started very soon after an imagestream was created might not leverage local pullthrough imagestream tags when specified.<br>Consequence: The build would attempt to pull the image from the external image registry, and if the build was not set up with the authorization and certificates needed for that registry (on the assumption that it would pull the image from the internal OpenShift registry), the build would fail.<br>Fix: The build controller was updated to detect when its imagestream cache was missing the information needed for local pullthrough imagestream tags and to retrieve that information by other means.<br>Result: Builds expecting to leverage local imagestream tag pullthrough are now able to do so. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-23 11:06:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Adam Kaplan
2019-09-19 16:28:22 UTC
Suspected root cause is that the openshift controller manager caches the imagestream at a point where the openshift apiserver is not yet using the internal registry's hostname. The hypothesis is that the flaking test runs in a window where the apiserver is using the internal registry hostname, but OCM hasn't re-listed its caches.

The simplest way to clear the cache is to restart the openshift controller manager. A cluster admin can do this as follows:

```
$ oc delete pods -l app=openshift-controller-manager,controller-manager=true -n openshift-controller-manager
```

Flaking tests have been disabled in https://github.com/openshift/origin/pull/23832. These tests need to be re-enabled for this BZ to be accepted as fixed.

My current thought on how to fix this is to revise the helper logic used in OCM that "resolves" imagestreamtags: rather than having that logic look for the (decorator-added) local registry field on the imagestream, have it be aware of what the local registry hostname is (if one exists) and use it. The OCM should already know this value. Then we are not dependent on the apiserver having picked up the internal registry hostname.
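The proposed fix above can be sketched as a small standalone Go program. This is a minimal sketch, not the actual openshift-controller-manager code: the type and function names (`imageStreamStatus`, `resolveLocalPullSpec`, `configuredRegistryHost`) are illustrative, and the real logic lives in the helpers linked in the next comment.

```go
package main

import "fmt"

// imageStreamStatus models the one status field relevant here; the real
// ImageStream API type has many more fields.
type imageStreamStatus struct {
	// Populated by the apiserver decorator; may be empty in the race window
	// where the apiserver has not yet picked up the registry hostname.
	DockerImageRepository string
}

// resolveLocalPullSpec sketches the proposed fix: if the cached imagestream
// status lacks the internal registry repository, fall back to the registry
// hostname the controller manager already knows from its own configuration
// instead of depending on the apiserver-decorated field.
func resolveLocalPullSpec(status imageStreamStatus, configuredRegistryHost, namespace, name, tag string) (string, bool) {
	repo := status.DockerImageRepository
	if repo == "" {
		if configuredRegistryHost == "" {
			// No internal registry known at all: cannot build a local
			// pullthrough reference.
			return "", false
		}
		repo = fmt.Sprintf("%s/%s/%s", configuredRegistryHost, namespace, name)
	}
	return fmt.Sprintf("%s:%s", repo, tag), true
}

func main() {
	// Simulate the race: the cache was filled before the apiserver published
	// the internal registry hostname, so the status field is empty.
	spec, ok := resolveLocalPullSpec(imageStreamStatus{},
		"image-registry.openshift-image-registry.svc:5000", "openshift", "ruby", "2.3")
	// Prints: true image-registry.openshift-image-registry.svc:5000/openshift/ruby:2.3
	fmt.Println(ok, spec)
}
```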
Some relevant code links:

- https://github.com/openshift/openshift-controller-manager/blob/master/pkg/build/controller/build/build_controller.go#L969
- https://github.com/openshift/library-go/blob/master/pkg/image/imageutil/helpers.go#L300-L355, in particular https://github.com/openshift/library-go/blob/master/pkg/image/imageutil/helpers.go#L332

In addition to `stream.Status.DockerImageRegistry`, we can use:

- https://github.com/openshift/api/blob/master/openshiftcontrolplane/v1/types.go#L239
- https://github.com/openshift/api/blob/master/openshiftcontrolplane/v1/types.go#L181

which the build controller already has (https://github.com/openshift/openshift-controller-manager/blob/master/pkg/cmd/controller/build.go#L77) and which is passed into the strategy-specific create build pod: https://github.com/openshift/openshift-controller-manager/blob/master/pkg/build/controller/build/build_controller.go#L730

For QE: the theory for the code change that the associated extended test exposed centered around executing a build whose input image leveraged local reference policy on an imagestream, so that we got pullthrough from the internal registry. If that build ran as soon as the server came up, or as soon as a config change resulted in a restart of the openshift api server, there could be a timing window where the api server does not get the internal registry hostname before the build controller does. In such a case the stream.Status.DockerImageRepository field would be empty. If you want to spend some time trying to force such a timing window, OK, but otherwise I'm good with either moving to Verified, because the extended test is passing, or performing some basic regression testing of builds using local reference imagestream inputs.
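The failure mode described above (empty `stream.Status.DockerImageRepository` during the timing window) can be illustrated with a standalone sketch. Types and names here (`tagRef`, `pullSpec`) are hypothetical stand-ins, not the real openshift API structs; the sketch only shows why an empty cached repository drops the build back to the external image reference.

```go
package main

import "fmt"

// tagRef models the two facts that matter for pullthrough resolution.
type tagRef struct {
	ExternalImage  string // e.g. registry.redhat.io/rhscl/ruby-23-rhel7:latest
	ReferenceLocal bool   // referencePolicy: type: Local on the tag
}

// pullSpec returns the image reference a build would pull. With a Local
// reference policy the build should pull through the internal registry,
// but if the cached status repository is empty (the race window), the
// only usable reference is the external image, which the build may lack
// credentials and certificates for.
func pullSpec(cachedRepo string, tr tagRef, tag string) string {
	if tr.ReferenceLocal && cachedRepo != "" {
		return fmt.Sprintf("%s:%s", cachedRepo, tag) // pullthrough via internal registry
	}
	return tr.ExternalImage
}

func main() {
	tr := tagRef{ExternalImage: "registry.redhat.io/rhscl/ruby-23-rhel7:latest", ReferenceLocal: true}
	// Race window: empty cached repository falls back to the external registry.
	fmt.Println(pullSpec("", tr, "2.3")) // registry.redhat.io/rhscl/ruby-23-rhel7:latest
	// Normal case: pullthrough from the internal registry.
	fmt.Println(pullSpec("image-registry.openshift-image-registry.svc:5000/openshift/ruby", tr, "2.3"))
}
```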
e2e passed: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/23957/pull-ci-openshift-origin-master-e2e-aws-builds/3182, and tested with the ruby imagestream, whose referencePolicy is Local:

```
[wewang@Desktop Downloads]$ oc get is ruby -n openshift -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/display-name: Ruby
    openshift.io/image.dockerRepositoryCheck: "2019-10-15T00:46:44Z"
    samples.operator.openshift.io/version: 4.3.0-0.ci-2019-10-14-215116
  creationTimestamp: "2019-10-15T00:44:55Z"
  generation: 2
  labels:
    samples.operator.openshift.io/managed: "true"
  name: ruby
  namespace: openshift
  resourceVersion: "15013"
  selfLink: /apis/image.openshift.io/v1/namespaces/openshift/imagestreams/ruby
  uid: 80bf634e-8121-4768-a599-da026625fbf0
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations:
      description: Build and run Ruby 2.3 applications on RHEL 7. For more information about using this builder image, including OpenShift considerations, see https://github.com/sclorg/s2i-ruby-container/blob/master/2.3/README.md.
      iconClass: icon-ruby
      openshift.io/display-name: Ruby 2.3
      openshift.io/provider-display-name: Red Hat, Inc.
      sampleRepo: https://github.com/sclorg/ruby-ex.git
      supports: ruby:2.3,ruby
      tags: hidden,builder,ruby
      version: "2.3"
    from:
      kind: DockerImage
      name: registry.redhat.io/rhscl/ruby-23-rhel7:latest
    generation: 2
    importPolicy: {}
    name: "2.3"
    referencePolicy:
      type: Local
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062