Description of problem:
Depending on the order of mirrors in an imagecontentsourcepolicy, pulling an image fails or succeeds.

Version-Release number of selected component (if applicable):
4.9.0-rc.3

How reproducible:

Steps to Reproduce:
1. Create a cluster with the following imagecontentsources in install-config
```
imageContentSources:
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  - quay.io/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  - quay.io/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  - quay.io/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```
2. Create the imagestream
```
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    openshift.io/image.dockerRepositoryCheck: "2021-09-28T07:52:13Z"
  name: cli
  namespace: openshift
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eaabf06b1df3c1eb92d43d9c16681061b4bfd5c03e80a4c5aa5d4adf287811a4
    generation: 2
    importPolicy:
      scheduled: true
    name: latest
    referencePolicy:
      type: Source
```
3. Create a pod using the openshift/cli image
```
oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
```

Actual results:
Pulling the image fails depending on the order of the mirrors.
Expected results:
Even if one of the mirrors doesn't work, the image gets pulled from the other mirror.

Additional info:
With the following imagecontentsources in install-config, it works:
```
imageContentSources:
- mirrors:
  - quay.io/openshift-release-dev/ocp-release
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator-registry
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```
* This is observed as flaky behavior in the OSD e2e pipeline on 4.9, as the order of mirrors is shuffled: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y
* At this point we're not sure why pulling from pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release fails, and it seems to work sometimes. However, even if it fails, the registry should fall back to the second mirror.
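The expected behavior amounts to an ordered fallback over the mirror list. The following is only a minimal Python sketch of that idea, not the registry's or library-go's actual code: `pull_with_fallback`, `PullError`, and `fake_pull` are hypothetical names, and `fake_pull` merely simulates the first mirror being unavailable.

```python
from typing import Callable, Iterable, List, Tuple

# Locations taken from the report; which one fails is simulated below.
BROKEN = "pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release"
WORKING = "quay.io/openshift-release-dev/ocp-release"


class PullError(Exception):
    """Raised only when every candidate location has failed."""


def pull_with_fallback(
    candidates: Iterable[str],
    pull: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each location in order; return the first one that succeeds."""
    errors: List[str] = []
    for location in candidates:
        try:
            return location, pull(location)
        except Exception as exc:  # assumption: any failure means "try the next mirror"
            errors.append(f"{location}: {exc}")
    raise PullError("all mirrors failed: " + "; ".join(errors))


def fake_pull(location: str) -> bytes:
    """Hypothetical stand-in for a real pull; the first mirror always fails."""
    if location == BROKEN:
        raise ConnectionError("simulated mirror failure")
    return b"image-bytes"
```

With this loop, `pull_with_fallback([BROKEN, WORKING], fake_pull)` and `pull_with_fallback([WORKING, BROKEN], fake_pull)` both end up serving the image from `WORKING`, so the result does not depend on the mirror order.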
Can you attach must-gather or at least YAML for the image stream openshift/cli? If the image stream is successfully imported, then I also need registry logs.
Example of a failed job with must-gather: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y/1443063209534689280
This BZ will be a blocker for 4.10 and the fix should be backported to 4.9.z and 4.8.z. An initial fix for library-go's client is proposed at https://github.com/openshift/library-go/pull/1226.
Verified this image on 4.10.0-0.nightly-2021-12-01-210213

spec:
  repositoryDigestMirrors:
  - mirrors:
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
    - quay.io/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
    - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

$ oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
deployment.apps/cli-test created

Normal  Pulled  24s  kubelet  Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest" in 52.76675ms
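For readers less familiar with repositoryDigestMirrors, here is a rough Python sketch of how such a policy could be expanded into an ordered list of pull candidates for one image. It is an illustration under assumptions, not the actual implementation: `mirror_candidates` and `POLICY` are hypothetical names, matching is assumed to be by repository prefix, and only the mirrors listed in the policy are returned.

```python
from typing import Dict, List

# Second policy entry from the verification above, as plain Python data.
POLICY: List[Dict] = [
    {
        "source": "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
        "mirrors": [
            "pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev",
            "pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release",
            "pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release",
            "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
        ],
    },
]


def mirror_candidates(image: str, policy: List[Dict]) -> List[str]:
    """Return the ordered locations to try for a digest-pinned image."""
    repo, sep, digest = image.partition("@")
    for entry in policy:
        source = entry["source"]
        # Assumption: an entry applies when the repository equals the
        # source or lives underneath it.
        if repo == source or repo.startswith(source + "/"):
            suffix = repo[len(source):]
            return [mirror + suffix + sep + digest for mirror in entry["mirrors"]]
    # No matching entry: the image is pulled from its original location.
    return [image]
```

Each candidate keeps the original digest, so whichever mirror answers must serve the same content. The order of the returned list is the order in which a client would be expected to try the mirrors.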
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056